Thoughts - Ambarish

Apache Spark Course

Spark Architecture in 6 minutes

#apachespark #spark #sparkarchitecture
📓 Application
📓 Driver
📓 Executor
📓 Partition
📓 Job
📓 Stage
📓 Tasks
📓 Slots
📓 Lazy evaluation
📓 Narrow and wide Transformations
📓 Actions

PySpark Introduction in 7 minutes

#pyspark #apachespark

📗 DataFrame Creation
📗 Eager evaluation
📗 View the DataFrames and manipulations
📗 SQL with PySpark
📗 Group By

🔋 GitHub: https://github.com/ambarishg/pyspark

Spark SQL Case Study with Used Car Dataset [ 1.4 GB ] in 10 minutes

⭐ Schema Enforcement
⭐ COUNT
⭐ GROUP BY Single variable, Multiple variables
⭐ Average, Standard Deviation, Maximum, Minimum
⭐ ROLLUP and CUBE
⭐ Dense Rank without and with Partition

🔋 GitHub: https://github.com/ambarishg/pyspark/tree/master/02-SparkSQL

Lakehouse Architecture in 9 minutes

#apachespark #spark #sparkarchitecture

βœ”οΈEvolution of data warehouses to Lakehouse
βœ”οΈSQL Performance Techniques in Lakehouse
βœ”οΈ Brief about Delta lake

Delta Tables Deep Dive in 14 minutes [ Case Study approach - Cricket Test Match ]

#apachespark #spark #deltatable

βœ”οΈ Case Study approach - Cricket Test Match
βœ”οΈ Create Delta Table
βœ”οΈUPSERT in Delta Table [ INSERT and UPDATE in a single command ]
βœ”οΈ Delta log and versions
βœ”οΈ Time Travel