Apache Spark Course
Spark Architecture in 6 minutes
#apachespark #spark #sparkarchitecture
- Application
- Driver
- Executor
- Partition
- Job
- Stage
- Tasks
- Slots
- Lazy evaluation
- Narrow and wide transformations
- Actions
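
The lazy evaluation, transformation, and action topics listed above can be pictured with a small plain-Python sketch. It does not use Spark; the `LazyDataset` class and its methods are hypothetical names chosen for illustration. Transformations only record a step, and nothing runs until an action such as `collect` is called.

```python
class LazyDataset:
    """Toy stand-in for a Spark dataset (hypothetical, not the Spark API)."""

    def __init__(self, rows, steps=None):
        self._rows = rows
        self._steps = steps or []  # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: returns a new plan and does no work.
        return LazyDataset(self._rows, self._steps + [("map", fn)])

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyDataset(self._rows, self._steps + [("filter", pred)])

    def collect(self):
        # Action: runs every recorded step in order.
        out = list(self._rows)
        for kind, fn in self._steps:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out


calls = []  # tracks when double() actually runs


def double(x):
    calls.append(x)
    return x * 2


plan = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
pending = len(calls)      # still 0: building the plan executed nothing
result = plan.collect()   # the action triggers the work
```

Here `pending` stays at zero until `collect` runs, which is the behaviour the lazy-evaluation topic refers to.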
PySpark Introduction in 7 minutes
#pyspark #apachespark
- DataFrame creation
- Eager evaluation
- Viewing and manipulating DataFrames
- SQL with PySpark
- Group By
- GitHub: https://github.com/ambarishg/pyspark
Spark SQL Case Study with Used Car Dataset [ 1.4 GB ] in 10 minutes
- Schema Enforcement
- COUNT
- GROUP BY Single variable, Multiple variables
- Average, Standard Deviation, Maximum, Minimum
- ROLLUP and CUBE
- Dense Rank without and with Partition
- GitHub: https://github.com/ambarishg/pyspark/tree/master/02-SparkSQL
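
As a rough illustration of the dense rank topic, here is a plain-Python sketch of dense ranking with and without a partition. It is not the Spark SQL implementation; the `dense_rank` helper and the sample rows are made up for this example, and the highest value is assumed to get rank 1.

```python
from collections import defaultdict


def dense_rank(rows, key, partition_by=None):
    """Add a 'rank' field: ties share a rank and ranks have no gaps."""
    groups = defaultdict(list)
    for row in rows:
        # Without a partition, every row falls into one group.
        groups[row[partition_by] if partition_by else None].append(row)
    ranked = []
    for part in groups.values():
        # Distinct values, highest first, define ranks 1, 2, 3, ...
        distinct = sorted({r[key] for r in part}, reverse=True)
        position = {value: i + 1 for i, value in enumerate(distinct)}
        ranked.extend({**r, "rank": position[r[key]]} for r in part)
    return ranked


# Made-up sample rows, not taken from the used car dataset.
cars = [
    {"make": "ford", "price": 30000},
    {"make": "ford", "price": 30000},
    {"make": "ford", "price": 20000},
    {"make": "audi", "price": 50000},
    {"make": "audi", "price": 20000},
]

overall = [r["rank"] for r in dense_rank(cars, "price")]
per_make = [r["rank"] for r in dense_rank(cars, "price", partition_by="make")]
```

With a partition, ranking restarts inside each `make`, so the two tied Ford rows both get rank 1 within their group.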
Lakehouse Architecture in 9 minutes
#apachespark #spark #sparkarchitecture
- Evolution from data warehouses to the Lakehouse
- SQL performance techniques in the Lakehouse
- Brief overview of Delta Lake
Delta Tables Deep Dive in 14 minutes [ Case Study approach - Cricket Test Match ]
#apachespark #spark #deltatable
- Case Study approach - Cricket Test Match
- Create Delta Table
- UPSERT in Delta Table [ INSERT and UPDATE in a single command ]
- Delta log and versions
- Time Travel
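
The upsert, versions, and time travel topics can be sketched in plain Python as well. This is a toy model, not the Delta Lake API: the `VersionedTable` class, its methods, and the sample match rows are hypothetical. Real Delta Lake records changes in a transaction log rather than copying whole snapshots; the simplification here is only meant to show the idea.

```python
import copy


class VersionedTable:
    """Toy table keeping one snapshot per commit (not the Delta Lake API)."""

    def __init__(self, key):
        self.key = key
        self.versions = [{}]  # version 0 is an empty table

    def upsert(self, rows):
        # Start from the latest snapshot: update matching keys, insert new ones.
        snapshot = copy.deepcopy(self.versions[-1])
        for row in rows:
            snapshot[row[self.key]] = dict(row)
        self.versions.append(snapshot)
        return len(self.versions) - 1  # the new version number

    def read(self, version=None):
        # Time travel: read any earlier version; the default is the latest.
        snap = self.versions[-1 if version is None else version]
        return sorted(snap.values(), key=lambda r: r[self.key])


# Made-up data for illustration.
matches = VersionedTable(key="match_id")
v1 = matches.upsert([{"match_id": 1, "result": "in progress"}])
v2 = matches.upsert([
    {"match_id": 1, "result": "draw"},         # existing key: update
    {"match_id": 2, "result": "in progress"},  # new key: insert
])
```

Calling `matches.read(v1)` still returns the single "in progress" row, while `matches.read()` returns both rows with match 1 updated to "draw".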