Apache Spark
Distributed parallel processing engine
RDD (Resilient Distributed Dataset)
DataFrame
Parquet (columnar file format)
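To make the RDD idea concrete without a cluster, here is a pure-Python analogy that uses only the standard library. The data is split into partitions, a transformation runs on each partition in parallel, and an action combines the per-partition results. This is not Spark's API: `make_partitions` and `map_reduce` are hypothetical names for illustration. In PySpark the rough equivalent would be `sc.parallelize(data, 4).map(f).reduce(op)`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def make_partitions(data, num_partitions):
    """Split data into roughly equal contiguous slices (hypothetical helper)."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def map_reduce(data, func, combine, num_partitions=4):
    """Toy analogy of rdd.map(func).reduce(combine); not Spark code."""
    partitions = make_partitions(data, num_partitions)

    def process(part):
        # "Transformation": apply func to each element, then combine locally
        mapped = [func(x) for x in part]
        return reduce(combine, mapped) if mapped else None

    # Each partition is processed as an independent task
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = [p for p in pool.map(process, partitions) if p is not None]

    # "Action": merge the partial results back on the caller ("driver")
    return reduce(combine, partials)


# Sum of squares of 1..10, computed over 4 partitions
print(map_reduce(list(range(1, 11)), lambda x: x * x, lambda a, b: a + b))
```

A thread pool keeps the sketch portable. Spark instead schedules one task per partition on executors spread across a cluster, which is where the "parallel processing engine" label comes from.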
Delta Lake table format
Extends Parquet with a transaction log and metadata
Enables relational database benefits (e.g. ACID transactions) on batch and streaming data
Structured Streaming
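The Delta Lake idea of a transaction log on top of Parquet can be sketched as a toy model in plain Python. Each commit is newline-delimited JSON listing actions (`add`, `remove`, `metaData`), and replaying the commits in order yields the set of Parquet files that make up a given table version. This is a simplified sketch, not the Delta Lake library: the `schema` field stands in for the real metadata, and `make_commit`, `snapshot` and the file names are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional, Set, Tuple


def make_commit(*actions: Dict[str, Any]) -> str:
    """Serialize actions as newline-delimited JSON, one action per line."""
    return "\n".join(json.dumps(a) for a in actions)


def snapshot(
    commits: List[str], version: Optional[int] = None
) -> Tuple[Set[str], Optional[Dict[str, Any]]]:
    """Replay commits 0..version in order; return (live Parquet files, latest metadata)."""
    last = len(commits) - 1 if version is None else version
    live: Set[str] = set()
    metadata: Optional[Dict[str, Any]] = None
    for commit in commits[: last + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            elif "metaData" in action:
                metadata = action["metaData"]
    return live, metadata


# Hypothetical table history: create, append, then rewrite part-0000
log = [
    make_commit({"metaData": {"schema": "id INT, name STRING"}},
                {"add": {"path": "part-0000.parquet"}}),
    make_commit({"add": {"path": "part-0001.parquet"}}),
    make_commit({"remove": {"path": "part-0000.parquet"}},
                {"add": {"path": "part-0002.parquet"}}),
]

print(sorted(snapshot(log)[0]))             # current version
print(sorted(snapshot(log, version=0)[0]))  # older version ("time travel")
```

Because readers only trust files listed in the replayed log, a writer can write new Parquet files first and make them visible all at once with a single commit. That atomic visibility is the basis for the relational-database-style guarantees noted above.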