Apache Spark
Parallel processing engine
RDD
dataframe
parquet
delta lake table format, extends parquet with a transaction log and metadata
enables relational db benefits on batch & stream
structured streaming
Parallel processing engine
RDD
dataframe
parquet
delta lake table format, extends parquet with a transaction log and metadata
enables relational db benefits on batch & stream
structured streaming