Leverage machine learning and deep learning models to build applications on real-time data using PySpark. This book is perfect for those who want to learn to use PySpark to perform exploratory data analysis and solve an array of business challenges.
You'll start by reviewing PySpark fundamentals, such as Spark’s core architecture, and see how to use PySpark for big data processing tasks such as data ingestion, cleaning, and transformation. This is followed by building workflows for analyzing streaming data using PySpark and a comparison of various streaming platforms.
You'll then see how to schedule different Spark jobs using Airflow with PySpark and examine tuning machine learning and deep learning models for real-time predictions. The book concludes with a discussion of graph frames and performing network analysis using graph algorithms in PySpark. All the code presented in the book will be available as Python scripts on GitHub.
Learn PySpark
Chapter 1: Introduction to PySpark
Chapter 2: Data Processing
Chapter 3: Spark Structured Streaming
Chapter 4: Airflow
Chapter 5: Machine Learning Library (MLlib)
Chapter 6: Supervised Machine Learning
Chapter 7: Unsupervised Machine Learning
Chapter 8: Deep Learning Using PySpark
Covers the entire range of PySpark’s offerings, from streaming to graph analytics
Builds standardized workflows for pre-processing and for building machine learning and deep learning models on big data sets
Discusses how to schedule different Spark jobs using Airflow
Pramod Singh
PySpark, Python, Machine Learning, Deep Learning, Big Data, Spark, Data Processing, Airflow, Supervised Machine Learning, Unsupervised Machine Learning, Graph Frames