Taming Big Data with Apache Spark and Python - Hands On!
Dive right in with 15+ hands-on examples of analyzing large data sets with Apache Spark, on your desktop or on Hadoop!
Created by Sundog Education by Frank Kane
What Will I Learn?
- Use DataFrames and Structured Streaming in Spark 2
- Frame big data analysis problems as Spark problems
- Use Amazon's Elastic MapReduce service to run your job on a cluster with Hadoop YARN
- Install and run Apache Spark on a desktop computer or on a cluster
- Use Spark's Resilient Distributed Datasets (RDDs) to process and analyze large data sets across many CPUs
- Implement iterative algorithms such as breadth-first search using Spark
- Use the MLlib machine learning library to answer common data mining questions
- Understand how Spark SQL lets you work with structured data
- Understand how Spark Streaming lets you process continuous streams of data in real time
- Tune and troubleshoot large jobs running on a cluster
- Share information between nodes on a Spark cluster using broadcast variables and accumulators
- Understand how the GraphX library helps with network analysis problems