Big Data Analytics with Apache Spark and Python

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 9.5 Hours | 1.89 GB

Apache Spark programming with Python, Spark SQL, Spark Streaming, Machine Learning and real-time Data Science

“Apache Spark is the hottest Big Data technology today. Its adoption is growing fast, and so is the demand for professionals trained in it.”

Apache Spark is the most active Apache project, and it is displacing Map Reduce. It is fast and general-purpose, and it supports multiple programming languages, data sources and management systems. More and more organizations are adopting Apache Spark to build big data solutions using batch, interactive and stream processing. The demand for professionals trained in Spark is rising sharply. Because Spark is a new technology, there are not enough training resources that offer easy guidance on building end-to-end solutions.

Big Data Analytics with Apache Spark and Python addresses this problem. It explains the concepts and capabilities of Spark in a simple and easy way. It then looks at the various stages of analytics and how Spark can be used to build end-to-end solutions that run on parallel clusters. It also shows how Spark can be used for real-time Data Science projects. The sample code and exercises use a Windows-based installation, so Spark is easier to set up and run than with Linux VMs.

Through this course, we strive to fully equip you to become a developer who can execute full-fledged Analytics projects with Spark. By taking this course, you will:

  • Understand the concepts and capabilities of distributed computing in Spark
  • Learn about Data Engineering with Spark
  • Use new capabilities like Spark SQL and Streaming
  • Master the application of Analytics and Machine Learning techniques
  • Build real-time Data Science applications
  • Do all exercises on your Windows laptop/desktop without the need for VMs

Table of Contents

Introduction
01 About the course
02 About V2 Maestros
03 Resource Bundle

Overview
04 Hadoop Overview
05 HDFS Architecture
06 Map Reduce – How it works
07 Map Reduce – Example
08 Hadoop Stack
09 What is Spark
10 Spark Architecture – Part 1
11 Spark Architecture – Part 2
12 Installing Spark and Setting up for Python

Programming with Spark
13 Spark Transformations
14 Spark Actions
15 Advanced Spark Programming
16 Python – Spark Programming Examples 1
17 Python – Spark Programming Examples 2

Spark SQL
19 Spark SQL Overview
20 Python – Spark SQL Examples

Spark Streaming
22 Streaming with Apache Spark
23 Python – Spark Streaming Examples

Real time Data Science
24 Basic Elements of Data Science
25 The Dataset
26 Learning from relationships
27 Modeling and Prediction
28 Data Science Use Cases
29 Types of Analytics
30 Types of Learning
31 Doing Data Science in real time with Spark

Machine Learning with Spark
32 Spark Machine Learning
33 Analyzing Results and Errors
34 Linear Regression
35 Spark Use Case Linear Regression
36 Decision Trees
37 Spark Use Case Decision Trees Classification
38 Principal Component Analysis
39 Random Forests Classification
40 Python Use Case Random Forests PCA
41 Text Preprocessing with TF-IDF
42 Naive Bayes Classification
43 Spark Use Case Naive Bayes TF-IDF
44 K-Means Clustering
45 Spark Use Case K-Means
46 Recommendation Engines
47 Spark Use Case Collaborative Filtering
48 Real Time Twitter Data Sentiment Analysis

Conclusion
51 Closing Remarks