Master Big Data Ingestion and Analytics with Flume, Sqoop, Hive and Spark

English | MP4 | AVC 1280×720 | AAC 48 kHz 2ch | 5h 40m | 1.30 GB

Complete course on Sqoop, Flume, and Hive: Great for CCA175 and Hortonworks Spark Certification preparation

In this course, you will start by learning about the Hadoop Distributed File System (HDFS) and the most common Hadoop commands required to work with HDFS. Then you'll be introduced to Sqoop Import, where you will gain knowledge of the lifecycle of the Sqoop command and learn how to use the import command to migrate data from MySQL to HDFS and from MySQL to Hive, and much more.
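As a taste of what the Sqoop import lectures cover, a typical invocation might look like the following sketch (the database, credentials, and paths here are hypothetical, not from the course):

```shell
# Hypothetical connection details -- adjust for your own cluster.
# Imports the MySQL table "orders" into HDFS using 4 parallel map tasks.
sqoop import \
  --connect jdbc:mysql://localhost:3306/retail_db \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table orders \
  --target-dir /user/cloudera/orders \
  --num-mappers 4
```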

In addition, you will learn about Sqoop Export to migrate data effectively, and about Apache Flume to ingest data. The Apache Hive section introduces Hive, covering external and managed tables, working with different file formats such as Parquet and Avro, and more. In the final sections, you will learn about Spark DataFrames, Spark SQL, and much more.
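For the export direction, a minimal sketch of the kind of command the Sqoop Export lectures work with (table names and HDFS paths are assumptions; the target MySQL table must already exist):

```shell
# Hypothetical table and path -- Sqoop export pushes HDFS files back to MySQL.
sqoop export \
  --connect jdbc:mysql://localhost:3306/retail_db \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table daily_revenue \
  --export-dir /user/hive/warehouse/daily_revenue \
  --input-fields-terminated-by ','
```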


  • Hadoop Distributed File System (HDFS) and commands
  • Lifecycle of the Sqoop command
  • Sqoop import command to migrate data from MySQL to HDFS
  • Sqoop import command to migrate data from MySQL to Hive
  • Understanding split-by and boundary queries
  • Using incremental mode to migrate data from MySQL to HDFS
  • Using Sqoop export to migrate data from HDFS to MySQL
  • Spark DataFrames: working with different file formats and compression
  • Spark SQL
Table of Contents

Hadoop Introduction
1 HDFS and Hadoop Commands

Sqoop Import
2 Sqoop Introduction
3 Managing Target Directories
4 Working with Different File Formats
5 Working with Different Compressions
6 Conditional Imports
7 Split-by and Boundary Queries
8 Field Delimiters
9 Incremental Appends
10 Sqoop Hive Import
11 Sqoop List Tables and Databases
12 Sqoop Import Practice 1
13 Sqoop Import Practice 2
14 Sqoop Import Practice 3

Sqoop Export
15 Export from HDFS to MySQL
16 Export from Hive to MySQL

Apache Flume
17 Flume Introduction & Architecture
18 Exec Source and Logger Sink
19 Moving data from Twitter to HDFS
20 Moving data from NetCat to HDFS
21 Flume Interceptors
22 Flume Interceptor Example
23 Flume Multi-Agent Flow
24 Flume Consolidation
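The NetCat-to-HDFS flow in this section can be sketched with a minimal single-agent configuration like the one below (agent name, port, and HDFS path are assumptions for illustration):

```properties
# Hypothetical agent "a1": NetCat source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/flume/netcat_events
a1.sinks.k1.hdfs.fileType = DataStream

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```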

Apache Hive
25 Hive Introduction
26 Hive Database
27 Hive Managed Tables
28 Hive External Tables
29 Hive Inserts
30 Hive Analytics
31 Working with Parquet
32 Compressing Parquet
33 Working with Fixed File Format
34 Alter Command
35 Hive String Functions
36 Hive Date Functions
37 Hive Partitioning
38 Hive Bucketing
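A minimal sketch of the external-table and partitioning ideas covered in this section (the database columns, delimiter, and HDFS location are hypothetical):

```sql
-- Hypothetical schema: an external table over existing HDFS files,
-- partitioned by year so queries can prune irrelevant directories.
CREATE EXTERNAL TABLE orders (
  order_id INT,
  customer STRING,
  amount   DOUBLE
)
PARTITIONED BY (order_year INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/external/orders';

-- Dropping an external table removes only the metadata;
-- the underlying HDFS files remain in place.
```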

Spark Introduction
39 Spark Introduction
40 Resilient Distributed Datasets
41 Cluster Overview
42 Directed Acyclic Graph (DAG) & Stages

Spark Transformations & Actions
43 Map FlatMap Transformation
44 Filter Intersection
45 Union Distinct Transformation
46 GroupByKey: Group people based on birthday months
47 ReduceByKey: Total number of students in each subject
48 SortByKey: Sort students based on their roll number
49 MapPartition and MapPartitionWithIndex
50 Change Number of Partitions
51 Join: Join email addresses based on customer name
52 Spark Actions
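The paired transformations above all follow the same shape; taking lecture 47's "total number of students in each subject" as an example, here is a plain-Python analogue of reduceByKey (a local list stands in for an RDD; this is an illustration of the idea, not the course's Spark code):

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, fn):
    """Local analogue of Spark's reduceByKey: merge all values per key with fn."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(fn, values) for key, values in grouped.items()}

# Hypothetical (subject, student_count) pairs, e.g. one pair per class section
enrolments = [("maths", 30), ("physics", 25), ("maths", 20), ("physics", 15)]
totals = reduce_by_key(enrolments, lambda a, b: a + b)
# totals == {"maths": 50, "physics": 40}
```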

Spark RDD Practice
53 Scala Tuples
54 Extract Error Logs from log files
55 Frequency of word in Text File
56 Population of each City
57 Orders placed by Customers
58 Movie Average Rating greater than 3
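Exercise 55's word-frequency count is the classic flatMap-then-reduceByKey pipeline; a plain-Python analogue of the logic (local strings stand in for a text file in HDFS, and the sample sentences are made up):

```python
from collections import Counter

def word_frequencies(lines):
    """Local analogue of the Spark word count: flatten lines into words
    (flatMap), then count occurrences of each word (reduceByKey with +)."""
    words = (word for line in lines for word in line.lower().split())
    return Counter(words)

sample = ["spark makes big data simple", "big data big results"]
freqs = word_frequencies(sample)
# freqs["big"] == 3, freqs["data"] == 2
```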

Spark Dataframes & Spark SQL
59 Dataframe Intro
60 Dataframe from JSON Files
61 Dataframe from Parquet Files
62 Dataframe from CSV Files
63 Dataframe from Avro and XML Files
64 Working with Different Compressions
65 DataFrame API Part1
66 DataFrame API Part2
67 Spark SQL
68 Working with Hive Tables in Spark