R Programming in Data Science: High Volume Data

English | MP4 | AVC 1280×720 | AAC 48KHz 2ch | 1h 25m | 224 MB

Data fills all available space, and now that storage is cheap, the amount of data has exploded. All that information is useless, however, without analysis and context. The R programming language is designed to make it easier to analyze and visualize massive amounts of data. For example, R is vectorized: it can multiply one block of variables by another in a single expression, a design that gives it inherent advantages over languages that require explicit loops. This course shows why R is well suited to high volumes of data, introduces more efficient ways to use the language, and explains how to avoid the problems and capitalize on the opportunities of big data. Learn how to determine whether you have enough memory and processing power, produce visualizations of big data, optimize your R code, and use advanced techniques such as parallel processing to speed up your computations. Plus, discover how to integrate R with big-data solutions such as SQL databases and Apache Spark.
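
As a rough illustration of the "multiply one block of variables by another" idea, the sketch below compares a vectorized multiplication with an explicit loop. The variable names (`price`, `quantity`, `revenue_vec`, `revenue_loop`) and the simulated data are assumptions made for this example and are not taken from the course materials.

```r
# Vectorized multiplication: one expression multiplies two whole vectors.
# The data are simulated purely for illustration.
set.seed(42)
n <- 1e5
price    <- runif(n, min = 1, max = 100)
quantity <- sample(1:10, n, replace = TRUE)

# Vectorized form: `*` is applied element by element without an R-level loop.
revenue_vec <- price * quantity

# Equivalent explicit loop, shown only for comparison.
revenue_loop <- numeric(n)
for (i in seq_len(n)) {
  revenue_loop[i] <- price[i] * quantity[i]
}

# Both approaches should produce the same values.
stopifnot(isTRUE(all.equal(revenue_vec, revenue_loop)))
```

Wrapping each version in `system.time()` is one simple way to see how the two forms compare on a given machine.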

Topics include:

  • Assessing memory and processing power
  • Visualizing high-volume data
  • Profiling and optimizing R code
  • Compiling R functions
  • Parallel processing with R
  • Using R with other big data solutions
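
For the first topic in the list above, a minimal base-R sketch of checking memory needs and processing speed might look like the following. It assumes an all-numeric data set with hypothetical dimensions (`rows`, `cols`); the course's own approach may differ.

```r
# Estimate how much memory a numeric data set needs before loading it.
# A double takes 8 bytes, so rows * cols * 8 approximates the payload.
rows <- 1e6   # hypothetical number of rows in the full data set
cols <- 10    # hypothetical number of numeric columns
estimated_bytes <- rows * cols * 8

# Check the per-cell cost against the measured size of a smaller sample.
sample_rows <- 1e4
m <- matrix(rnorm(sample_rows * cols), nrow = sample_rows, ncol = cols)
measured_bytes <- as.numeric(object.size(m))
bytes_per_cell <- measured_bytes / (sample_rows * cols)

cat("Estimated size of full data:",
    format(estimated_bytes / 1024^2, digits = 4), "MB\n")
cat("Measured bytes per cell in sample:", round(bytes_per_cell, 2), "\n")

# Time a representative computation to gauge processing speed.
timing <- system.time(col_means <- colMeans(m))
print(timing)
```

The estimate covers only the raw numeric payload; character columns, factors, and temporary copies made during processing can raise the real requirement well above it.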

Table of Contents

1 Wrangling high-volume data with R
2 Sample data set
3 Perspectives on high-volume data
4 Big data and available memory
5 Code: Finding available memory
6 Big data and CPU cycles
7 Code: How fast is your computer
8 High-volume data and visualizations
9 Code: Graphs for high-volume data
10 Code: rug() and jitter()
11 Code: Applying statistics to plots
12 Code: Subsampled graphs for high-volume data
13 Code: Trellising data across multiple charts
14 R programming tools for high-volume data
15 Downsampling
16 Profile R code to find inefficiencies
17 Code: Profile R code to find inefficiencies
18 Avoid the copy-on-modify problem with R
19 Code: Avoid copy-on-modify with data.table
20 Optimization versus readability
21 Compile R functions
22 Parallel processing with R
23 Code: Parallel R functions
24 bigmemory, LaF, and ff packages
25 Store high-volume data in a database
26 Code: R with databases
27 Cloud computing with R
28 Sparklyr with R
29 Code: R with Sparklyr
30 Summary of high-volume data with R