Data structures and algorithms that work well for traditional software can slow down or fail altogether when applied to huge datasets. Algorithms and Data Structures for Massive Datasets introduces a toolbox of techniques built for modern big data applications. You’ll discover methods for reducing and sketching data so it fits in small memory without losing much accuracy, and unlock the algorithms and data structures that form the backbone of a big data system. Filled with fun illustrations and examples from real-world businesses, this book shows how each of these complex techniques can be practically applied to maximize the accuracy and throughput of big data processing and analytics.
Modern data-intensive applications are outpacing traditional data structures and algorithms. Huge datasets rapidly outgrow available memory, and processing them becomes slow and inefficient, bottlenecking development. Fortunately, you don’t need to blow your budget on expensive upgrades to your computing power! Algorithms and Data Structures for Massive Datasets lays out ways to sketch data in main memory and organize data on disk to make the best use of your available resources. Taken from the latest research papers, these effective techniques apply to any discipline, from finance to text analysis.
Algorithms and Data Structures for Massive Datasets teaches you to take advantage of data processing and analytics techniques specifically designed for large distributed datasets. And you’ll be amazed how easy it is to learn such a challenging topic from this friendly guide! Complex concepts are illustrated with interesting, entertaining graphics and fascinating industry stories that show how these techniques have succeeded in the real world. You’ll study examples including Google Bigtable, Bitcoin, and a smart bed sensor app, learning to build data sketches for processing, querying and exploring large datasets. By the time you’re done, you’ll be able to identify the right algorithm to deliver faster and more reliable results for any data-intensive system.
- Sketching data structures for practical problems
- Choosing the right database engine for your application
- Evaluating and designing efficient on-disk data structures and algorithms
- Understanding the algorithmic tradeoffs involved in massive-scale systems
- Deriving basic statistics from streaming data
- Correctly sampling streaming data
- Computing percentiles with limited space