Learning CUDA 10 Programming

English | MP4 | AVC 1920×1080 | AAC 48 kHz 2ch | 2h 27m | 489 MB

Harness the power of GPUs to speed up your applications

Do you want to write GPU-accelerated applications but don't know how to get started? With CUDA 10, you can easily add GPU processing to your C and C++ projects. CUDA is the de facto framework for developing high-performance, GPU-accelerated applications on NVIDIA GPUs.

In this course, you will be introduced to CUDA programming through hands-on examples. CUDA provides a general-purpose programming model that gives you access to the tremendous computational power of modern NVIDIA GPUs, along with powerful libraries for machine learning, image processing, linear algebra, and parallel algorithms.
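As a rough illustration of what the general-purpose programming model looks like, here is a minimal vector-addition sketch. It is not taken from the course; the names `vectorAdd`, `runVectorAdd`, and `CUDA_CHECK` are hypothetical, chosen only for this example. It assumes a CUDA-capable NVIDIA GPU and a toolkit that supports managed memory (`cudaMallocManaged`). It shows a kernel, an execution configuration, and basic error checking.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Kernel: each GPU thread computes one element, c[i] = a[i] + b[i].
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // the last block may be partial
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: runs vectorAdd on n elements (n >= 1)
// and returns the number of wrong results.
int runVectorAdd(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Managed memory is accessible from both the host and the device.
    float *a = nullptr, *b = nullptr, *c = nullptr;
    CUDA_CHECK(cudaMallocManaged(&a, bytes));
    CUDA_CHECK(cudaMallocManaged(&b, bytes));
    CUDA_CHECK(cudaMallocManaged(&c, bytes));

    for (int i = 0; i < n; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    // Execution configuration: enough 256-thread blocks to cover n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);
    CUDA_CHECK(cudaGetLastError());       // catches invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());  // waits for the kernel, catches runtime errors

    // Verify on the host (safe after synchronization).
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != 3.0f) ++mismatches;
    }

    CUDA_CHECK(cudaFree(a));
    CUDA_CHECK(cudaFree(b));
    CUDA_CHECK(cudaFree(c));
    return mismatches;
}

int main() {
    int mismatches = runVectorAdd(1 << 20);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Compiled with `nvcc vector_add.cu -o vector_add` (the file name is an assumption), it should print `mismatches: 0` on a working setup. Checking `cudaGetLastError()` immediately after the launch and again via `cudaDeviceSynchronize()` separates configuration errors from errors that occur while the kernel runs.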

After working through this course, you will understand the fundamentals of CUDA programming and be able to start using it in your applications right away.


  • Use CUDA to speed up applications in machine learning, image processing, linear algebra, and more
  • Learn to debug CUDA programs and handle errors
  • Use optimization techniques to get the maximum performance from your CUDA programs
  • Master the fundamentals of concurrency and parallel algorithms on GPUs
  • Learn about the wide range of GPU-accelerated libraries included with CUDA
  • Learn the next steps you can take to continue building your CUDA skills

Table of Contents

Introduction to CUDA
1 The Course Overview
2 Overview of CUDA
3 Installing the CUDA Toolkit on Windows
4 Installing the CUDA Toolkit on Linux
5 Your First CUDA Program

Programming with CUDA
6 The CUDA Programming Model
7 Kernel Execution Configurations
8 Debugging with NVIDIA Nsight on Windows
9 Debugging with cuda-gdb on Linux
10 Handling Errors

Performance Optimizations
11 The NVIDIA Visual Profiler
12 Using Memory Efficiently
13 Working with 2D and 3D Memory Layouts
14 Texture and Constant Memory
15 Instruction and Control Flow Optimizations

Parallel Algorithms
16 Introduction to Shared Memory
17 Reduction
18 Prefix Sum
19 Filtering

GPU Accelerated Libraries
20 Deep Learning
21 Signal, Image, and Video
22 Linear Algebra and Math
23 Parallel Algorithms

Advanced CUDA Topics
24 Concurrency and Streams
25 Overlapping Transfers and Computation
26 Device Management
27 Programming with Multiple GPUs
28 The Unified Address Space
29 Dynamic Global Memory Allocation
30 Dynamic Parallelism

Summary and Next Steps
31 What We Have Learned
32 Next Steps
33 Resources to Explore