A CPU is a latency-oriented device designed to solve complex problems quickly using a few powerful cores with sophisticated control logic. A GPU is a throughput-oriented device with many simpler cores, designed to work on large datasets in parallel by dividing them into multiple chunks.
A GPU can exploit data parallelism very well, which is the main reason behind the speed-up. However, data transfer between RAM and VRAM is slow, so if the overall problem has a small dataset, the transfer time can outweigh the gains from parallelization.
CUDA C provides a convenient framework for writing functions, called kernels, that run directly on the GPU.
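As a minimal sketch (the kernel name, dataset size, and launch configuration below are illustrative, not from the original text), a kernel is marked with `__global__`, data is moved between RAM and VRAM with `cudaMemcpy`, and the kernel is launched with the `<<<blocks, threads>>>` syntax:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Kernel: each thread handles one pair of elements (data parallelism).
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);

    // RAM -> VRAM transfers: this is the overhead that can dominate for small datasets.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

    // VRAM -> RAM transfer of the result.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```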
A modern GPU has three main components: streaming processors (CUDA cores), memory, and control logic. CUDA cores are grouped into Streaming Multiprocessors (SMs), and memory is divided into registers, shared memory, and global memory.
There are five types of memory in a CUDA device: Global Memory, Local Memory, Constant Memory, Shared Memory, and Registers.
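A rough sketch of how these memory spaces appear in CUDA C (the kernel and variable names are illustrative): automatic variables normally live in registers (spilling to local memory), `__shared__` declares shared memory visible to one block, `__constant__` declares constant memory, and pointers obtained from `cudaMalloc` refer to global memory:

```cuda
#include <cuda_runtime.h>

// Constant memory: read-only in kernels, set from the host with cudaMemcpyToSymbol.
__constant__ float scale;

__global__ void memorySpaces(const float *in, float *out, int n) {
    // Shared memory: visible to all threads in the same block.
    __shared__ float tile[256];

    // Automatic variables live in registers (or local memory if they spill).
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n) {
        tile[threadIdx.x] = in[i];            // read from global memory
        __syncthreads();
        out[i] = tile[threadIdx.x] * scale;   // write to global memory
    }
}
```

On the host, `scale` would be initialized with `cudaMemcpyToSymbol`, and `in`/`out` would be allocated with `cudaMalloc`.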
When a kernel is launched, all threads in a block are assigned to the same SM at once. Once a block is assigned to an SM, it is divided into 32-thread units called warps. There are usually more threads assigned to an SM than it has cores; this is done so that the GPU can tolerate long-latency operations (such as global memory accesses) by switching to other warps that are ready to execute.
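A small sketch of how a launch configuration maps onto warps (the numbers and kernel name are illustrative): a block of 256 threads is split by the SM into 256 / 32 = 8 warps, and inside a kernel the warp and lane of the current thread can be derived from `threadIdx` and the built-in `warpSize`:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void warpInfo(void) {
    // warpSize is a built-in device variable (32 on current GPUs).
    int warpId = threadIdx.x / warpSize;   // which warp within the block
    int laneId = threadIdx.x % warpSize;   // position within that warp
    if (laneId == 0)
        printf("block %d, warp %d starts at thread %d\n",
               blockIdx.x, warpId, threadIdx.x);
}

int main(void) {
    // One block of 256 threads is scheduled on an SM as 8 warps.
    warpInfo<<<1, 256>>>();
    cudaDeviceSynchronize();  // wait so the device-side printf output is flushed
    return 0;
}
```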
SIMD execution on a warp means that, for optimum performance, all threads in a warp should follow the same execution path (control flow), i.e., there should be no control divergence among the threads of a warp.
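As an illustrative sketch (the kernels below are not from the original text), branching on a condition that differs between threads of the same warp causes divergence, while branching on a condition that is uniform across each warp does not:

```cuda
__global__ void divergent(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Divergent: threads within the same warp take different branches,
    // so the warp must execute both paths one after the other.
    if (threadIdx.x % 2 == 0)
        data[i] *= 2.0f;
    else
        data[i] += 1.0f;
}

__global__ void uniform(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // No divergence: the condition is the same for every thread in a warp
    // (the branch granularity is whole warps), so each warp follows one path.
    if ((threadIdx.x / warpSize) % 2 == 0)
        data[i] *= 2.0f;
    else
        data[i] += 1.0f;
}
```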
Several predefined CUDA API functions can be used to query the resources available on a GPU.
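For example, `cudaGetDeviceCount` and `cudaGetDeviceProperties` report the number of devices and the resources of each one; the sketch below prints a small subset of the fields in `cudaDeviceProp`:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);

        printf("Device %d: %s\n", d, prop.name);
        printf("  SMs:                     %d\n", prop.multiProcessorCount);
        printf("  Warp size:               %d\n", prop.warpSize);
        printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);
        printf("  Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Global memory:           %zu bytes\n", prop.totalGlobalMem);
    }
    return 0;
}
```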