Course Outline

Introduction

  • What is ROCm?
  • What is HIP?
  • Comparison of ROCm, CUDA, and OpenCL
  • Overview of ROCm and HIP features and architecture
  • Differences between ROCm for Windows and ROCm for Linux

Installation

  • Installing ROCm on Windows
  • Verifying installation and checking device compatibility
  • Updating or uninstalling ROCm on Windows
  • Troubleshooting common installation issues

Getting Started

  • Creating a new ROCm project using Visual Studio Code on Windows
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying output using printf and fprintf

ROCm API

  • Utilizing the ROCm API in host programs
  • Querying device information and capabilities
  • Allocating and deallocating device memory
  • Copying data between host and device
  • Launching kernels and synchronizing threads
  • Handling errors and exceptions

HIP Language

  • Using the HIP language in device programs
  • Writing kernels that execute on the GPU and manipulate data
  • Utilizing data types, qualifiers, operators, and expressions
  • Using built-in functions, variables, and libraries

ROCm and HIP Memory Model

  • Utilizing different memory spaces: global, shared, constant, and local
  • Managing different memory objects: pointers, arrays, textures, and surfaces
  • Employing various memory access modes: read-only, write-only, read-write, etc.
  • Understanding memory consistency models and synchronization mechanisms

ROCm and HIP Execution Model

  • Utilizing different execution structures: threads, blocks, and grids
  • Using thread-indexing built-ins such as hipThreadIdx_x, hipBlockIdx_x, and hipBlockDim_x (equivalent to the CUDA-style threadIdx.x, blockIdx.x, and blockDim.x)
  • Employing block functions like __syncthreads and __threadfence_block
  • Utilizing grid-level constructs such as hipGridDim_x and grid-wide synchronization through cooperative groups

Debugging

  • Debugging ROCm and HIP programs on Windows
  • Using the Visual Studio Code debugger to set breakpoints and inspect variables and call stacks
  • Utilizing the ROCm Debugger for debugging ROCm and HIP programs on AMD devices
  • Analyzing ROCm and HIP programs on AMD devices using the ROCm Profiler

Optimization

  • Optimizing ROCm and HIP programs on Windows
  • Applying coalescing techniques to enhance memory throughput
  • Using caching and prefetching techniques to reduce memory latency
  • Leveraging shared and local memory techniques to optimize memory access and bandwidth
  • Using profiling tools to measure and improve execution time and resource utilization

Summary and Next Steps

Requirements

  • A solid understanding of the C/C++ language and parallel programming concepts.
  • Foundational knowledge of computer architecture and memory hierarchy.
  • Practical experience with command-line tools and code editors.
  • Familiarity with the Windows operating system and PowerShell.

Audience

  • Developers seeking to learn how to install and use ROCm on Windows to program AMD GPUs and harness their parallelism.
  • Developers aiming to write high-performance, scalable code capable of running across various AMD devices.
  • Programmers interested in exploring the low-level aspects of GPU programming and optimizing code performance.

Duration

  • 21 hours
