Tuesday, 4 March 2014

Set Up Visual Studio 2012 for CUDA Dynamic Parallelism

In this post i am going to show you how to set up Visual Studio 2012 to compile CUDA code which uses dynamic parallelism.

What is Dynamic Parallelism?


Dynamic parallelism enables the GPU to create new work for itself without ever having to involve the CPU by allowing CUDA kernels to call other CUDA kernels.

Nvidia 2013
This is a powerful tool for CUDA developers as there is now no longer the need to pass data between the host and device as launch configuration decisions can now be made at runtime by threads executing on the device.

In order to make use of dynamic parallelism you must have an Nvidia GPU with compute capability 3.5

Dynamic Parallelism Example


As with every first code example we will look at a simple 'hello world' program which uses dynamic parallelism. 


Here we have two kernels, the parent kernel and the child kernel. Out parent kernel calls the child, waits for it to complete its work and then carries out some work of its own. 

When we initially try and build this code using visual studio we will get the following error. 


Visual studio is complaining that we can only call kernels within kernels if we have a card of compute 3.5 or higher. To let visual studio know that we do infact have a suitable card we need to change a few properties. 

To do this we open up the project properties by right clicking on the project. 



The first setting we need to change is to tell CUDA to generate relocatable device code by selecting 'Yes (-rdc=true)'


Next we have to tell CUDA to use compute 3.5 by changing the code generation to 'compute_35,sm_35'


We then need to tell the runtime library to use the multi threaded library. If we are in debug mode we select 'Multi-Threaded Debug (/MTd)' or if we are in release mode 'Multi-Threaded (/MT)'


Finally inside the linker properties we add an additional dependency to 'cudadevrt.lib' 

After following these steps our project will now build successfully and we should get the following output. 


This has been to mostly help remind myself how to set up visual studio in the future if i forget but hopefully it can help out a few other people as well. 

Any questions or if you are struggling with anything feel free to send me an email at craigmcmillan01@outlook.com with blog in the subject and i will be happy to help. 

3 comments:

  1. hola, no se encuentran los pdb ni las dll, como las agrego? gracias

    ReplyDelete
  2. Hi, nice tutorial :) do you know how to set the properties for dynamic parallelism with CMake?

    ReplyDelete
  3. Hi, I started to learn dynamic parallelism using the simple "Hello Wrold" example, which is mentioned in this post. I made changes in all the settings that are mentioned. But, still I am getting the same error, like, Error calling a __global__ function("childKernel") from a __global__ function("parentKernel") is only allowed on the compute_35 architecture or above.

    System details along with software versions used:
    NVIDIA GeForce GTX 1050
    Windows 7 (64 bit)
    Microsoft Visual studio 2015
    CUDA 9.1 version

    ReplyDelete