CUDA dynamic parallelism -- Is there a way to infinitely nest kernel launches?

I'm using CUDA dynamic parallelism to launch kernels from inside a kernel. According to the CUDA documentation, kernel launches can only be nested up to a fixed recursion depth because of resource constraints.
In my project, however, the parent kernel doesn't need to wait for the child kernel to finish. In other words, they are completely independent.
So is there a way to launch kernels from device code without being limited by the recursion depth?

I used cudaDeviceSetLimit() to set cudaLimitDevRuntimeSyncDepth, but the restriction is still there.
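A minimal host-only sketch of that call, for reference. The helper name try_sync_depth and the depth value 8 are assumptions for illustration, not taken from the actual project. The sketch prints the error codes the runtime returns rather than assuming the call succeeds.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper result: both error codes plus the value read back.
struct LimitResult {
    cudaError_t set_err;
    cudaError_t get_err;
    size_t value;
};

// Request a device-runtime sync depth, then read back what the runtime reports.
LimitResult try_sync_depth(size_t requested) {
    LimitResult r{cudaSuccess, cudaSuccess, 0};
    // Should be called before launching any kernel that uses the device runtime.
    r.set_err = cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, requested);
    r.get_err = cudaDeviceGetLimit(&r.value, cudaLimitDevRuntimeSyncDepth);
    return r;
}

int main() {
    LimitResult r = try_sync_depth(8);  // 8 is an assumed value
    printf("set: %s\n", cudaGetErrorString(r.set_err));
    printf("get: %s, cudaLimitDevRuntimeSyncDepth = %zu\n",
           cudaGetErrorString(r.get_err), r.value);
    return 0;
}
```

The cudaDeviceSetLimit documentation describes this limit as the nesting depth at which a thread can safely call cudaDeviceSynchronize() from device code, which may be why raising it does not affect launches that never synchronize.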

Example:

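__global__ void do_something(MyQueue* queue, Task* task) {
  // do something ...
  task->execute();

  // If the queue is not empty, pop a task and launch a kernel to execute it.
  // Only one thread pops, so each grid launches at most one child grid.
  if (threadIdx.x == 0 && blockIdx.x == 0) {
    Task* t = queue->pop();
    if (t) {
      // Streams created in device code must use cudaStreamNonBlocking.
      cudaStream_t stream;
      cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
      do_something<<<t->gridSize, t->blockSize, t->mem, stream>>>(queue, t);
      cudaStreamDestroy(stream);  // released once the queued work completes
    }
  }
}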
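To make the pattern above runnable, here is a self-contained sketch with hypothetical stand-ins for MyQueue and Task (the real types are not shown in the post): a fixed device array of tasks and an atomic head index. The host launches the first task and each grid pops and launches the next one, so a queue of n tasks produces a chain of nested grids n levels deep. It assumes a build with -rdc=true (plus -lcudadevrt if the toolchain does not add it). With a short queue, such as run_chain(4), every task is expected to run; a much longer queue would run into the nesting-depth limit mentioned in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for Task: just the launch configuration.
struct Task {
    int gridSize;
    int blockSize;
    size_t mem;
};

// Hypothetical stand-in for MyQueue: a device array popped via an atomic index.
struct MyQueue {
    Task* tasks;
    int size;
    int head;

    __device__ Task* pop() {
        int i = atomicAdd(&head, 1);
        return i < size ? &tasks[i] : nullptr;
    }
};

__device__ int executed;  // number of tasks that ran

__global__ void do_something(MyQueue* queue, Task* task) {
    (void)task;  // a real task would do its work here
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        atomicAdd(&executed, 1);
        // Each device-side launch is one nesting level deeper than this grid.
        Task* t = queue->pop();
        if (t) {
            // Launched into the block's default (NULL) stream.
            do_something<<<t->gridSize, t->blockSize, t->mem>>>(queue, t);
        }
    }
}

// Runs a chain of n tasks and returns how many of them executed.
int run_chain(int n) {
    Task* d_tasks = nullptr;
    MyQueue* d_queue = nullptr;
    cudaMalloc(&d_tasks, n * sizeof(Task));
    cudaMalloc(&d_queue, sizeof(MyQueue));

    Task* h_tasks = new Task[n];
    for (int i = 0; i < n; ++i) h_tasks[i] = Task{1, 32, 0};
    cudaMemcpy(d_tasks, h_tasks, n * sizeof(Task), cudaMemcpyHostToDevice);

    // Task 0 is launched from the host, so the queue starts at index 1.
    MyQueue h_queue{d_tasks, n, 1};
    cudaMemcpy(d_queue, &h_queue, sizeof(MyQueue), cudaMemcpyHostToDevice);

    int zero = 0;
    cudaMemcpyToSymbol(executed, &zero, sizeof(int));

    do_something<<<1, 32>>>(d_queue, d_tasks);
    cudaError_t err = cudaDeviceSynchronize();  // waits for the whole launch tree
    if (err != cudaSuccess) printf("error: %s\n", cudaGetErrorString(err));

    int count = 0;
    cudaMemcpyFromSymbol(&count, executed, sizeof(int));

    delete[] h_tasks;
    cudaFree(d_queue);
    cudaFree(d_tasks);
    return count;
}

int main() {
    printf("tasks executed: %d\n", run_chain(4));
    return 0;
}
```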

GPU: NVIDIA A100
CUDA version: 12.4

Compile flags: -rdc=true -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61

asked 52 mins ago by Frostmourne
