VK_NV_cuda_kernel_launch
Other Extension Metadata
Last Modified Date
2020-09-30
Contributors
- Eric Werness, NVIDIA
Description
Interoperability between APIs can create additional overhead depending on the platform. This extension targets deployment of existing CUDA kernels via Vulkan: it provides a way to upload PTX kernels directly and to dispatch them from a Vulkan command buffer, without requiring interoperability between the Vulkan and CUDA contexts. However, we do encourage developing the kernels themselves with the native CUDA runtime, for the purpose of debugging and profiling.
The application first creates a CUDA module from PTX code with vkCreateCudaModuleNV, then creates the CUDA function entry point with vkCreateCudaFunctionNV.
To dispatch this function, the application records the kernel launch into a command buffer with vkCmdCudaLaunchKernelNV.
When done, the application destroys the function handle, as well as the CUDA module handle, with vkDestroyCudaFunctionNV and vkDestroyCudaModuleNV; a minimal sketch of this sequence follows.
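The sketch below shows that sequence in C. The device, command buffer, PTX buffer, kernel name ("myKernel"), and kernel argument are all hypothetical placeholders; the extension entry points are assumed to have been loaded (e.g. via vkGetDeviceProcAddr), and error handling is elided.

```c
#include <vulkan/vulkan.h>

// Sketch only: ptxCode/ptxSize hold the PTX source, "myKernel" is a
// hypothetical entry point inside it, dataAddress a hypothetical argument.
void recordCudaLaunch(VkDevice device, VkCommandBuffer cmd,
                      const void* ptxCode, size_t ptxSize,
                      VkDeviceAddress dataAddress)
{
    // 1. Create the CUDA module from the PTX source.
    VkCudaModuleCreateInfoNV moduleInfo = {
        .sType    = VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV,
        .dataSize = ptxSize,
        .pData    = ptxCode,
    };
    VkCudaModuleNV cudaModule;
    vkCreateCudaModuleNV(device, &moduleInfo, NULL, &cudaModule);

    // 2. Create the function entry point from the module.
    VkCudaFunctionCreateInfoNV functionInfo = {
        .sType  = VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV,
        .module = cudaModule,
        .pName  = "myKernel",
    };
    VkCudaFunctionNV cudaFunction;
    vkCreateCudaFunctionNV(device, &functionInfo, NULL, &cudaFunction);

    // 3. Record the kernel launch into the command buffer.
    const void* params[] = { &dataAddress };  // arguments, CUDA-driver style
    VkCudaLaunchInfoNV launchInfo = {
        .sType          = VK_STRUCTURE_TYPE_CUDA_LAUNCH_INFO_NV,
        .function       = cudaFunction,
        .gridDimX       = 64,  .gridDimY  = 1, .gridDimZ  = 1,
        .blockDimX      = 256, .blockDimY = 1, .blockDimZ = 1,
        .sharedMemBytes = 0,
        .paramCount     = 1,
        .pParams        = params,
    };
    vkCmdCudaLaunchKernelNV(cmd, &launchInfo);

    // 4. Destroy the handles (shown inline for brevity; a real application
    //    must wait until the submitted work has completed).
    vkDestroyCudaFunctionNV(device, cudaFunction, NULL);
    vkDestroyCudaModuleNV(device, cudaModule, NULL);
}
```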
To reduce the impact of compilation time, this extension offers the capability to return a binary cache built from the PTX that was provided. For this, the application first queries the required cache size by calling vkGetCudaModuleCacheNV with a NULL buffer pointer and a valid pointer receiving the size; it then calls the same function again with a valid buffer pointer to retrieve the data. This two-call pattern is sketched below. The resulting cache can then be used in later runs of the application by passing the cache instead of the PTX code to the same vkCreateCudaModuleNV, thus significantly speeding up the initialization of the CUDA module.
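A minimal sketch of the two-call query, assuming the module was created as above; the helper name and the malloc-based ownership are illustrative choices, not part of the extension:

```c
#include <stdlib.h>
#include <vulkan/vulkan.h>

// Two-call idiom: query the cache size with a NULL data pointer, then
// retrieve the cache into a caller-owned buffer. Returns a malloc'ed
// buffer (caller frees); the size is written to *pSize.
static void* getCudaModuleCache(VkDevice device, VkCudaModuleNV module,
                                size_t* pSize)
{
    *pSize = 0;
    vkGetCudaModuleCacheNV(device, module, pSize, NULL);      // size query
    void* cacheData = malloc(*pSize);
    vkGetCudaModuleCacheNV(device, module, pSize, cacheData); // data retrieval
    return cacheData;  // e.g. write this to disk for the next run
}
```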
As with VkPipelineCache, the binary cache depends on the hardware architecture. The application must assume the cache can fail to load, and must handle falling back to the original PTX code as necessary, as sketched below. Most often, the cache will load successfully if the same GPU driver and architecture are used between the generation of the cache from PTX and its later use. With a new driver version, or on a different GPU architecture, the cache is likely to become invalid.
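A sketch of that fallback, assuming cacheData/cacheSize were loaded from a previous run and ptxCode/ptxSize still hold the original PTX:

```c
// Try to create the module from the saved binary cache first.
VkCudaModuleCreateInfoNV info = {
    .sType    = VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV,
    .dataSize = cacheSize,
    .pData    = cacheData,
};
VkCudaModuleNV cudaModule;
if (vkCreateCudaModuleNV(device, &info, NULL, &cudaModule) != VK_SUCCESS) {
    // Driver version or GPU architecture changed: recompile from the PTX.
    info.dataSize = ptxSize;
    info.pData    = ptxCode;
    vkCreateCudaModuleNV(device, &info, NULL, &cudaModule);
}
```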
New Object Types
- VkCudaFunctionNV
- VkCudaModuleNV
New Commands
- vkCmdCudaLaunchKernelNV
- vkCreateCudaFunctionNV
- vkCreateCudaModuleNV
- vkDestroyCudaFunctionNV
- vkDestroyCudaModuleNV
- vkGetCudaModuleCacheNV
New Structures
- VkCudaFunctionCreateInfoNV
- VkCudaLaunchInfoNV
- VkCudaModuleCreateInfoNV
- Extending VkPhysicalDeviceFeatures2, VkDeviceCreateInfo:
VkPhysicalDeviceCudaKernelLaunchFeaturesNV
- Extending VkPhysicalDeviceProperties2:
VkPhysicalDeviceCudaKernelLaunchPropertiesNV
New Enum Constants
- VK_NV_CUDA_KERNEL_LAUNCH_EXTENSION_NAME
- VK_NV_CUDA_KERNEL_LAUNCH_SPEC_VERSION
- Extending VkObjectType:
VK_OBJECT_TYPE_CUDA_FUNCTION_NV
VK_OBJECT_TYPE_CUDA_MODULE_NV
- Extending VkStructureType:
VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV
VK_STRUCTURE_TYPE_CUDA_LAUNCH_INFO_NV
VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CUDA_KERNEL_LAUNCH_FEATURES_NV
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CUDA_KERNEL_LAUNCH_PROPERTIES_NV
If VK_EXT_debug_report is supported:
- Extending VkDebugReportObjectTypeEXT:
VK_DEBUG_REPORT_OBJECT_TYPE_CUDA_FUNCTION_NV_EXT
VK_DEBUG_REPORT_OBJECT_TYPE_CUDA_MODULE_NV_EXT
Issues
None.
Version History
- Revision 1, 2020-03-01 (Tristan Lorach)
- Revision 2, 2020-09-30 (Tristan Lorach)