VK_NV_external_compute_queue.proposal
Problem Statement
Vulkan applications launching compute workloads through external APIs are presently limited by the fact that execution of these workloads is performed within a separate execution context on the GPU. This limits the ability of applications to achieve maximum performance when executing these externally launched compute workloads as these external compute workloads will not be able to execute simultaneously to the workloads launched through the Vulkan API.
This extension proposes a mechanism through which an application may join an external compute API to a Vulkan device in order for the application to execute these external compute workloads within the same context as Vulkan workloads.
Solution Space
There are two main approaches we could take to solve the underlying problem:
- Join the Vulkan device and queues to the context created by the external compute API
- Join the external compute API to the execution context represented by the Vulkan device and queues
In this extension we take the latter approach, with the introduction of new Vulkan API object representing external queues which an application can create. These external queues hold information and reserve resources necessary for the external API to join the Vulkan device.
Proposal
API Changes
VkExternalComputeQueueNV
Creating and Destroying External Compute Queues
In order to create and destroy external compute queues an application first must start by passing an additional structure to the vkCreateDevice
function through the VkDeviceCreateInfo
pNext chain:
typedef struct VkExternalComputeQueueDeviceCreateInfoNV {
VkStructureType sType;
void * pNext;
uint32_t reservedExternalQueues;
} VkExternalComputeQueueDeviceCreateInfoNV;
Here, the application must use the reservedExternalQueues
field to communicate the maximum number of simultaneous external compute queues it may create. While external compute queues may be created and destroyed dynamically, having the application communicate this limit at device creation time may allow the implementation to make better resource allocation decisions during device initialization.
Once a Vulkan device has been created, external compute queues can be created through a new API entry point:
VKAPI_ATTR VkResult VKAPI_CALL vkCreateExternalComputeQueueNV(
VkDevice device,
const VkExternalComputeQueueCreateInfoNV * pCreateInfo,
const VkAllocationCallbacks * pAllocator,
VkExternalComputeQueueNV * pExternalQueue
)
Where the VkExternalComputeQueueCreateInfoNV
structure is defined as follows:
typedef struct VkExternalComputeQueueCreateInfoNV {
VkStructureType sType;
void * pNext;
VkQueue preferredQueue;
} VkExternalComputeQueueCreateInfoNV;
When creating a VkExternalComputeQueueNV
, the preferredQueue
field in the VkExternalComputeQueueCreateInfoNV
structure is a strong scheduling hint as to which Vulkan queue graphics workloads will be submitted to with the expectation that execution will overlap with execution of work submitted through the external API.
Once created, the VkExternalComputeQueueNV
holds on to information and resources necessary for an external compute API to be able to execute within the same context as Vulkan workloads. In order to destroy the external queue and release these resources, an application must call:
VKAPI_ATTR VkResult VKAPI_CALL vkDestroyExternalComputeQueueNV(
VkDevice device,
VkExternalComputeQueueNV externalQueue,
const VkAllocationCallbacks * pAllocator
)
Using External Compute Queues
In order to utilize a VkExternalComputeQueueNV
with a compatible external compute API, an application must query for an opaque blob of information using the vkGetExternalComputeQueueDataNV
entry point:
VKAPI_ATTR VkResult VKAPI_CALL vkGetExternalComputeQueueDataNV(
VkExternalComputeQueueNV externalQueue,
VkExternalComputeQueueDataParamsNV * params,
void * pData
)
The VkExternalComputeQueueDataParamsNV
structure is defined as follows:
typedef struct VkExternalComputeQueueDataParamsNV {
VkStructureType sType;
void * pNext;
uint32_t deviceIndex;
} VkExternalComputeQueueDataParamsNV;
Where deviceIndex
is the index of the device within the device group for which data is being queried.
The information needed by the external compute API is stored by vkGetExternalComputeQueueDataNV
in the pData
argument, which is a pointer to an application managed memory allocation. The required size for this allocation can be queried through a new VkPhysicalDeviceProperties
structure: VkPhysicalDeviceExternalComputeQueuePropertiesNV
.
Properties
typedef struct VkPhysicalDeviceExternalComputeQueuePropertiesNV {
VkStructureType sType;
void * pNext;
uint32_t externalDataSize;
uint32_t maxExternalQueues;
} VkPhysicalDeviceExternalComputeQueuePropertiesNV;
This structure defines two properties of interest for working with this extension:
externalDataSize
- this field indicates the size of the memory allocation that is expected to be passed to thevkGetExternalComputeQueueDataNV
function.maxExternalQueues
- this field indicates the maximum number of external compute queues which can be created through this extension on a single Vulkan device.
Issues
RESOLVED: How does execution of work on an external compute queue interact with vkDeviceWaitIdle
?
vkDeviceWaitIdle
does not wait for external compute queues. Draining work on an external compute queue must be done through its own external API.
RESOLVED: Does this extension allow for more direct sharing of resources between Vulkan and the external compute API?
No. This extension does not propose any changes or additions to existing resource sharing and cross-API synchronization mechanisms.