Device-Generated Commands
This chapter discusses the generation of command buffer content on the device, which involves the following principal steps:
- Define a layout describing the sequence of commands which should be generated.
- Optionally set up device-bindable shaders.
- Retrieve device addresses by vkGetBufferDeviceAddressEXT for setting buffers on the device.
- Fill one or more VkBuffer with the appropriate content that gets interpreted by the command layout.
- Create a preprocess VkBuffer using the device-queried allocation information.
- Optionally preprocess the input data in a separate action.
- Generate and execute the actual commands.
The preprocessing step executes in a separate logical pipeline from either graphics or compute. When preprocessing commands in a separate step they must be explicitly synchronized against the command execution. When not preprocessing in a separate step, the preprocessing is automatically synchronized against the command execution.
Indirect Commands Layout
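The device-side command generation happens through an iterative processing of an atomic sequence comprised of command tokens, which are represented by an indirect commands layout object (VkIndirectCommandsLayoutEXT or VkIndirectCommandsLayoutNV, depending on the extension in use).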
Each indirect command layout must have exactly one action command token and it must be the last token in the sequence.
If the indirect commands layout contains only 1 token, it will be an action command token, and the contents of the indirect buffer will be a sequence of indirect command structures, similar to the ones used for indirect draws and dispatches. On some implementations, using indirect draws and dispatches for these cases will result in increased performance compared to using device-generated commands, due to the overhead that results from using the latter.
Creation and Deletion
Token Input Streams
For VK_EXT_device_generated_commands, the input streams can contain raw uint32_t values, existing indirect commands such as:
- VkDrawIndirectCommand
- VkDrawIndexedIndirectCommand
- VkDispatchIndirectCommand
- VkDrawMeshTasksIndirectCommandNV
- VkDrawMeshTasksIndirectCommandEXT
- VkTraceRaysIndirectCommandKHR
- VkTraceRaysIndirectCommand2KHR
or additional commands as listed below. How the data is used is described in the next section.
For VK_NV_device_generated_commands, the input streams can contain raw uint32_t values, existing indirect commands such as:
- VkDrawIndirectCommand
- VkDrawIndexedIndirectCommand
- VkDrawMeshTasksIndirectCommandNV
- VkDrawMeshTasksIndirectCommandEXT
- VkDispatchIndirectCommand
or additional commands as listed below. How the data is used is described in the next section.
Tokenized Command Processing
The processing for VK_EXT_device_generated_commands is in principle illustrated below:
void cmdProcessSequence(cmd, indirectExecutionSet, indirectCommandsLayout, indirectAddress, s)
{
    for (t = 0; t < indirectCommandsLayout.tokenCount; t++)
    {
        uint32_t offset = indirectCommandsLayout.pTokens[t].offset;
        uint32_t stride = indirectCommandsLayout.indirectStride;
        VkDeviceAddress streamData = indirectAddress;
        const void* input = streamData + stride * s + offset;
        // further details later
        indirectCommandsLayout.pTokens[t].command (cmd, indirectExecutionSet, input, s);
    }
}
void cmdProcessAllSequences(cmd, indirectExecutionSet, indirectCommandsLayout, indirectAddress, sequencesCount)
{
    for (s = 0; s < sequencesCount; s++)
    {
        sUsed = s;
        if (indirectCommandsLayout.flags & VK_INDIRECT_COMMANDS_LAYOUT_USAGE_UNORDERED_SEQUENCES_BIT_EXT) {
            sUsed = incoherent_implementation_dependent_permutation[ sUsed ];
        }
        cmdProcessSequence( cmd, indirectExecutionSet, indirectCommandsLayout, indirectAddress, sUsed );
    }
}
The processing of each sequence is considered stateless, therefore all state changes must occur prior to action commands within the sequence. A single sequence is strictly targeting the VkShaderStageFlags it was created with.
The primary input data for each token is provided through VkBuffer content at preprocessing time using vkCmdPreprocessGeneratedCommandsEXT or at execution time using vkCmdExecuteGeneratedCommandsEXT; however, some functional arguments, for example push constant layouts, are specified at layout creation time.
The input size is different for each token.
void cmdProcessSequence(cmd, indirectExecutionSet, indirectCommandsLayout, indirectAddress, s)
{
    for (uint32_t t = 0; t < indirectCommandsLayout.tokenCount; t++) {
        VkIndirectCommandsLayoutTokenEXT *token = &indirectCommandsLayout.pTokens[t];
        uint32_t offset = token->offset;
        uint32_t stride = indirectCommandsLayout.indirectStride;
        VkDeviceAddress streamData = indirectAddress;
        const void* input = streamData + stride * s + offset;
        switch (token->tokenType) {
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_EXECUTION_SET_EXT:
            uint32_t *bind = input;
            VkIndirectCommandsExecutionSetTokenEXT *info = token->data.pExecutionSet;
            if (info->type == VK_INDIRECT_EXECUTION_SET_INFO_TYPE_PIPELINES_EXT) {
                vkCmdBindPipeline(cmd, indirectExecutionSet.pipelineBindPoint, indirectExecutionSet.pipelines[*bind]);
            } else {
                VkShaderStageFlagBits stages[];
                VkShaderEXT shaders[];
                uint32_t i = 0;
                IterateBitmaskLSBToMSB(iter, info->shaderStages) {
                    stages[i] = iter;
                    shaders[i] = indirectExecutionSet.shaders[bind[i]].shaderObject;
                    i++;
                }
                vkCmdBindShadersEXT(cmd, i, stages, shaders);
            }
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_EXT:
            uint32_t* data = input;
            VkPushConstantsInfoKHR info = {
                .sType = VK_STRUCTURE_TYPE_PUSH_CONSTANTS_INFO_KHR,
                // this can also use dynamicGeneratedPipelineLayout to pass a VkPipelineLayoutCreateInfo from pNext
                .layout = indirectCommandsLayout.pipelineLayout,
                .stageFlags = token->data.pPushConstant->updateRange.stageFlags,
                .offset = token->data.pPushConstant->updateRange.offset,
                .size = token->data.pPushConstant->updateRange.size,
                .pValues = data
            };
            vkCmdPushConstants2KHR(cmd, &info);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_SEQUENCE_INDEX_EXT:
            VkPushConstantsInfoKHR info = {
                .sType = VK_STRUCTURE_TYPE_PUSH_CONSTANTS_INFO_KHR,
                // this can also use dynamicGeneratedPipelineLayout to pass a VkPipelineLayoutCreateInfo from pNext
                .layout = indirectCommandsLayout.pipelineLayout,
                .stageFlags = token->data.pPushConstant->updateRange.stageFlags,
                .offset = token->data.pPushConstant->updateRange.offset,
                // this must be 4
                .size = token->data.pPushConstant->updateRange.size,
                // this just updates the sequence index
                .pValues = &s
            };
            vkCmdPushConstants2KHR(cmd, &info);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_INDEX_BUFFER_EXT:
            VkBindIndexBufferIndirectCommandEXT* data = input;
            vkCmdBindIndexBuffer(cmd, deriveBuffer(data->bufferAddress), deriveOffset(data->bufferAddress), data->indexType);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_VERTEX_BUFFER_EXT:
            VkBindVertexBufferIndirectCommandEXT* data = input;
            vkCmdBindVertexBuffers2(cmd, token->data.pVertexBuffer->vertexBindingUnit, 1, &deriveBuffer(data->bufferAddress),
                                    &deriveOffset(data->bufferAddress), data->size, data->stride);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_INDEXED_EXT:
            VkDrawIndexedIndirectCommand *data = input;
            vkCmdDrawIndexed(cmd, data->indexCount, data->instanceCount, data->firstIndex, data->vertexOffset, data->firstInstance);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_INDEXED_COUNT_EXT:
            VkDrawIndirectCountIndirectCommandEXT* data = input;
            vkCmdDrawIndexedIndirect(cmd, deriveBuffer(data->bufferAddress), deriveOffset(data->bufferAddress), min(data->commandCount, indirectCommandsLayout.maxDrawCount), data->stride);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_EXT:
            VkDrawIndirectCommand* data = input;
            vkCmdDraw(cmd, data->vertexCount, data->instanceCount, data->firstVertex, data->firstInstance);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_COUNT_EXT:
            VkDrawIndirectCountIndirectCommandEXT* data = input;
            vkCmdDrawIndirect(cmd, deriveBuffer(data->bufferAddress), deriveOffset(data->bufferAddress), min(data->commandCount, indirectCommandsLayout.maxDrawCount), data->stride);
            break;
        // only available if VK_NV_mesh_shader is enabled
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_MESH_TASKS_NV_EXT:
            VkDrawMeshTasksIndirectCommandNV *data = input;
            vkCmdDrawMeshTasksNV(cmd, data->taskCount, data->firstTask);
            break;
        // only available if VK_NV_mesh_shader is enabled
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_MESH_TASKS_COUNT_NV_EXT:
            VkDrawIndirectCountIndirectCommandEXT* data = input;
            vkCmdDrawMeshTasksIndirectCountNV(cmd, deriveBuffer(data->bufferAddress), deriveOffset(data->bufferAddress), min(data->commandCount, indirectCommandsLayout.maxDrawCount), data->stride);
            break;
        // only available if VK_EXT_mesh_shader is enabled
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_MESH_TASKS_EXT:
            VkDrawMeshTasksIndirectCommandEXT *data = input;
            vkCmdDrawMeshTasksEXT(cmd, data->groupCountX, data->groupCountY, data->groupCountZ);
            break;
        // only available if VK_EXT_mesh_shader is enabled
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_MESH_TASKS_COUNT_EXT:
            VkDrawIndirectCountIndirectCommandEXT* data = input;
            vkCmdDrawMeshTasksIndirectCountEXT(cmd, deriveBuffer(data->bufferAddress), deriveOffset(data->bufferAddress), min(data->commandCount, indirectCommandsLayout.maxDrawCount), data->stride);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DISPATCH_EXT:
            VkDispatchIndirectCommand *data = input;
            vkCmdDispatch(cmd, data->x, data->y, data->z);
            break;
        // only available if VK_KHR_ray_tracing_maintenance1 is enabled
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_TRACE_RAYS2_EXT:
            vkCmdTraceRaysIndirect2KHR(cmd, deriveBuffer(input));
            break;
        }
    }
}
The processing for VK_NV_device_generated_commands is in principle illustrated below:
void cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s)
{
    for (t = 0; t < indirectCommandsLayout.tokenCount; t++)
    {
        uint32_t stream = indirectCommandsLayout.pTokens[t].stream;
        uint32_t offset = indirectCommandsLayout.pTokens[t].offset;
        uint32_t stride = indirectCommandsLayout.pStreamStrides[stream];
        stream = pIndirectCommandsStreams[stream];
        const void* input = stream.buffer.pointer( stream.offset + stride * s + offset );
        // further details later
        indirectCommandsLayout.pTokens[t].command (cmd, pipeline, input, s);
    }
}
void cmdProcessAllSequences(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, sequencesCount)
{
    for (s = 0; s < sequencesCount; s++)
    {
        cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s);
    }
}
The processing of each sequence is considered stateless, therefore all state changes must occur before any action command tokens within the sequence. A single sequence is strictly targeting the VkPipelineBindPoint it was created with.
The primary input data for each token is provided through VkBuffer content at preprocessing time using vkCmdPreprocessGeneratedCommandsNV or at execution time using vkCmdExecuteGeneratedCommandsNV; however, some functional arguments, for example binding sets, are specified at layout creation time.
The input size is different for each token.
The following code provides detailed information on how an individual sequence is processed. For valid usage, all restrictions from the regular commands apply.
void cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s)
{
    for (uint32_t t = 0; t < indirectCommandsLayout.tokenCount; t++){
        token = indirectCommandsLayout.pTokens[t];
        uint32_t stride = indirectCommandsLayout.pStreamStrides[token.stream];
        stream = pIndirectCommandsStreams[token.stream];
        uint32_t offset = stream.offset + stride * s + token.offset;
        const void* input = stream.buffer.pointer( offset );
        // the token type is part of the layout, not of the stream data
        switch(token.tokenType){
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_SHADER_GROUP_NV:
            VkBindShaderGroupIndirectCommandNV* bind = input;
            vkCmdBindPipelineShaderGroupNV(cmd, indirectCommandsLayout.pipelineBindPoint,
                                           pipeline, bind->groupIndex);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_STATE_FLAGS_NV:
            VkSetStateFlagsIndirectCommandNV* state = input;
            if (token.indirectStateFlags & VK_INDIRECT_STATE_FLAG_FRONTFACE_BIT_NV){
                if (state->data & (1 << 0)){
                    set VK_FRONT_FACE_CLOCKWISE;
                } else {
                    set VK_FRONT_FACE_COUNTER_CLOCKWISE;
                }
            }
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_NV:
            uint32_t* data = input;
            vkCmdPushConstants(cmd,
                               token.pushconstantPipelineLayout,
                               token.pushconstantStageFlags,
                               token.pushconstantOffset,
                               token.pushconstantSize, data);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_INDEX_BUFFER_NV:
            VkBindIndexBufferIndirectCommandNV* data = input;
            // the indexType may optionally be remapped
            // from a custom uint32_t value, via
            // VkIndirectCommandsLayoutTokenNV::pIndexTypeValues
            vkCmdBindIndexBuffer(cmd,
                                 deriveBuffer(data->bufferAddress),
                                 deriveOffset(data->bufferAddress),
                                 data->indexType);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_VERTEX_BUFFER_NV:
            VkBindVertexBufferIndirectCommandNV* data = input;
            // if token.vertexDynamicStride is VK_TRUE
            // then the stride for this binding is set
            // using data->stride as well
            vkCmdBindVertexBuffers(cmd,
                                   token.vertexBindingUnit, 1,
                                   &deriveBuffer(data->bufferAddress),
                                   &deriveOffset(data->bufferAddress));
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_INDEXED_NV:
            vkCmdDrawIndexedIndirect(cmd,
                                     stream.buffer, offset, 1, 0);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_NV:
            vkCmdDrawIndirect(cmd,
                              stream.buffer,
                              offset, 1, 0);
            break;
        // only available if VK_NV_mesh_shader is supported
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_TASKS_NV:
            vkCmdDrawMeshTasksIndirectNV(cmd,
                                         stream.buffer, offset, 1, 0);
            break;
        // only available if VK_EXT_mesh_shader is supported
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_MESH_TASKS_NV:
            vkCmdDrawMeshTasksIndirectEXT(cmd,
                                          stream.buffer, offset, 1, 0);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_PIPELINE_NV:
            VkBindPipelineIndirectCommandNV *data = input;
            VkPipeline computePipeline = deriveFromDeviceAddress(data->pipelineAddress);
            vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
            break;
        case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DISPATCH_NV:
            vkCmdDispatchIndirect(cmd, stream.buffer, offset);
            break;
        }
    }
}
Indirect Commands Generation and Execution
The generation of commands on the device requires a preprocess buffer.
With VK_NV_device_generated_commands, to bind a compute pipeline in Device-Generated Commands, an application must query the pipeline’s device address.
Indirect Execution Sets
It is legal to update an Indirect Execution Set that is in flight as long as the element indices in pExecutionSetWrites are not in use. Any change to an indirect execution set requires recalculating memory requirements by calling vkGetGeneratedCommandsMemoryRequirementsEXT for commands that use that modified state. Commands that are in flight or those not using updated elements require no changes.
The lifetimes of pipelines and shader objects contained in a set must match or exceed the lifetime of the set.
Referencing the functions defined in Indirect Commands Layout, vkCmdExecuteGeneratedCommandsNV behaves as:
uint32_t sequencesCount = sequencesCountBuffer ?
    min(maxSequencesCount, sequencesCountBuffer.load_uint32(sequencesCountOffset)) :
    maxSequencesCount;
cmdProcessAllSequences(commandBuffer, pipeline,
                       indirectCommandsLayout, pIndirectCommandsStreams,
                       sequencesCount,
                       sequencesIndexBuffer, sequencesIndexOffset);
// The stateful commands within indirectCommandsLayout will not
// affect the state of subsequent commands in the target
// command buffer (cmd)
It is important to note that the values of all state related to the pipelineBindPoint used are undefined after this command.
The bound descriptor sets and push constants that will be used with indirect command generation for the compute pipelines must already be specified at the time of preprocessing commands with vkCmdPreprocessGeneratedCommandsNV. They must not change until the execution of indirect commands is submitted with vkCmdExecuteGeneratedCommandsNV.
If push constants for the compute pipeline are also specified in the VkGeneratedCommandsInfoNV::indirectCommandsLayout with a VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_NV token, then those values override the push constants that were previously pushed for the compute pipeline.
Referencing the functions defined in Indirect Commands Layout, vkCmdExecuteGeneratedCommandsEXT behaves as:
uint32_t sequencesCount = sequenceCountAddress ?
    min(maxSequenceCount, sequenceCountAddress.load_uint32()) :
    maxSequenceCount;
cmdProcessAllSequences(commandBuffer, indirectExecutionSet,
                       indirectCommandsLayout, indirectAddress,
                       sequencesCount);
// The stateful commands within indirectCommandsLayout will not
// affect the state of subsequent commands in the target
// command buffer (cmd)
It is important to note that the affected values of all state related to the shaderStages used are undefined after this command. This means that, for example, if this command indirectly alters push constants, the push constant state becomes undefined.
The bound descriptor sets and push constants that will be used with indirect command generation must already be specified on stateCommandBuffer at the time of preprocessing commands with vkCmdPreprocessGeneratedCommandsEXT. They must match the bound descriptor sets and push constants used in the execution of indirect commands with vkCmdExecuteGeneratedCommandsEXT.
If push constants for shader stages are also specified in the VkGeneratedCommandsInfoEXT::indirectCommandsLayout with a VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_EXT or VK_INDIRECT_COMMANDS_TOKEN_TYPE_SEQUENCE_INDEX_EXT token, then those values override the push constants that were previously pushed.
All state bound on stateCommandBuffer will be used.
All state bound on stateCommandBuffer must be identical to the state bound at the time vkCmdExecuteGeneratedCommandsEXT is recorded.
The queue family index stateCommandBuffer was allocated from must be the same as the queue family index of the command buffer used in vkCmdExecuteGeneratedCommandsEXT.
On some implementations, preprocessing may have no effect on performance.
vkCmdExecuteGeneratedCommandsEXT may write to the preprocess buffer, no matter the isPreprocess parameter. In this case, the implementation must insert appropriate synchronization automatically, which corresponds to the following pseudocode:
- Barrier
  - srcStageMask = DRAW_INDIRECT
  - srcAccessMask = 0
  - dstStageMask = COMMAND_PREPROCESS_BIT
  - dstAccessMask = COMMAND_PREPROCESS_WRITE_BIT | COMMAND_PREPROCESS_READ_BIT
- Do internal writes
- Barrier
  - srcStageMask = COMMAND_PREPROCESS_BIT
  - srcAccessMask = COMMAND_PREPROCESS_WRITE_BIT
  - dstStageMask = DRAW_INDIRECT
  - dstAccessMask = INDIRECT_COMMAND_READ_BIT
- Execute