Device-Generated Commands
This chapter discusses the generation of command buffer content on the device, for which these principle steps are to be taken:
- Define via
VkIndirectCommandsLayoutNV
the sequence of commands which should be generated. - Optionally make use of device-bindable Shader Groups for graphics pipelines.
- Retrieve device addresses by vkGetBufferDeviceAddressEXT for setting buffers on the device.
- Retrieve device addresses of compute pipelines by vkGetPipelineIndirectDeviceAddressNV to be able to bind it in device generated rendering.
- Fill one or more
VkBuffer
with the appropriate content that gets interpreted byVkIndirectCommandsLayoutNV
. - Create a
preprocess
VkBuffer
using the allocation information from vkGetGeneratedCommandsMemoryRequirementsNV. - Optionally preprocess the input data using vkCmdPreprocessGeneratedCommandsNV in a separate action.
- Generate and execute the actual commands via vkCmdExecuteGeneratedCommandsNV passing all required data.
vkCmdPreprocessGeneratedCommandsNV executes in a separate logical pipeline from either graphics or compute. When preprocessing commands in a separate step they must be explicitly synchronized against the command execution. When not preprocessing, the preprocessing is automatically synchronized against the command execution.
Indirect Commands Layout
Creation and Deletion
Token Input Streams
The input streams can contain raw uint32_t
values, existing indirect
commands such as:
- VkDrawIndirectCommand
- VkDrawIndexedIndirectCommand
- VkDrawMeshTasksIndirectCommandNV
- VkDrawMeshTasksIndirectCommandEXT
- VkDispatchIndirectCommand
or additional commands as listed below. How the data is used is described in the next section.
Tokenized Command Processing
The processing is in principle illustrated below:
void cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s)
{
for (t = 0; t < indirectCommandsLayout.tokenCount; t++)
{
uint32_t stream = indirectCommandsLayout.pTokens[t].stream;
uint32_t offset = indirectCommandsLayout.pTokens[t].offset;
uint32_t stride = indirectCommandsLayout.pStreamStrides[stream];
stream = pIndirectCommandsStreams[stream];
const void* input = stream.buffer.pointer( stream.offset + stride * s + offset )
// further details later
indirectCommandsLayout.pTokens[t].command (cmd, pipeline, input, s);
}
}
void cmdProcessAllSequences(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, sequencesCount)
{
for (s = 0; s < sequencesCount; s++)
{
cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s);
}
}
The processing of each sequence is considered stateless, therefore all state changes must occur before any action command tokens within the sequence. A single sequence is strictly targeting the VkPipelineBindPoint it was created with.
The primary input data for each token is provided through VkBuffer
content at preprocessing using vkCmdPreprocessGeneratedCommandsNV or
execution time using vkCmdExecuteGeneratedCommandsNV, however some
functional arguments, for example binding sets, are specified at layout
creation time.
The input size is different for each token.
The following code provides detailed information on how an individual sequence is processed. For valid usage, all restrictions from the regular commands apply.
void cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s)
{
for (uint32_t t = 0; t < indirectCommandsLayout.tokenCount; t++){
token = indirectCommandsLayout.pTokens[t];
uint32_t stride = indirectCommandsLayout.pStreamStrides[token.stream];
stream = pIndirectCommandsStreams[token.stream];
uint32_t offset = stream.offset + stride * s + token.offset;
const void* input = stream.buffer.pointer( offset )
switch(input.type){
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_SHADER_GROUP_NV:
VkBindShaderGroupIndirectCommandNV* bind = input;
vkCmdBindPipelineShaderGroupNV(cmd, indirectCommandsLayout.pipelineBindPoint,
pipeline, bind->groupIndex);
break;
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_STATE_FLAGS_NV:
VkSetStateFlagsIndirectCommandNV* state = input;
if (token.indirectStateFlags & VK_INDIRECT_STATE_FLAG_FRONTFACE_BIT_NV){
if (state.data & (1 << 0)){
set VK_FRONT_FACE_CLOCKWISE;
} else {
set VK_FRONT_FACE_COUNTER_CLOCKWISE;
}
}
break;
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_NV:
uint32_t* data = input;
vkCmdPushConstants(cmd,
token.pushconstantPipelineLayout
token.pushconstantStageFlags,
token.pushconstantOffset,
token.pushconstantSize, data);
break;
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_INDEX_BUFFER_NV:
VkBindIndexBufferIndirectCommandNV* data = input;
// the indexType may optionally be remapped
// from a custom uint32_t value, via
// VkIndirectCommandsLayoutTokenNV::pIndexTypeValues
vkCmdBindIndexBuffer(cmd,
deriveBuffer(data->bufferAddress),
deriveOffset(data->bufferAddress),
data->indexType);
break;
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_VERTEX_BUFFER_NV:
VkBindVertexBufferIndirectCommandNV* data = input;
// if token.vertexDynamicStride is VK_TRUE
// then the stride for this binding is set
// using data->stride as well
vkCmdBindVertexBuffers(cmd,
token.vertexBindingUnit, 1,
&deriveBuffer(data->bufferAddress),
&deriveOffset(data->bufferAddress));
break;
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_INDEXED_NV:
vkCmdDrawIndexedIndirect(cmd,
stream.buffer, offset, 1, 0);
break;
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_NV:
vkCmdDrawIndirect(cmd,
stream.buffer,
offset, 1, 0);
break;
// only available if VK_NV_mesh_shader is supported
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_TASKS_NV:
vkCmdDrawMeshTasksIndirectNV(cmd,
stream.buffer, offset, 1, 0);
break;
// only available if VK_EXT_mesh_shader is supported
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_MESH_TASKS_NV:
vkCmdDrawMeshTasksIndirectEXT(cmd,
stream.buffer, offset, 1, 0);
break;
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_PIPELINE_NV:
VkBindPipelineIndirectCommandNV *data = input;
VkPipeline computePipeline = deriveFromDeviceAddress(data->pipelineAddress);
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
break;
case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DISPATCH_NV:
vkCmdDispatchIndirect(cmd, stream.buffer, offset);
break;
}
}
}
Indirect Commands Generation and Execution
To bind a compute pipeline in Device-Generated Commands, an application must query the pipeline’s device address.
Referencing the functions defined in Indirect Commands Layout,
vkCmdExecuteGeneratedCommandsNV
behaves as:
uint32_t sequencesCount = sequencesCountBuffer ?
min(maxSequencesCount, sequencesCountBuffer.load_uint32(sequencesCountOffset) :
maxSequencesCount;
cmdProcessAllSequences(commandBuffer, pipeline,
indirectCommandsLayout, pIndirectCommandsStreams,
sequencesCount,
sequencesIndexBuffer, sequencesIndexOffset);
// The stateful commands within indirectCommandsLayout will not
// affect the state of subsequent commands in the target
// command buffer (cmd)
It is important to note that the values of all state related to the
pipelineBindPoint
used are undefined: after this command.
The bound descriptor sets and push constants that will be used with indirect command generation for the compute pipelines must already be specified at the time of preprocessing commands with vkCmdPreprocessGeneratedCommandsNV. They must not change until the execution of indirect commands is submitted with vkCmdExecuteGeneratedCommandsNV.
If push constants for the compute pipeline are also specified in the
VkGeneratedCommandsInfoNV::indirectCommandsLayout
with
VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_NV
token, then those
values override the push constants that were previously pushed for the
compute pipeline.