Device-Generated Commands

This chapter discusses the generation of command buffer content on the device, for which these principal steps are to be taken:

vkCmdPreprocessGeneratedCommandsNV executes in a separate logical pipeline from either graphics or compute. When commands are preprocessed in a separate step, the preprocessing must be explicitly synchronized against the command execution. When there is no separate preprocessing step, the preprocessing is automatically synchronized against the command execution.

Indirect Commands Layout

VkIndirectCommandsLayoutNV - Opaque handle to an indirect commands layout object

Creation and Deletion

vkCreateIndirectCommandsLayoutNV - Create an indirect command layout object
VkIndirectCommandsLayoutCreateInfoNV - Structure specifying the parameters of a newly created indirect commands layout object
VkIndirectCommandsLayoutUsageFlagBitsNV - Bitmask specifying allowed usage of an indirect commands layout
VkIndirectCommandsLayoutUsageFlagsNV - Bitmask of VkIndirectCommandsLayoutUsageFlagBitsNV
vkDestroyIndirectCommandsLayoutNV - Destroy an indirect commands layout

Token Input Streams

VkIndirectCommandsStreamNV - Structure specifying input streams for generated command tokens

The input streams can contain raw uint32_t values, existing indirect commands such as:

or additional commands as listed below. How the data is used is described in the next section.

VkBindShaderGroupIndirectCommandNV - Structure specifying input data for a single shader group command token
VkBindIndexBufferIndirectCommandNV - Structure specifying input data for a single index buffer command token
VkBindVertexBufferIndirectCommandNV - Structure specifying input data for a single vertex buffer command token
VkSetStateFlagsIndirectCommandNV - Structure specifying input data for a single state flag command token
VkIndirectStateFlagBitsNV - Bitmask specifying state that can be altered on the device
VkIndirectStateFlagsNV - Bitmask of VkIndirectStateFlagBitsNV
VkBindPipelineIndirectCommandNV - Structure specifying input data for the compute pipeline dispatch token

Tokenized Command Processing

The processing is in principle illustrated below:

void cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s)
{
  for (uint32_t t = 0; t < indirectCommandsLayout.tokenCount; t++)
  {
    // index of the input stream this token reads from
    uint32_t streamIndex = indirectCommandsLayout.pTokens[t].stream;
    uint32_t offset      = indirectCommandsLayout.pTokens[t].offset;
    uint32_t stride      = indirectCommandsLayout.pStreamStrides[streamIndex];
    stream               = pIndirectCommandsStreams[streamIndex];
    // input for sequence s starts at stream.offset + stride * s, plus the token's own offset
    const void* input    = stream.buffer.pointer( stream.offset + stride * s + offset );

    // further details later
    indirectCommandsLayout.pTokens[t].command (cmd, pipeline, input, s);
  }
}

void cmdProcessAllSequences(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, sequencesCount)
{
  for (uint32_t s = 0; s < sequencesCount; s++)
  {
    cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s);
  }
}
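
The address computation in cmdProcessSequence above can be made concrete with a small C sketch. It is only an illustration under stated assumptions: SketchToken, SketchStream and sketch_token_input are hypothetical stand-ins that model a stream as CPU-visible bytes, not Vulkan structures or entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the stream/offset pair of a layout token. */
typedef struct {
    uint32_t stream;  /* index into the array of input streams */
    uint32_t offset;  /* byte offset of this token inside one sequence's slice */
} SketchToken;

/* Hypothetical stand-in for an input stream, modeled as plain bytes. */
typedef struct {
    const uint8_t *data;  /* start of the buffer contents */
    uint64_t offset;      /* byte offset of the first sequence in the buffer */
} SketchStream;

/* Returns the address of the input for `token` in sequence `s`:
 * stream.offset + stride * s + token.offset, as in the pseudocode. */
const void *sketch_token_input(const SketchToken *token,
                               const SketchStream *streams,
                               const uint32_t *streamStrides,
                               uint32_t s)
{
    assert(token != NULL && streams != NULL && streamStrides != NULL);
    const SketchStream *stream = &streams[token->stream];
    uint64_t stride = streamStrides[token->stream];
    uint64_t byteOffset = stream->offset + stride * s + token->offset;
    return stream->data + byteOffset;
}
```

For instance, with a stream offset of 4, a stride of 16 and a token offset of 8, the input for sequence 2 starts at byte 4 + 16 * 2 + 8 = 44 of the buffer.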

The processing of each sequence is considered stateless, therefore all state changes must occur before any action command tokens within the sequence. A single sequence strictly targets the VkPipelineBindPoint it was created with.
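
The ordering rule for state and action tokens can be checked mechanically. The following C sketch assumes that an application has already classified each token of a layout as either a state change or an action; SketchTokenKind and sketch_state_precedes_action are hypothetical names introduced for illustration and are not part of the Vulkan API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a layout token. */
typedef enum {
    SKETCH_TOKEN_STATE,  /* e.g. binding a shader group or pushing constants */
    SKETCH_TOKEN_ACTION  /* e.g. a draw or dispatch */
} SketchTokenKind;

/* Returns true if no state token appears after an action token. */
bool sketch_state_precedes_action(const SketchTokenKind *kinds, size_t count)
{
    assert(kinds != NULL || count == 0);
    bool seenAction = false;
    for (size_t i = 0; i < count; i++) {
        if (kinds[i] == SKETCH_TOKEN_ACTION) {
            seenAction = true;
        } else if (seenAction) {
            return false;  /* state change after an action command */
        }
    }
    return true;
}
```

A token order of state, state, action passes this check, while state, action, state does not.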

The primary input data for each token is provided through VkBuffer content, either at preprocessing time using vkCmdPreprocessGeneratedCommandsNV or at execution time using vkCmdExecuteGeneratedCommandsNV. Some functional arguments, for example binding sets, are instead specified at layout creation time. The input size is different for each token.

VkIndirectCommandsTokenTypeNV - Enum specifying token commands
VkIndirectCommandsLayoutTokenNV - Struct specifying the details of an indirect command layout token

The following code provides detailed information on how an individual sequence is processed. For valid usage, all restrictions from the regular commands apply.

void cmdProcessSequence(cmd, pipeline, indirectCommandsLayout, pIndirectCommandsStreams, s)
{
  for (uint32_t t = 0; t < indirectCommandsLayout.tokenCount; t++){
    token = indirectCommandsLayout.pTokens[t];

    uint32_t stride   = indirectCommandsLayout.pStreamStrides[token.stream];
    stream            = pIndirectCommandsStreams[token.stream];
    uint32_t offset   = stream.offset + stride * s + token.offset;
    const void* input = stream.buffer.pointer( offset );

    // the token type is fixed by the layout; the input stream only carries the token's data
    switch(token.tokenType){
    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_SHADER_GROUP_NV:
      VkBindShaderGroupIndirectCommandNV* bind = input;

      vkCmdBindPipelineShaderGroupNV(cmd, indirectCommandsLayout.pipelineBindPoint,
        pipeline, bind->groupIndex);
    break;

    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_STATE_FLAGS_NV:
      VkSetStateFlagsIndirectCommandNV* state = input;

      if (token.indirectStateFlags & VK_INDIRECT_STATE_FLAG_FRONTFACE_BIT_NV){
        if (state->data & (1 << 0)){
          set VK_FRONT_FACE_CLOCKWISE;
        } else {
          set VK_FRONT_FACE_COUNTER_CLOCKWISE;
        }
      }
    break;

    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_NV:
      uint32_t* data = input;

      vkCmdPushConstants(cmd,
        token.pushconstantPipelineLayout,
        token.pushconstantStageFlags,
        token.pushconstantOffset,
        token.pushconstantSize, data);
    break;

    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_INDEX_BUFFER_NV:
      VkBindIndexBufferIndirectCommandNV* data = input;

      // the indexType may optionally be remapped
      // from a custom uint32_t value, via
      // VkIndirectCommandsLayoutTokenNV::pIndexTypeValues

      vkCmdBindIndexBuffer(cmd,
        deriveBuffer(data->bufferAddress),
        deriveOffset(data->bufferAddress),
        data->indexType);
    break;

    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_VERTEX_BUFFER_NV:
      VkBindVertexBufferIndirectCommandNV* data = input;

      // if token.vertexDynamicStride is VK_TRUE
      // then the stride for this binding is set
      // using data->stride as well

      vkCmdBindVertexBuffers(cmd,
        token.vertexBindingUnit, 1,
        &deriveBuffer(data->bufferAddress),
        &deriveOffset(data->bufferAddress));
    break;

    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_INDEXED_NV:
      vkCmdDrawIndexedIndirect(cmd,
        stream.buffer, offset, 1, 0);
    break;

    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_NV:
      vkCmdDrawIndirect(cmd,
        stream.buffer,
        offset, 1, 0);
    break;

    // only available if VK_NV_mesh_shader is supported
    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_TASKS_NV:
      vkCmdDrawMeshTasksIndirectNV(cmd,
        stream.buffer, offset, 1, 0);
    break;

    // only available if VK_EXT_mesh_shader is supported
    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DRAW_MESH_TASKS_NV:
      vkCmdDrawMeshTasksIndirectEXT(cmd,
        stream.buffer, offset, 1, 0);
    break;

    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_PIPELINE_NV:
      VkBindPipelineIndirectCommandNV *data = input;
      VkPipeline computePipeline = deriveFromDeviceAddress(data->pipelineAddress);
      vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
    break;

    case VK_INDIRECT_COMMANDS_TOKEN_TYPE_DISPATCH_NV:
      vkCmdDispatchIndirect(cmd, stream.buffer, offset);
    break;
    }
  }
}
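
The optional index type remapping mentioned in the VK_INDIRECT_COMMANDS_TOKEN_TYPE_INDEX_BUFFER_NV case can be sketched in C as a table lookup. This is an illustration under assumptions: SketchIndexType stands in for VkIndexType so that the code compiles without Vulkan headers, the members indexTypeCount and pIndexTypes are assumed to pair with pIndexTypeValues, and treating the raw value as the index type when no table is given is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for VkIndexType (values match VK_INDEX_TYPE_UINT16/UINT32). */
typedef enum {
    SKETCH_INDEX_TYPE_UINT16 = 0,
    SKETCH_INDEX_TYPE_UINT32 = 1
} SketchIndexType;

/* Hypothetical subset of the token's remapping members. */
typedef struct {
    uint32_t indexTypeCount;             /* assumed: length of both arrays */
    const SketchIndexType *pIndexTypes;  /* assumed: result for each custom value */
    const uint32_t *pIndexTypeValues;    /* custom values found in the stream */
} SketchIndexToken;

/* Maps the raw value read from the input stream to an index type.
 * Returns false if a table is present and the value is not in it. */
bool sketch_remap_index_type(const SketchIndexToken *token,
                             uint32_t rawValue,
                             SketchIndexType *out)
{
    assert(token != NULL && out != NULL);
    if (token->indexTypeCount == 0) {
        *out = (SketchIndexType)rawValue;  /* no table: use the value directly */
        return true;
    }
    for (uint32_t i = 0; i < token->indexTypeCount; i++) {
        if (token->pIndexTypeValues[i] == rawValue) {
            *out = token->pIndexTypes[i];
            return true;
        }
    }
    return false;
}
```

Such a table lets an application keep index type encodings from another API in its buffers, with the arbitrary custom values chosen by the application.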

Indirect Commands Generation and Execution

vkGetGeneratedCommandsMemoryRequirementsNV - Retrieve the buffer allocation requirements for generated commands
VkGeneratedCommandsMemoryRequirementsInfoNV - Structure specifying parameters for the reservation of preprocess buffer space

To bind a compute pipeline in Device-Generated Commands, an application must query the pipeline’s device address.

vkGetPipelineIndirectDeviceAddressNV - Get pipeline’s 64-bit device address
VkPipelineIndirectDeviceAddressInfoNV - Structure specifying the pipeline to query an address for
vkGetPipelineIndirectMemoryRequirementsNV - Get the memory requirements for the compute indirect pipeline
vkCmdExecuteGeneratedCommandsNV - Generate and execute commands on the device
VkGeneratedCommandsInfoNV - Structure specifying parameters for the generation of commands

Referencing the functions defined in Indirect Commands Layout, vkCmdExecuteGeneratedCommandsNV behaves as:

uint32_t sequencesCount = sequencesCountBuffer ?
      min(maxSequencesCount, sequencesCountBuffer.load_uint32(sequencesCountOffset)) :
      maxSequencesCount;


cmdProcessAllSequences(commandBuffer, pipeline,
                       indirectCommandsLayout, pIndirectCommandsStreams,
                       sequencesCount,
                       sequencesIndexBuffer, sequencesIndexOffset);

// The stateful commands within indirectCommandsLayout will not
// affect the state of subsequent commands in the target
// command buffer (cmd)
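
The selection of sequencesCount above can be written as a small C helper. This is a sketch under assumptions: the count buffer is modeled as CPU-visible bytes, a null pointer plays the role of an absent sequencesCountBuffer, and sketch_sequences_count is a hypothetical name rather than a Vulkan function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the number of sequences to process: the value stored in the
 * count buffer clamped to maxSequencesCount, or maxSequencesCount itself
 * when no count buffer is provided. */
uint32_t sketch_sequences_count(const uint8_t *countBuffer,
                                uint64_t countOffset,
                                uint32_t maxSequencesCount)
{
    if (countBuffer == NULL) {
        return maxSequencesCount;
    }
    uint32_t stored;
    /* memcpy avoids alignment assumptions about the byte buffer */
    memcpy(&stored, countBuffer + countOffset, sizeof stored);
    return stored < maxSequencesCount ? stored : maxSequencesCount;
}
```

The value in the buffer can therefore only lower the number of processed sequences, never raise it above maxSequencesCount.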

It is important to note that the values of all state related to the pipelineBindPoint used are undefined after this command.

vkCmdPreprocessGeneratedCommandsNV - Performs preprocessing for generated commands

The bound descriptor sets and push constants that will be used with indirect command generation for the compute pipelines must already be specified at the time of preprocessing commands with vkCmdPreprocessGeneratedCommandsNV. They must not change until the execution of indirect commands is submitted with vkCmdExecuteGeneratedCommandsNV.

If push constants for the compute pipeline are also specified in the VkGeneratedCommandsInfoNV::indirectCommandsLayout with a VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_NV token, then those values override the push constants that were previously pushed for the compute pipeline.
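
The override behavior can be pictured as a byte-range overwrite. The C sketch below assumes that a token replaces only the range given by its offset and size and leaves the remaining previously pushed bytes untouched; SketchPushConstants and sketch_apply_push_constant_token are hypothetical names, and the 128-byte block size is chosen for the example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Example size of the push constant block used by this sketch. */
#define SKETCH_PUSH_CONSTANT_BYTES 128u

/* Hypothetical model of the push constant state of a bind point. */
typedef struct {
    uint8_t bytes[SKETCH_PUSH_CONSTANT_BYTES];
} SketchPushConstants;

/* Overwrites [offset, offset + size) with the data read from the token's
 * input stream; all other previously pushed bytes keep their values. */
void sketch_apply_push_constant_token(SketchPushConstants *pc,
                                      uint32_t offset,
                                      uint32_t size,
                                      const void *tokenData)
{
    assert(pc != NULL && tokenData != NULL);
    assert(offset <= SKETCH_PUSH_CONSTANT_BYTES);
    assert(size <= SKETCH_PUSH_CONSTANT_BYTES - offset);
    memcpy(pc->bytes + offset, tokenData, size);
}
```

In this model, values pushed before preprocessing act as defaults, and each sequence replaces only the range covered by its push constant token.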