Ray Tracing

Ray tracing uses a separate rendering pipeline from both the graphics and compute pipelines (see Ray Tracing Pipeline).

Within the ray tracing pipeline, a pipeline trace ray instruction can be called to perform a ray traversal that invokes the various ray tracing shader stages during its execution. The relationship between the ray tracing pipeline object and the geometries present in the acceleration structure traversed is passed into the ray tracing command in a VkBuffer object known as a shader binding table. OpExecuteCallableKHR can also be used in ray tracing pipelines to invoke a callable shader.

During execution, control alternates between scheduling and other operations. The scheduling functionality is implementation-specific and is responsible for workload execution. The shader stages are programmable. Traversal, which refers to the process of traversing acceleration structures to find potential intersections of rays with geometry, is fixed function.

The programmable portions of the pipeline are exposed in a single-ray programming model, with each invocation handling one ray at a time. Memory operations can be synchronized using standard memory barriers. The Workgroup scope and variables with a storage class of Workgroup must not be used in the ray tracing pipeline.

Shader Call Instructions

A shader call is an instruction which may cause execution to continue elsewhere by creating one or more invocations that execute a different shader stage.

The shader call instructions are:

  • OpTraceRayKHR which may invoke intersection, any-hit, closest hit, or miss shaders,
  • OpTraceRayMotionNV which may invoke intersection, any-hit, closest hit, or miss shaders,
  • OpReportIntersectionKHR which may invoke any-hit shaders,
  • OpExecuteCallableKHR which will invoke a callable shader, and
  • OpHitObjectTraceRayNV, OpHitObjectTraceRayMotionNV, and OpHitObjectExecuteShaderNV which may invoke intersection, any-hit, closest hit, miss, or callable shaders.

Pipeline trace ray instructions can be used recursively; invoked shaders can themselves execute pipeline trace ray instructions, to a maximum depth defined by the maxRecursionDepth or maxRayRecursionDepth limit.

Shaders directly invoked from the API always have a recursion depth of 0; each shader executed by a pipeline trace ray instruction has a recursion depth one higher than the recursion depth of the shader which invoked it. Applications must not invoke a shader with a recursion depth greater than the value of maxRecursionDepth or maxPipelineRayRecursionDepth specified in the pipeline.

There is no explicit recursion limit for other shader call instructions which may recurse (e.g. OpExecuteCallableKHR) but there is an upper bound determined by the stack size.

An invocation repack instruction is a ray tracing instruction where the implementation may change the set of invocations that are executing. When a repack instruction is encountered, the invocation is suspended and a new invocation begins and executes the instruction. After executing the repack instruction (which may result in other ray tracing shader stages executing) the new invocation ends and the original invocation is resumed, but it may be resumed in a different subgroup or at a different SubgroupLocalInvocationId within the same subgroup. When a subset of invocations in a subgroup execute the invocation repack instruction, those that do not execute it remain in the same subgroup at the same SubgroupLocalInvocationId.

The OpTraceRayKHR, OpTraceRayMotionNV, OpReorderThreadWithHintNV, OpReorderThreadWithHitObjectNV, OpReportIntersectionKHR, and OpExecuteCallableKHR instructions are invocation repack instructions.

When a ray tracing shader executes a dynamic instance of an invocation repack instruction which results in another ray tracing shader being invoked, their instructions are related by shader-call-order.

For ray tracing invocations that are shader-call-related:

  • memory operations on StorageBuffer, Image, and ShaderRecordBufferKHR storage classes can be synchronized using the ShaderCallKHR scope.
  • the CallableDataKHR, IncomingCallableDataKHR, RayPayloadKHR, HitAttributeKHR, and IncomingRayPayloadKHR storage classes are system-synchronized and no application availability and visibility operations are required.
  • memory operations within a single invocation before and after the shader call instruction are ordered by program-order and do not require explicit synchronization.

Ray Tracing Commands

Ray tracing commands provoke work in the ray tracing pipeline. Ray tracing commands are recorded into a command buffer and when executed by a queue will produce work that executes according to the currently bound ray tracing pipeline. A ray tracing pipeline must be bound to a command buffer before any ray tracing commands are recorded in that command buffer.

vkCmdTraceRaysNV - Initialize a ray tracing dispatch
vkCmdTraceRaysKHR - Initialize a ray tracing dispatch
vkCmdBindInvocationMaskHUAWEI - Bind an invocation mask image on a command buffer
vkCmdTraceRaysIndirectKHR - Initialize an indirect ray tracing dispatch
VkTraceRaysIndirectCommandKHR - Structure specifying the parameters of an indirect ray tracing command
vkCmdTraceRaysIndirect2KHR - Initialize an indirect ray tracing dispatch with indirect shader binding tables
VkTraceRaysIndirectCommand2KHR - Structure specifying the parameters of an indirect trace ray command with indirect shader binding tables

Shader Binding Table

A shader binding table is a resource which establishes the relationship between the ray tracing pipeline and the acceleration structures that were built for the ray tracing pipeline. It indicates the shaders that operate on each geometry in an acceleration structure. In addition, it contains the resources accessed by each shader, including indices of textures, buffer device addresses, and constants. The application allocates and manages shader binding tables as VkBuffer objects.

Each entry in the shader binding table consists of shaderGroupHandleSize bytes of data, either as queried by vkGetRayTracingShaderGroupHandlesKHR to refer to those specified shaders, or all zeros to refer to a zero shader group. A zero shader group behaves as though it is a shader group consisting entirely of VK_SHADER_UNUSED_KHR. The remainder of the data specified by the stride is application-visible data that can be referenced by a ShaderRecordBufferKHR block in the shader.
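The record layout described above can be sketched on the host side as follows. This is a minimal illustration, not the API's own mechanism: sbt_write_record and all of its parameter names are hypothetical, the handle bytes are assumed to have already been queried with vkGetRayTracingShaderGroupHandlesKHR, and any alignment requirements on the stride or base address are outside the scope of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes one shader binding table record into host memory that is later
 * copied into (or is a mapping of) the VkBuffer backing the table.
 *
 * table       : start of the table in host memory
 * stride      : distance in bytes between consecutive records
 * index       : which record to write
 * handle      : handle_size bytes of shader group handle, or NULL to write
 *               an all-zero handle (a zero shader group)
 * handle_size : the value of shaderGroupHandleSize
 * data        : application data stored after the handle (may be NULL)
 * data_size   : size of the application data in bytes (may be 0)
 */
static void sbt_write_record(uint8_t *table, size_t stride, size_t index,
                             const uint8_t *handle, size_t handle_size,
                             const void *data, size_t data_size)
{
    assert(table != NULL);
    assert(handle_size + data_size <= stride);
    assert(data != NULL || data_size == 0);

    uint8_t *record = table + index * stride;

    /* Start from all zeros so that a missing handle yields a zero shader
     * group and unused trailing bytes have a defined value. */
    memset(record, 0, stride);

    if (handle != NULL)
        memcpy(record, handle, handle_size);

    /* The remainder of the record is what a ShaderRecordBufferKHR block
     * in the shader would read. */
    if (data_size > 0)
        memcpy(record + handle_size, data, data_size);
}
```

For instance, a record might carry a 32-bit texture index after the handle, which the shader would then read through its ShaderRecordBufferKHR block.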

The shader binding tables to use in a ray tracing pipeline are passed to the vkCmdTraceRaysNV, vkCmdTraceRaysKHR, or vkCmdTraceRaysIndirectKHR commands. Shader binding tables are read-only in shaders that are executing on the ray tracing pipeline.

Shader variables identified with the ShaderRecordBufferKHR storage class are used to access the provided shader binding table. Such variables must be:

  • typed as OpTypeStruct, or an array of this type,
  • identified with a Block decoration, and
  • laid out explicitly using the Offset, ArrayStride, and MatrixStride decorations as specified in Offset and Stride Assignment.

The Offset decoration for any member of a Block-decorated variable in the ShaderRecordBufferKHR storage class must not cause the space required for that variable to extend outside the range [0, maxStorageBufferRange).

Accesses to the shader binding table from ray tracing pipelines must be synchronized with the VK_PIPELINE_STAGE_RAY_TRACING_SHADER_BIT_KHR pipeline stage and an access type of VK_ACCESS_SHADER_READ_BIT.

Because different shader record buffers can be associated with the same shader, a shader variable with ShaderRecordBufferKHR storage class will not be dynamically uniform if different invocations of the same shader can reference different data in the shader record buffer, such as when the same shader occurs twice in the shader binding table with a different shader record buffer. In this case, when indexing resources based on values in the ShaderRecordBufferKHR storage class, the index should be decorated as NonUniform.

Indexing Rules

In order to execute the correct shaders and access the correct resources during a ray tracing dispatch, the implementation must be able to locate shader binding table entries at various stages of execution. This is accomplished by defining a set of indexing rules that compute shader binding table record positions relative to the buffer’s base address in memory. The application must organize the contents of the shader binding table’s memory in a way that application of the indexing rules will lead to correct records.

Ray Generation Shaders

Only one ray generation shader is executed per ray tracing dispatch.

For vkCmdTraceRaysKHR, the location of the ray generation shader is specified by the pRaygenShaderBindingTable→deviceAddress parameter — there is no indexing. All data accessed must be less than pRaygenShaderBindingTable→size bytes from deviceAddress. pRaygenShaderBindingTable→stride is unused, and must be equal to pRaygenShaderBindingTable→size.

For vkCmdTraceRaysNV, the location of the ray generation shader is specified by the raygenShaderBindingTableBuffer and raygenShaderBindingOffset parameters — there is no indexing.

Hit Shaders

The base for the computation of intersection, any-hit, and closest hit shader locations is the instanceShaderBindingTableRecordOffset value stored with each instance of a top-level acceleration structure (VkAccelerationStructureInstanceKHR). This value determines the beginning of the shader binding table records for a given instance.

In the following rule, geometryIndex refers to the geometry index of the intersected geometry within the instance.

The sbtRecordOffset and sbtRecordStride values are passed in as parameters to traceNV() or traceRayEXT() calls made in the shaders. See Section 8.19 (Ray Tracing Functions) of the OpenGL Shading Language Specification for more details. In SPIR-V, these correspond to the SBTOffset and SBTStride parameters to the pipeline trace ray instructions.

The result of this computation is then added to pHitShaderBindingTable→deviceAddress, a device address passed to vkCmdTraceRaysKHR, or hitShaderBindingOffset, a base offset passed to vkCmdTraceRaysNV.

For vkCmdTraceRaysKHR, the complete rule to compute a hit shader binding table record address in the pHitShaderBindingTable is:

pHitShaderBindingTable→deviceAddress + pHitShaderBindingTable→stride × ( instanceShaderBindingTableRecordOffset + geometryIndex × sbtRecordStride + sbtRecordOffset )

All data accessed must be less than pHitShaderBindingTable→size bytes from the base address.
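The hit rule for vkCmdTraceRaysKHR can be worked through in a few lines of arithmetic. The function and parameter names below are hypothetical; the sketch only restates the formula above and adds a range check against the table size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address of the hit record selected by the indexing rule:
 * deviceAddress + stride * (instanceOffset + geometryIndex * sbtRecordStride
 *                           + sbtRecordOffset) */
static uint64_t hit_record_address(uint64_t table_device_address,
                                   uint64_t table_stride,
                                   uint32_t instance_sbt_record_offset,
                                   uint32_t geometry_index,
                                   uint32_t sbt_record_stride,
                                   uint32_t sbt_record_offset)
{
    /* Widen before multiplying so the record index cannot wrap in 32 bits. */
    uint64_t record_index = (uint64_t)instance_sbt_record_offset
                          + (uint64_t)geometry_index * sbt_record_stride
                          + sbt_record_offset;
    return table_device_address + table_stride * record_index;
}

/* True if record_bytes starting at record_address lie within table_size
 * bytes of the table's base address. */
static bool hit_record_in_table(uint64_t record_address, uint64_t record_bytes,
                                uint64_t table_device_address,
                                uint64_t table_size)
{
    if (record_address < table_device_address)
        return false;
    uint64_t offset = record_address - table_device_address;
    return record_bytes <= table_size && offset <= table_size - record_bytes;
}
```

As an illustration with made-up values: a table at device address 0x10000 with a stride of 64 bytes, an instance offset of 4, geometry index 2, sbtRecordStride 2, and sbtRecordOffset 1 selects record 4 + 2 × 2 + 1 = 9, which lies at 0x10000 + 64 × 9 = 0x10240.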

For vkCmdTraceRaysNV, the offset and stride come from direct parameters, so the full rule to compute a hit shader binding table record address in the hitShaderBindingTableBuffer is:

hitShaderBindingOffset + hitShaderBindingStride × ( instanceShaderBindingTableRecordOffset + geometryIndex × sbtRecordStride + sbtRecordOffset )

Miss Shaders

A miss shader is executed whenever a ray traced by a pipeline trace ray instruction fails to find an intersection with the scene geometry. Multiple miss shaders may be executed throughout a ray tracing dispatch.

The base for the computation of miss shader locations is pMissShaderBindingTable→deviceAddress, a device address passed into vkCmdTraceRaysKHR, or missShaderBindingOffset, a base offset passed into vkCmdTraceRaysNV.

The missIndex value is passed in as a parameter to traceNV() or traceRayEXT() calls made in the shaders. See Section 8.19 (Ray Tracing Functions) of the OpenGL Shading Language Specification for more details. In SPIR-V, this corresponds to the MissIndex parameter to the pipeline trace ray instructions.

For vkCmdTraceRaysKHR, the complete rule to compute a miss shader binding table record address in the pMissShaderBindingTable is:

pMissShaderBindingTable→deviceAddress + pMissShaderBindingTable→stride × missIndex

All data accessed must be less than pMissShaderBindingTable→size bytes from the base address.

For vkCmdTraceRaysNV, the offset and stride come from direct parameters, so the full rule to compute a miss shader binding table record address in the missShaderBindingTableBuffer is:

missShaderBindingOffset + missShaderBindingStride × missIndex

Callable Shaders

A callable shader is executed when requested by a ray tracing shader. Multiple callable shaders may be executed throughout a ray tracing dispatch.

The base for the computation of callable shader locations is pCallableShaderBindingTable→deviceAddress, a device address passed into vkCmdTraceRaysKHR, or callableShaderBindingOffset, a base offset passed into vkCmdTraceRaysNV.

The sbtRecordIndex value is passed in as a parameter to executeCallableNV() or executeCallableEXT() calls made in the shaders. See Section 8.19 (Ray Tracing Functions) of the OpenGL Shading Language Specification for more details. In SPIR-V, this corresponds to the SBTIndex parameter to the OpExecuteCallableNV or OpExecuteCallableKHR instruction.

For vkCmdTraceRaysKHR, the complete rule to compute a callable shader binding table record address in the pCallableShaderBindingTable is:

pCallableShaderBindingTable→deviceAddress + pCallableShaderBindingTable→stride × sbtRecordIndex

All data accessed must be less than pCallableShaderBindingTable→size bytes from the base address.

For vkCmdTraceRaysNV, the offset and stride come from direct parameters, so the full rule to compute a callable shader binding table record address in the callableShaderBindingTableBuffer is:

callableShaderBindingOffset + callableShaderBindingStride × sbtRecordIndex
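The miss and callable rules share one shape: a base plus a stride multiplied by an index supplied by the shader. The sketch below captures that shape in a single hypothetical helper, sbt_indexed_record. For vkCmdTraceRaysKHR the base is a device address and the result is an address; for vkCmdTraceRaysNV the base is an offset and the result is an offset into the corresponding buffer.

```c
#include <assert.h>
#include <stdint.h>

/* base + stride * index, where index is missIndex for miss shaders or
 * sbtRecordIndex for callable shaders. */
static uint64_t sbt_indexed_record(uint64_t base, uint64_t stride,
                                   uint32_t index)
{
    /* Guard against 64-bit wraparound in the sketch's own arithmetic. */
    assert(stride == 0 || index <= (UINT64_MAX - base) / stride);
    return base + stride * (uint64_t)index;
}
```

With made-up values, a KHR miss table at device address 0x20000 with a 32-byte stride places missIndex 3 at 0x20060, while an NV callable table at offset 256 with a 64-byte stride places sbtRecordIndex 2 at offset 384.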

Ray Tracing Pipeline Stack

Ray tracing pipelines have a potentially large set of shaders which may be invoked in various call chain combinations to perform ray tracing. To store parameters for a given shader execution, an implementation may use a stack of data in memory. This stack must be sized to the sum of the stack sizes of all shaders in any call chain executed by the application.

If the stack size is not set explicitly, the stack size for a pipeline is:

rayGenStackMax + min(1, maxPipelineRayRecursionDepth) × max(closestHitStackMax, missStackMax, intersectionStackMax + anyHitStackMax) + max(0, maxPipelineRayRecursionDepth-1) × max(closestHitStackMax, missStackMax) + 2 × callableStackMax

where rayGenStackMax, closestHitStackMax, missStackMax, anyHitStackMax, intersectionStackMax, and callableStackMax are the maximum stack values queried by the respective shader stages for any shaders in any shader groups defined by the pipeline.

This stack size is potentially significant, so an application may want to provide a more accurate stack size after pipeline compilation. The value that the application provides is the maximum value of the sum of all shaders in a call chain across all possible call chains, taking into account any application specific knowledge about the properties of the call chains.

For example, if an application has two types of closest hit and miss shaders that it can use but the first level of rays will only use the first kind (possibly reflection) and the second level will only use the second kind (occlusion or shadow ray, for example) then the application can compute the stack size by something similar to:

rayGenStack + max(closestHit1Stack, miss1Stack) + max(closestHit2Stack, miss2Stack)

This is guaranteed to be no larger than the default stack size computation which assumes that both call levels may be the larger of the two.
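The two-level example can be compared numerically against a conservative estimate. The sketch below uses hypothetical function names and made-up stack sizes. The conservative variant is a simplification that charges both levels the larger of the two kinds and ignores the intersection, any-hit, and callable terms of the default formula.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t larger(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/* Application-specific size: level 1 uses only the first kind of closest
 * hit and miss shaders, level 2 uses only the second kind. */
static uint64_t two_level_stack_size(uint64_t rayGenStack,
                                     uint64_t closestHit1Stack,
                                     uint64_t miss1Stack,
                                     uint64_t closestHit2Stack,
                                     uint64_t miss2Stack)
{
    return rayGenStack
         + larger(closestHit1Stack, miss1Stack)
         + larger(closestHit2Stack, miss2Stack);
}

/* Conservative size: either kind of shader may run at either level. */
static uint64_t conservative_two_level_stack_size(uint64_t rayGenStack,
                                                  uint64_t closestHit1Stack,
                                                  uint64_t miss1Stack,
                                                  uint64_t closestHit2Stack,
                                                  uint64_t miss2Stack)
{
    uint64_t worst = larger(larger(closestHit1Stack, closestHit2Stack),
                            larger(miss1Stack, miss2Stack));
    return rayGenStack + 2 * worst;
}
```

With a ray generation stack of 64, first-level shaders of 256 and 32, and second-level shaders of 16 and 8, the application-specific size is 64 + 256 + 16 = 336 bytes, against 64 + 2 × 256 = 576 bytes for the conservative estimate.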

Ray Tracing Capture Replay

In a similar way to bufferDeviceAddressCaptureReplay, the rayTracingPipelineShaderGroupHandleCaptureReplay feature allows the querying of opaque data which can be used in a future replay.

During the capture phase, capture/replay tools are expected to query opaque data for shader group handle replay using vkGetRayTracingCaptureReplayShaderGroupHandlesKHR.

Providing the opaque data during replay, using VkRayTracingShaderGroupCreateInfoKHR::pShaderGroupCaptureReplayHandle at pipeline creation time, causes the implementation to generate identical shader group handles to those in the capture phase, allowing capture/replay tools to reuse previously recorded shader binding table buffer contents or to obtain the same handles by calling vkGetRayTracingCaptureReplayShaderGroupHandlesKHR again.

Ray Tracing Validation

Ray tracing validation can help identify the root cause of application issues and improve performance. Unlike existing validation layers, ray tracing validation performs checks at the implementation level, which helps identify potential problems that may not be caught by the validation layers.

By enabling the ray tracing validation feature, warnings and errors can be delivered straight from a ray tracing implementation to the application through a messenger callback registered with the implementation, where they can be processed through existing application-side debugging or logging systems.