Acceleration Structures
Acceleration structures are data structures used by the implementation to efficiently manage scene geometry as it is traversed during a ray tracing query. The application is responsible for managing acceleration structure objects (see Acceleration Structures), including allocation, destruction, executing builds or updates, and synchronizing resources used during ray tracing queries.
There are two types of acceleration structures: top level acceleration structures and bottom level acceleration structures.
An acceleration structure is considered to be constructed if an acceleration structure build command or copy command has been executed with the given acceleration structure as the destination.
Geometry
Geometries refer to a triangle, sphere, linear swept sphere (LSS), or axis-aligned bounding box.
A triangle is a fundamental geometric primitive defined by three vertices in 3D space, forming a flat, planar surface.
An axis-aligned bounding box (AABB) is a rectangular box defined by two points (minimum and maximum corners) that encloses a 3D object or scene. Its faces are aligned with the coordinate axes, making intersection tests efficient for spatial partitioning and acceleration structures.
A sphere primitive is defined by a position and a radius.
The linear swept sphere (LSS) primitive consists of two sphere endcaps and a truncated cone midsection. The midsection is constructed so that it is tangent to both endcaps. Two points, P0 and P1, and two radii, r0 and r1, fully describe the primitive.
The following figure shows an example of the LSS primitive composed of two sphere endcaps connected by a midsection. The solid non-dotted outline indicates the intersectable portion of the primitive.
Endcaps on LSS primitives are optional and are controlled by VkAccelerationStructureGeometryLinearSweptSpheresDataNV::endCapsMode.
The following figure shows an example of the LSS primitive without endcaps, with only the midsection present.
An LSS geometry can be defined in multiple ways. If only the vertex and radius data are specified in VkAccelerationStructureGeometryLinearSweptSpheresDataNV, without index data, LSS primitives are formed from consecutive pairs of vertices: each primitive i is defined by entries (i × 2, i × 2 + 1) in the vertex and radius buffers. For example, suppose a vertex buffer contains vertices A, B, C, D, E, F, and G, where each character represents a position vector, with corresponding radii rA, rB, rC, rD, rE, rF, and rG. The resulting LSS primitives are A-B, C-D, and E-F, as shown below. G is skipped because it has no partner to form a vertex pair.
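The pairing rule for non-indexed LSS geometry can be sketched in C. The function name and the output layout are hypothetical and are not part of the Vulkan API. The output holds one {first, second} vertex-index pair per primitive. The sketch only reproduces the (i × 2, i × 2 + 1) rule described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Vulkan API).
 * Non-indexed LSS geometry: primitive i uses vertex (and radius) entries
 * 2*i and 2*i + 1. An unpaired trailing vertex is skipped.
 * Writes one {first, second} vertex-index pair per primitive into out
 * and returns the number of primitives. */
size_t lss_pairs_non_indexed(size_t vertex_count, uint32_t out[][2])
{
    size_t primitive_count = vertex_count / 2; /* integer division drops the odd vertex */
    for (size_t i = 0; i < primitive_count; ++i) {
        out[i][0] = (uint32_t)(i * 2);
        out[i][1] = (uint32_t)(i * 2 + 1);
    }
    return primitive_count;
}
```

With a vertex count of 7 (A through G), the function produces three primitives: (0, 1), (2, 3), and (4, 5). These are A-B, C-D, and E-F. G (index 6) is left unused.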
LSS primitives can be chained together by specifying an index buffer and indexing mode in the VkAccelerationStructureGeometryLinearSweptSpheresDataNV structure.
If VkAccelerationStructureGeometryLinearSweptSpheresDataNV::indexingMode is set to VK_RAY_TRACING_LSS_INDEXING_MODE_LIST_NV, each consecutive pair of indices in the index buffer selects the two vertices of one LSS primitive in the chain. For example, assuming the same vertex buffer as before, suppose the index buffer contains indices [6, 5, 5, 4, 4, 3, 2, 1]. The index pairs (6, 5), (5, 4), (4, 3), and (2, 1) produce the primitives G-F, F-E, E-D, and C-B, chained as shown:
Note that due to the lack of a [3, 2] pair, there is a break in the chain and D is not connected to C.
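A similar sketch for VK_RAY_TRACING_LSS_INDEXING_MODE_LIST_NV reads the index buffer two entries at a time. The helper names are hypothetical. The lss_connected check assumes that two consecutive primitives are joined when the second vertex of one is the first vertex of the next. This matches the example above, but it does not describe how an implementation detects chains.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Vulkan API).
 * LIST indexing mode: index-buffer entries 2*i and 2*i + 1 name the two
 * vertices of primitive i. Returns the number of primitives written. */
size_t lss_pairs_list(const uint32_t *indices, size_t index_count, uint32_t out[][2])
{
    size_t primitive_count = index_count / 2;
    for (size_t i = 0; i < primitive_count; ++i) {
        out[i][0] = indices[i * 2];
        out[i][1] = indices[i * 2 + 1];
    }
    return primitive_count;
}

/* Assumption for illustration: primitive q continues the chain of the
 * preceding primitive p when p ends at the vertex where q starts. */
int lss_connected(const uint32_t p[2], const uint32_t q[2])
{
    return p[1] == q[0];
}
```

For the index buffer [6, 5, 5, 4, 4, 3, 2, 1], the fourth primitive (2, 1) does not start at vertex 3. The check therefore reports the break between D and C.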
If VkAccelerationStructureGeometryLinearSweptSpheresDataNV::indexingMode is set to VK_RAY_TRACING_LSS_INDEXING_MODE_SUCCESSIVE_NV, each LSS primitive is defined by two successive positions and radii, (k, k + 1), where k is a single index in the index buffer. For example, if the index buffer contains indices [0, 1, 2, 4], the LSS primitives A-B, B-C, C-D, and E-F will be chained as shown below. Because index 3 is absent from the index buffer, there is a break in the chain and D is not connected to E.
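For VK_RAY_TRACING_LSS_INDEXING_MODE_SUCCESSIVE_NV, each index k expands to the pair (k, k + 1). The following sketch shows that expansion. The function name is hypothetical and is not a Vulkan API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Vulkan API).
 * SUCCESSIVE indexing mode: each index k defines one primitive built
 * from vertex/radius entries k and k + 1. Returns the primitive count. */
size_t lss_pairs_successive(const uint32_t *indices, size_t index_count, uint32_t out[][2])
{
    for (size_t i = 0; i < index_count; ++i) {
        out[i][0] = indices[i];
        out[i][1] = indices[i] + 1;
    }
    return index_count;
}
```

For the index buffer [0, 1, 2, 4], the third primitive ends at vertex 3 (D) and the fourth starts at vertex 4 (E). This is the break in the chain described above.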
Top Level Acceleration Structures
Opaque acceleration structure for an array of instances. The descriptor or device address referencing this is the starting point for traversal.
The top level acceleration structure takes a reference to any bottom level acceleration structure referenced by its instances. Those bottom level acceleration structure objects must be valid when the top level acceleration structure is accessed.
Bottom Level Acceleration Structures
Opaque acceleration structure for an array of geometries.
Acceleration Structure Update Rules
The API defines two types of operations to produce acceleration structures from geometry:
- A build operation is used to construct an acceleration structure.
- An update operation is used to modify an existing acceleration structure.
An update operation imposes certain constraints on the input, in exchange for considerably faster execution. When performing an update, the application is required to provide a full description of the acceleration structure, but is prohibited from changing anything other than instance definitions, transform matrices, and vertex or AABB positions. All other aspects of the description must exactly match the one from the original build.
More precisely, the application must not use an update operation to do any of the following:
- Change primitives or instances from active to inactive, or vice versa (as defined in Inactive Primitives and Instances).
- Change the index or vertex formats of triangle geometry.
- Change triangle geometry transform pointers from null to non-null or vice versa.
- Change the number of geometries or instances in the structure.
- Change the geometry flags for any geometry in the structure.
- Change the number of vertices or primitives for any geometry in the structure.
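The restrictions above amount to comparing the update's description against the original build. The sketch below uses a hypothetical, simplified GeometryDesc struct that is not a Vulkan type, and it checks only the per-geometry fields listed. It does not check the active/inactive state of primitives or instances. That state depends on the vertex and instance data themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-geometry description (not a Vulkan type). */
typedef struct {
    uint32_t index_format;    /* stand-in for the triangle index type */
    uint32_t vertex_format;   /* stand-in for the triangle vertex format */
    bool     has_transform;   /* is the transform pointer non-null? */
    uint32_t geometry_flags;
    uint32_t vertex_count;
    uint32_t primitive_count;
} GeometryDesc;

/* Returns true if `next` differs from `orig` only in ways an update may
 * change (positions, transforms, instance definitions are not modeled). */
bool update_is_allowed(const GeometryDesc *orig, uint32_t orig_count,
                       const GeometryDesc *next, uint32_t next_count)
{
    if (orig_count != next_count)          /* geometry count must match */
        return false;
    for (uint32_t i = 0; i < orig_count; ++i) {
        const GeometryDesc *a = &orig[i];
        const GeometryDesc *b = &next[i];
        if (a->index_format != b->index_format ||
            a->vertex_format != b->vertex_format ||
            a->has_transform != b->has_transform ||
            a->geometry_flags != b->geometry_flags ||
            a->vertex_count != b->vertex_count ||
            a->primitive_count != b->primitive_count)
            return false;
    }
    return true;
}
```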
If the original acceleration structure was built using opacity micromaps and VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_OPACITY_MICROMAP_DATA_UPDATE_EXT was set in flags, the application must provide a micromap matching the original micromap in structure, with only opacity values updated. The application is prohibited from changing anything other than the specific opacity values assigned to the triangles.
More precisely, the application must not use an update operation to do any of the following:
- Remove micromaps or VkOpacityMicromapSpecialIndexEXT values from a geometry which previously had them, or vice versa.
- Change between use of VkOpacityMicromapSpecialIndexEXT values and explicit micromap triangles.
- Change the subdivision level or format of the micromap triangle associated with any acceleration-structure triangle.
If the original acceleration structure was built using opacity micromaps and VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_OPACITY_MICROMAP_UPDATE_EXT was set in flags, the application must provide a micromap to the update operation.
If VkMicromapBuildSizesInfoEXT::discardable is VK_FALSE, an update operation on an acceleration structure built with VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_OPACITY_MICROMAP_DATA_UPDATE_EXT or VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_OPACITY_MICROMAP_UPDATE_EXT transfers the acceleration structure's reference to the new micromap.
If the original acceleration structure was built using opacity micromaps and neither opacity micromap update flag is set, the application must provide the original micromap to the update operation.
If the original acceleration structure was built using displacement micromaps and VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_DISPLACEMENT_MICROMAP_UPDATE_NV was set in flags, the application must provide a displacement micromap to the update operation.
If the original acceleration structure was built using displacement micromaps and the displacement micromap update flag is not set, the application must provide the original micromap to the update operation.
Inactive Primitives and Instances
Acceleration structures allow the use of particular input values to signal inactive primitives or instances.
An inactive triangle is one for which the first (X) component of any vertex is NaN. If any other vertex component is NaN, and the first is not, the behavior is undefined. If the vertex format does not have a NaN representation, then all triangles are considered active.
An inactive instance is one whose acceleration structure reference is 0.
An inactive AABB is one for which the minimum X coordinate is NaN. If any other component is NaN, and the first is not, the behavior is undefined.
An inactive LSS or sphere is one where any of its radius or position components is NaN.
In the above definitions, NaN refers to any type of NaN: signaling, non-signaling, quiet, loud, or otherwise.
An inactive object is considered invisible to all rays, and should not be represented in the acceleration structure. Implementations should ensure that the presence of inactive objects does not seriously degrade traversal performance.
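The inactive-object rules above can be expressed as small predicates. This sketch assumes 32-bit float vertex, AABB, and radius data, which is a format with a NaN representation. It also assumes a 64-bit acceleration structure reference. The function names are hypothetical. The triangle test looks only at the X components, because a NaN elsewhere with a non-NaN X is undefined behavior rather than an inactive triangle.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Triangle: inactive if the X component of any vertex is NaN.
 * Assumes a float vertex format; Y and Z are deliberately not inspected. */
bool triangle_is_inactive(const float *v0, const float *v1, const float *v2)
{
    return isnan(v0[0]) || isnan(v1[0]) || isnan(v2[0]);
}

/* AABB: inactive if the minimum X coordinate is NaN. */
bool aabb_is_inactive(const float *min_xyz)
{
    return isnan(min_xyz[0]);
}

/* Sphere: inactive if any position component or the radius is NaN. */
bool sphere_is_inactive(const float *center, float radius)
{
    return isnan(center[0]) || isnan(center[1]) || isnan(center[2]) ||
           isnan(radius);
}

/* LSS: inactive if any component of either endpoint or either radius is NaN. */
bool lss_is_inactive(const float *p0, float r0, const float *p1, float r1)
{
    return sphere_is_inactive(p0, r0) || sphere_is_inactive(p1, r1);
}

/* Instance: inactive if its acceleration structure reference is 0. */
bool instance_is_inactive(uint64_t acceleration_structure_reference)
{
    return acceleration_structure_reference == 0;
}
```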
Inactive objects are counted in the auto-generated index sequences which are provided to shaders via the InstanceId and PrimitiveId SPIR-V decorations.
This allows objects in the scene to change freely between the active and inactive states, without affecting the layout of any arrays which are being indexed using the ID values.
Any transition between the active and inactive states requires a full acceleration structure rebuild. Applications must not perform an acceleration structure update where an object is active in the source acceleration structure but would be inactive in the destination, or vice versa.
The active/inactive state of primitives must not be changed with acceleration structure updates.
For chained LSS, using the VK_RAY_TRACING_LSS_PRIMITIVE_END_CAPS_MODE_CHAINED_NV mode, entire chains must be either active or inactive.
If any chain contains both active and inactive primitives, the behavior is undefined.
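An application that generates chained LSS geometry might validate the rule above before building. In this sketch, chain_id is a hypothetical per-primitive array that the application maintains to record which chain each primitive belongs to. Vulkan does not provide such an array. The pairwise loop is quadratic and is meant only for clarity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* chain_id[i]: hypothetical application-side chain label of primitive i.
 * active[i]:   whether primitive i is active.
 * Returns true if every chain is entirely active or entirely inactive. */
bool lss_chains_consistent(const uint32_t *chain_id, const bool *active, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        for (size_t j = i + 1; j < count; ++j)
            if (chain_id[i] == chain_id[j] && active[i] != active[j])
                return false; /* mixed states within one chain: undefined behavior */
    return true;
}
```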
Degenerate Primitives and Instances
Degenerate primitives and instances behave differently from inactive primitives and instances, and are defined as:
- triangles that have two or more vertices whose respective (X), (Y), and (Z) components are all identical, or whose three vertices have at least two of their (X), (Y), or (Z) components identical, therefore forming a line or point. Degenerate triangles do not generate any intersections.
- AABBs whose minX = maxX, minY = maxY, and minZ = maxZ. Degenerate AABBs may invoke the intersection shader.
- LSS primitives where both radii are set to 0.
- sphere primitives whose radius is set to 0.
- instances that reference bottom level acceleration structures that contain no active primitives.
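The degenerate-primitive definitions above can be sketched as predicates over float data. The function names are hypothetical. The triangle test follows the listed definition literally: either two vertices coincide, or all three vertices share at least two coordinates. Under this literal reading, a collinear triangle that is not axis-aligned is not flagged. An example is a triangle through (0, 0, 0), (1, 1, 1), and (2, 2, 2).

```c
#include <stdbool.h>

/* True if two vertices have identical X, Y, and Z components. */
static bool same_vertex(const float *a, const float *b)
{
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}

/* Literal reading of the triangle definition: coincident vertices, or all
 * three vertices sharing at least two of their X/Y/Z components. */
bool triangle_is_degenerate(const float *v0, const float *v1, const float *v2)
{
    if (same_vertex(v0, v1) || same_vertex(v1, v2) || same_vertex(v0, v2))
        return true;
    int shared_axes = 0;
    for (int axis = 0; axis < 3; ++axis)
        if (v0[axis] == v1[axis] && v1[axis] == v2[axis])
            ++shared_axes;
    return shared_axes >= 2; /* the vertices lie on an axis-aligned line */
}

/* AABB collapsed to a point: min equals max on every axis. */
bool aabb_is_degenerate(const float *min_xyz, const float *max_xyz)
{
    return min_xyz[0] == max_xyz[0] && min_xyz[1] == max_xyz[1] &&
           min_xyz[2] == max_xyz[2];
}

/* LSS with both radii 0, and sphere with radius 0. */
bool lss_is_degenerate(float r0, float r1) { return r0 == 0.0f && r1 == 0.0f; }
bool sphere_is_degenerate(float radius) { return radius == 0.0f; }
```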
When building an acceleration structure, implementations should treat degenerate instances as though they are a point at the instance origin, specified by VkAccelerationStructureInstanceKHR::transform.
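VkAccelerationStructureInstanceKHR::transform is a VkTransformMatrixKHR, which is a row-major 3x4 affine matrix whose last column holds the translation. The instance origin is therefore that column. The sketch below uses a local stand-in struct with the same layout instead of including the Vulkan headers. The struct and function names are hypothetical.

```c
/* Hypothetical stand-in with the same layout as VkTransformMatrixKHR:
 * a row-major 3x4 affine matrix whose last column is the translation. */
typedef struct {
    float matrix[3][4];
} TransformMatrix3x4;

/* The instance origin is the image of (0, 0, 0) under the transform,
 * which is exactly the translation column. */
void instance_origin(const TransformMatrix3x4 *t, float out[3])
{
    for (int row = 0; row < 3; ++row)
        out[row] = t->matrix[row][3];
}
```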
Unlike inactive primitives and instances, degenerate primitives and instances may transition from the degenerate to the non-degenerate state, or vice versa, when performing an acceleration structure update.
If an acceleration structure is built without VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR set in VkAccelerationStructureInfoNV::flags or VkAccelerationStructureBuildGeometryInfoKHR::flags, degenerate primitives may be discarded.
Primitives that are defined with the same index value for more than one vertex can always be discarded.
Building Acceleration Structures
In addition to LSS primitives, simple sphere geometry is also supported. Spheres do not have an endcap mode. If an index buffer is present, each entry represents a single position and radius describing one sphere primitive. If no index buffer is provided, the vertex position and radius values are sequentially read from the corresponding buffers.
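Sphere fetching can be sketched as a gather over the position and radius arrays. The Sphere struct and the function name are assumptions. The sketch also assumes a tightly packed float layout, with three floats per position and one per radius. Real geometry data is described by format and stride fields, which this sketch ignores. A NULL index pointer stands for the case where no index buffer is provided.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output record for one sphere primitive. */
typedef struct {
    float center[3];
    float radius;
} Sphere;

/* positions: packed xyz triples; radii: one float per element.
 * indices:   optional index buffer (NULL means read elements in order). */
void gather_spheres(const float *positions, const float *radii,
                    const uint32_t *indices, size_t sphere_count, Sphere *out)
{
    for (size_t i = 0; i < sphere_count; ++i) {
        /* With an index buffer, each entry names one position/radius pair. */
        size_t e = indices ? (size_t)indices[i] : i;
        for (int k = 0; k < 3; ++k)
            out[i].center[k] = positions[e * 3 + k];
        out[i].radius = radii[e];
    }
}
```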
Copying Acceleration Structures
An additional command exists for copying acceleration structures without updating their contents. The acceleration structure object can be compacted in order to improve performance. Before copying, an application must query the size of the resulting acceleration structure.
Cluster Level Acceleration Structures
Acceleration structure build times in ray tracing applications with extensive geometry can be reduced by introducing alternative acceleration structure types that facilitate bottom-level acceleration structure construction using pre-generated primitive clusters, improving geometry reuse. This can be achieved by incorporating additional acceleration structure types:
- Cluster Level Acceleration Structure
- Cluster Template Acceleration Structure
- Cluster Level Bottom Level Acceleration Structure
Cluster Level Acceleration Structure (CLAS) is an intermediate acceleration structure constructed from triangles, which serves as a building block for a Cluster Level Bottom Level Acceleration Structure. A CLAS shares similarities with a traditional bottom level acceleration structure but has several key distinctions. A CLAS can only contain a limited number of triangles and vertices. CLAS objects cannot be directly referenced in a top level acceleration structure; instead, they must be part of a Cluster Level Bottom Level Acceleration Structure. The geometry indices within a CLAS are local to it, potentially non-consecutive, and customizable per primitive. Each CLAS can also have a user-defined 32-bit ClusterID, which is accessible in the hit shaders. The vertex positions within a CLAS can be quantized by zeroing specific floating-point mantissa bits to optimize storage.
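Zeroing low mantissa bits of an IEEE-754 binary32 value can be sketched as below. This text does not describe how the number of truncated bits is specified for a CLAS, so the function and its bits parameter are hypothetical. Clearing bits truncates the magnitude toward zero and leaves the sign and exponent untouched.

```c
#include <stdint.h>
#include <string.h>

/* Clear the lowest `bits` of the 23 explicit mantissa bits of a float.
 * memcpy is used for well-defined type punning. */
float zero_low_mantissa_bits(float value, unsigned bits)
{
    uint32_t u;
    memcpy(&u, &value, sizeof u);
    if (bits > 23)
        bits = 23;                       /* never touch exponent or sign */
    uint32_t mask = ~((UINT32_C(1) << bits) - 1u);
    u &= mask;
    memcpy(&value, &u, sizeof value);
    return value;
}
```

For example, 1.75 has mantissa bits 0x600000. Clearing the low 22 bits keeps only bit 22, which yields 1.5.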
Cluster Template Acceleration Structure is a partially constructed CLAS designed for efficient instantiation into multiple CLAS objects. During a cluster template build, some pre-computation is performed independent of vertex positions, allowing reuse across multiple CLAS objects with different vertex data. A cluster template itself does not require vertex positions but it retains non-positional properties similar to a CLAS, which are then inherited during instantiation. A cluster template must be instantiated into a CLAS object to be usable.
Cluster Level Bottom Level Acceleration Structure is a new alternative to the existing bottom level acceleration structures, which is constructed using references to already built CLAS objects and is the only cluster acceleration structure that can be referenced in a top level acceleration structure.
Partitioned Top Level Acceleration Structures
Partitioned Top Level Acceleration Structures (PTLAS) allow efficient reuse of previously constructed sections of the top level acceleration structure by eliminating a full rebuild when only a few instances are modified. This reduces build times and supports handling a higher number of instances, making it more suitable for large and complex scenes.
PTLAS organizes instances into partitions, enabling a two-stage build process: first, it constructs an acceleration structure for each partition by grouping the instances within it, and second, it combines these partition structures into a single acceleration structure, similar to the current top-level acceleration structure.
To maintain compatibility, PTLAS behaves identically to the current top-level acceleration structure from the perspective of ray tracing shaders and pipelines.
PTLAS includes a unique global partition that operates independently of other partitions. Instances can be assigned to this global partition just like they would to regular partitions. The global partition is well-suited for frequently updated instances, such as animated characters. During the build process, instances in the global partition are treated as if they belong to individual partitions, without increasing the maximum partition count. However, instances in the global partition may still impact build performance. Once these instances become stable, they should be moved to a spatially optimized, non-global partition to lower build costs and minimize trace performance issues.
To handle large worlds requiring more precision than 32-bit floating-point numbers offer, PTLAS provides efficient partition translation. Typically, applications maintain precision by placing the world center near the camera. Partition translation allows an additional translation of instances during construction without changing their stored transforms. This method stores instance transforms relative to partitions, applying a translation to achieve accurate world positions. Higher precision is maintained using smaller floating-point numbers until the structure is built. World space coordinates can also be updated efficiently without rebuilding the entire PTLAS.
Partition translation requires extra memory for untranslated instance transforms and must be explicitly enabled with the VkPartitionedAccelerationStructureFlagsNV::enablePartitionTranslation flag.
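The precision argument can be illustrated in plain C. This illustration does not model the PTLAS API or how the implementation applies the translation. The function names are hypothetical, and computing the translation in double precision on the host is an assumption. Near 10,000,000, adjacent 32-bit floats are 1.0 apart. A world position stored directly as a float loses a 0.25 offset, while a small partition-relative offset plus a camera-relative translation keeps it.

```c
/* Naive approach: store the world position as a float, then subtract
 * the camera position. The fractional part is lost in the first cast. */
float naive_camera_relative(double world, double camera)
{
    float world_f = (float)world;    /* large magnitude: low bits rounded away */
    float camera_f = (float)camera;
    return world_f - camera_f;
}

/* Partition-relative approach: the instance keeps a small float offset
 * from its partition origin, and a per-partition translation (here the
 * partition origin relative to the camera, computed in double) is added. */
float translated_camera_relative(float local_offset, double partition_origin,
                                 double camera)
{
    float translation = (float)(partition_origin - camera);
    return translation + local_offset;
}
```

With the partition origin at 10000000.0, an instance offset of 0.25, and the camera at 9999997.0, the naive path yields 3.0. The translated path yields 3.25.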
Host Acceleration Structure Operations
Implementations are also required to provide host implementations of the acceleration structure operations if the accelerationStructureHostCommands feature is enabled:
- vkBuildAccelerationStructuresKHR corresponding to vkCmdBuildAccelerationStructuresKHR
- vkCopyAccelerationStructureKHR corresponding to vkCmdCopyAccelerationStructureKHR
- vkCopyAccelerationStructureToMemoryKHR corresponding to vkCmdCopyAccelerationStructureToMemoryKHR
- vkCopyMemoryToAccelerationStructureKHR corresponding to vkCmdCopyMemoryToAccelerationStructureKHR
- vkWriteAccelerationStructuresPropertiesKHR corresponding to vkCmdWriteAccelerationStructuresPropertiesKHR
These commands are functionally equivalent to their device counterparts, except that they are executed on the host timeline, rather than being enqueued into command buffers.
All acceleration structures used by the host commands must be bound to host-visible memory, and all input data for acceleration structure builds must be referenced using host addresses instead of device addresses. Applications are not required to map acceleration structure memory when using the host commands.
The vkBuildAccelerationStructuresKHR and vkCmdBuildAccelerationStructuresKHR commands may use different algorithms, and thus are not required to produce identical structures. The structures produced by these two commands may exhibit different memory footprints or traversal performance, but should strive to be similar where possible.
Apart from these details, the host and device operations are interchangeable. For example, an application can use vkBuildAccelerationStructuresKHR to build a structure, compact it on the device using vkCmdCopyAccelerationStructureKHR, and serialize the result using vkCopyAccelerationStructureToMemoryKHR.
For efficient execution, acceleration structures manipulated using these commands should always be bound to host cached memory, as the implementation may need to repeatedly read and write this memory during the execution of the command.