Synchronization and Cache Control

Synchronization of access to resources is primarily the responsibility of the application in Vulkan. The order of execution of commands with respect to the host and other commands on the device has few implicit guarantees, and needs to be explicitly specified. Memory caches and other optimizations are also explicitly managed, requiring that the flow of data through the system is largely under application control.

Whilst some implicit guarantees exist between commands, five explicit synchronization mechanisms are exposed by Vulkan:

Fences

Fences can be used to communicate to the host that execution of some task on the device has completed, controlling resource access between host and device.

Semaphores

Semaphores can be used to control resource access across multiple queues.

Events

Events provide a fine-grained synchronization primitive which can be signaled either within a command buffer or by the host, and can be waited upon within a command buffer or queried on the host. Events can be used to control resource access within a single queue.

Pipeline Barriers

Pipeline barriers also provide synchronization control within a command buffer, but at a single point, rather than with separate signal and wait operations. Pipeline barriers can be used to control resource access within a single queue.

Render Pass Objects

Render pass objects provide a synchronization framework for rendering tasks, built upon the concepts in this chapter. Many cases that would otherwise need an application to use other synchronization primitives can be expressed more efficiently as part of a render pass. Render pass objects can be used to control resource access within a single queue.

Execution and Memory Dependencies

An operation is an arbitrary amount of work to be executed on the host, a device, or an external entity such as a presentation engine. Synchronization commands introduce explicit execution dependencies, and memory dependencies between two sets of operations defined by the command’s two synchronization scopes.

The synchronization scopes define which other operations a synchronization command is able to create execution dependencies with. Any type of operation that is not in a synchronization command’s synchronization scopes will not be included in the resulting dependency. For example, for many synchronization commands, the synchronization scopes can be limited to just operations executing in specific pipeline stages, which allows other pipeline stages to be excluded from a dependency. Other scoping options are possible, depending on the particular command.

An execution dependency is a guarantee that for two sets of operations, the first set must happen-before the second set. If an operation happens-before another operation, then the first operation must complete before the second operation is initiated. More precisely:

Let Ops₁ and Ops₂ be separate sets of operations.
Let Sync be a synchronization command.
Let Scope_1st and Scope_2nd be the synchronization scopes of Sync.
Let ScopedOps₁ be the intersection of sets Ops₁ and Scope_1st.
Let ScopedOps₂ be the intersection of sets Ops₂ and Scope_2nd.
Submitting Ops₁, Sync and Ops₂ for execution, in that order, will result in execution dependency ExeDep between ScopedOps₁ and ScopedOps₂.
Execution dependency ExeDep guarantees that ScopedOps₁ happen-before ScopedOps₂.

An execution dependency chain is a sequence of execution dependencies that form a happens-before relation between the first dependency’s ScopedOps₁ and the final dependency’s ScopedOps₂. For each consecutive pair of execution dependencies, a chain exists if the intersection of Scope_2nd in the first dependency and Scope_1st in the second dependency is not an empty set. The formation of a single execution dependency from an execution dependency chain can be described by substituting the following in the description of execution dependencies:

Let Sync be a set of synchronization commands that generate an execution dependency chain.
Let Scope_1st be the first synchronization scope of the first command in Sync.
Let Scope_2nd be the second synchronization scope of the last command in Sync.

Execution dependencies alone are not sufficient to guarantee that values resulting from writes in one set of operations can be read from another set of operations.

Three additional types of operations are used to control memory access. Availability operations cause the values generated by specified memory write accesses to become available to a memory domain for future access. Any available value remains available until a subsequent write to the same memory location occurs (whether it is made available or not) or the memory is freed. Memory domain operations cause writes that are available to a source memory domain to become available to a destination memory domain (an example of this is making writes available to the host domain available to the device domain). Visibility operations cause values available to a memory domain to become visible to specified memory accesses.

Availability, visibility, memory domains, and memory domain operations are formally defined in the Availability and Visibility section of the Memory Model chapter. Which API operations perform each of these operations is defined in Availability, Visibility, and Domain Operations.

A memory dependency is an execution dependency which includes availability and visibility operations such that:

The first set of operations happens-before the availability operation.
The availability operation happens-before the visibility operation.
The visibility operation happens-before the second set of operations.

Once written values are made visible to a particular type of memory access, they can be read or written by that type of memory access. Most synchronization commands in Vulkan define a memory dependency.

The specific memory accesses that are made available and visible are defined by the access scopes of a memory dependency. Any type of access that is in a memory dependency’s first access scope and occurs in ScopedOps₁ is made available. Any type of access that is in a memory dependency’s second access scope and occurs in ScopedOps₂ has any available writes made visible to it. Any type of operation that is not in a synchronization command’s access scopes will not be included in the resulting dependency.

A memory dependency enforces availability and visibility of memory accesses and execution order between two sets of operations. Adding to the description of execution dependency chains:

Let MemOps₁ be the set of memory accesses performed by ScopedOps₁.
Let MemOps₂ be the set of memory accesses performed by ScopedOps₂.
Let AccessScope_1st be the first access scope of the first command in the Sync chain.
Let AccessScope_2nd be the second access scope of the last command in the Sync chain.
Let ScopedMemOps₁ be the intersection of sets MemOps₁ and AccessScope_1st.
Let ScopedMemOps₂ be the intersection of sets MemOps₂ and AccessScope_2nd.
Submitting Ops₁, Sync, and Ops₂ for execution, in that order, will result in a memory dependency MemDep between ScopedOps₁ and ScopedOps₂.
Memory dependency MemDep guarantees that:
- Memory writes in ScopedMemOps₁ are made available.
- Available memory writes, including those from ScopedMemOps₁, are made visible to ScopedMemOps₂.

Execution and memory dependencies are used to solve data hazards, i.e. to ensure that read and write operations occur in a well-defined order. Write-after-read hazards can be solved with just an execution dependency, but read-after-write and write-after-write hazards need appropriate memory dependencies to be included between them. If an application does not include dependencies to solve these hazards, the results and execution orders of memory accesses are undefined:.

Image Layout Transitions

Image subresources can be transitioned from one layout to another as part of a memory dependency (e.g. by using an image memory barrier). When a layout transition is specified in a memory dependency, it happens-after the availability operations in the memory dependency, and happens-before the visibility operations. Image layout transitions may perform read and write accesses on all memory bound to the image subresource range, so applications must ensure that all memory writes have been made available before a layout transition is executed. Available memory is automatically made visible to a layout transition, and writes performed by a layout transition are automatically made available.

Layout transitions always apply to a particular image subresource range, and specify both an old layout and new layout. The old layout must either be VK_IMAGE_LAYOUT_UNDEFINED, or match the current layout of the image subresource range. If the old layout matches the current layout of the image subresource range, the transition preserves the contents of that range. If the old layout is VK_IMAGE_LAYOUT_UNDEFINED, the contents of that range may be discarded.

Image layout transitions with VK_IMAGE_LAYOUT_UNDEFINED allow the implementation to discard the image subresource range, which can provide performance or power benefits. Tile-based architectures may be able to avoid flushing tile data to memory, and immediate style renderers may be able to achieve fast metadata clears to reinitialize frame buffer compression state, or similar.

If the contents of an attachment are not needed after a render pass completes, then applications should use VK_ATTACHMENT_STORE_OP_DONT_CARE.

As image layout transitions may perform read and write accesses on the memory bound to the image, if the image subresource affected by the layout transition is bound to peer memory for any device in the current device mask then the memory heap the bound memory comes from must support the VK_PEER_MEMORY_FEATURE_GENERIC_SRC_BIT and VK_PEER_MEMORY_FEATURE_GENERIC_DST_BIT capabilities as returned by vkGetDeviceGroupPeerMemoryFeatures.

Applications must ensure that layout transitions happen-after all operations accessing the image with the old layout, and happen-before any operations that will access the image with the new layout. Layout transitions are potentially read/write operations, so not defining appropriate memory dependencies to guarantee this will result in a data race.

Image layout transitions interact with memory aliasing.

Layout transitions that are performed via image memory barriers execute in their entirety in submission order, relative to other image layout transitions submitted to the same queue, including those performed by render passes. In effect there is an implicit execution dependency from each such layout transition to all layout transitions previously submitted to the same queue.

The image layout of each image subresource of a depth/stencil image created with VK_IMAGE_CREATE_SAMPLE_LOCATIONS_COMPATIBLE_DEPTH_BIT_EXT is dependent on the last sample locations used to render to the image subresource as a depth/stencil attachment, thus when the image member of an image memory barrier is an image created with this flag the application can chain a VkSampleLocationsInfoEXT structure to the pNext chain of VkImageMemoryBarrier2 or VkImageMemoryBarrier to specify the sample locations to use during any image layout transition.

If the VkSampleLocationsInfoEXT structure does not match the sample location state last used to render to the image subresource range specified by subresourceRange, or if no VkSampleLocationsInfoEXT structure is present, then the contents of the given image subresource range becomes undefined: as if oldLayout would equal VK_IMAGE_LAYOUT_UNDEFINED.

Pipeline Stages

The work performed by an action command consists of multiple operations, which are performed as a sequence of logically independent steps known as pipeline stages. The exact pipeline stages executed depend on the particular command that is used, and current command buffer state when the command was recorded.

Operations performed by synchronization commands (e.g. availability and visibility operations) are not executed by a defined pipeline stage. However other commands can still synchronize with them by using the synchronization scopes to create a dependency chain.

Execution of operations across pipeline stages must adhere to implicit ordering guarantees, particularly including pipeline stage order. Otherwise, execution across pipeline stages may overlap or execute out of order with regards to other stages, unless otherwise enforced by an execution dependency.

Several of the synchronization commands include pipeline stage parameters, restricting the synchronization scopes for that command to just those stages. This allows fine grained control over the exact execution dependencies and accesses performed by action commands. Implementations should use these pipeline stages to avoid unnecessary stalls or cache flushing.

If a synchronization command includes a source stage mask, its first synchronization scope only includes execution of the pipeline stages specified in that mask and any logically earlier stages. Its first access scope only includes memory accesses performed by pipeline stages explicitly specified in the source stage mask.

If a synchronization command includes a destination stage mask, its second synchronization scope only includes execution of the pipeline stages specified in that mask and any logically later stages. Its second access scope only includes memory accesses performed by pipeline stages explicitly specified in the destination stage mask.

Note that access scopes do not interact with the logically earlier or later stages for either scope - only the stages the application specifies are considered part of each access scope.

Certain pipeline stages are only available on queues that support a particular set of operations. The following table lists, for each pipeline stage flag, which queue capability flag must be supported by the queue. When multiple flags are enumerated in the second column of the table, it means that the pipeline stage is supported on the queue if it supports any of the listed capability flags. For further details on queue capabilities see Physical Device Enumeration and Queues.

Table 3. Supported Pipeline Stage Flags
Pipeline stage flag	Required queue capability flag
VK_PIPELINE_STAGE_2_NONE	None required
VK_PIPELINE_STAGE_2_TOP_OF_PIPE_BIT	None required
VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT
VK_PIPELINE_STAGE_2_VERTEX_INPUT_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_VERTEX_SHADER_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_TESSELLATION_CONTROL_SHADER_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_TESSELLATION_EVALUATION_SHADER_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_GEOMETRY_SHADER_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_EARLY_FRAGMENT_TESTS_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_LATE_FRAGMENT_TESTS_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT	VK_QUEUE_COMPUTE_BIT
VK_PIPELINE_STAGE_2_ALL_TRANSFER_BIT	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT or VK_QUEUE_TRANSFER_BIT
VK_PIPELINE_STAGE_2_BOTTOM_OF_PIPE_BIT	None required
VK_PIPELINE_STAGE_2_HOST_BIT	None required
VK_PIPELINE_STAGE_2_ALL_GRAPHICS_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT	None required
VK_PIPELINE_STAGE_2_COPY_BIT	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT or VK_QUEUE_TRANSFER_BIT
VK_PIPELINE_STAGE_2_RESOLVE_BIT	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT or VK_QUEUE_TRANSFER_BIT
VK_PIPELINE_STAGE_2_BLIT_BIT	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT or VK_QUEUE_TRANSFER_BIT
VK_PIPELINE_STAGE_2_CLEAR_BIT	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT or VK_QUEUE_TRANSFER_BIT
VK_PIPELINE_STAGE_2_INDEX_INPUT_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_VERTEX_ATTRIBUTE_INPUT_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_PRE_RASTERIZATION_SHADERS_BIT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR	VK_QUEUE_VIDEO_DECODE_BIT_KHR
VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR	VK_QUEUE_VIDEO_ENCODE_BIT_KHR
VK_PIPELINE_STAGE_2_TRANSFORM_FEEDBACK_BIT_EXT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_CONDITIONAL_RENDERING_BIT_EXT	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT
VK_PIPELINE_STAGE_2_COMMAND_PREPROCESS_BIT_EXT	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT
VK_PIPELINE_STAGE_2_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_KHR	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_ACCELERATION_STRUCTURE_BUILD_BIT_KHR	VK_QUEUE_COMPUTE_BIT
VK_PIPELINE_STAGE_2_RAY_TRACING_SHADER_BIT_KHR	VK_QUEUE_COMPUTE_BIT
VK_PIPELINE_STAGE_2_FRAGMENT_DENSITY_PROCESS_BIT_EXT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_TASK_SHADER_BIT_EXT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_MESH_SHADER_BIT_EXT	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_SUBPASS_SHADER_BIT_HUAWEI	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_INVOCATION_MASK_BIT_HUAWEI	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_ACCELERATION_STRUCTURE_COPY_BIT_KHR	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT or VK_QUEUE_TRANSFER_BIT
VK_PIPELINE_STAGE_2_MICROMAP_BUILD_BIT_EXT	VK_QUEUE_COMPUTE_BIT
VK_PIPELINE_STAGE_2_CLUSTER_CULLING_SHADER_BIT_HUAWEI	VK_QUEUE_GRAPHICS_BIT
VK_PIPELINE_STAGE_2_OPTICAL_FLOW_BIT_NV	VK_QUEUE_OPTICAL_FLOW_BIT_NV
VK_PIPELINE_STAGE_2_CONVERT_COOPERATIVE_VECTOR_MATRIX_BIT_NV	VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT or VK_QUEUE_TRANSFER_BIT

Pipeline stages that execute as a result of a command logically complete execution in a specific order, such that completion of a logically later pipeline stage must not happen-before completion of a logically earlier stage. This means that including any stage in the source stage mask for a particular synchronization command also implies that any logically earlier stages are included in Scope_1st for that command.

Similarly, initiation of a logically earlier pipeline stage must not happen-after initiation of a logically later pipeline stage. Including any given stage in the destination stage mask for a particular synchronization command also implies that any logically later stages are included in Scope_2nd for that command.

Implementations may not support synchronization at every pipeline stage for every synchronization operation. If a pipeline stage that an implementation does not support synchronization for appears in a source stage mask, it may substitute any logically later stage in its place for the first synchronization scope. If a pipeline stage that an implementation does not support synchronization for appears in a destination stage mask, it may substitute any logically earlier stage in its place for the second synchronization scope.

For example, if an implementation is unable to signal an event immediately after vertex shader execution is complete, it may instead signal the event after color attachment output has completed.

If an implementation makes such a substitution, it must not affect the semantics of execution or memory dependencies or image and buffer memory barriers.

:anchor{id="synchronization-pipeline-graphics"}

Graphics pipelines are executable on queues supporting VK_QUEUE_GRAPHICS_BIT. Stages executed by graphics pipelines can only be specified in commands recorded for queues supporting VK_QUEUE_GRAPHICS_BIT.

The graphics primitive pipeline executes the following stages, with the logical ordering of the stages matching the order specified here:

VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT
VK_PIPELINE_STAGE_2_INDEX_INPUT_BIT
VK_PIPELINE_STAGE_2_VERTEX_ATTRIBUTE_INPUT_BIT
VK_PIPELINE_STAGE_2_VERTEX_SHADER_BIT
VK_PIPELINE_STAGE_2_TESSELLATION_CONTROL_SHADER_BIT
VK_PIPELINE_STAGE_2_TESSELLATION_EVALUATION_SHADER_BIT
VK_PIPELINE_STAGE_2_GEOMETRY_SHADER_BIT
VK_PIPELINE_STAGE_2_TRANSFORM_FEEDBACK_BIT_EXT
VK_PIPELINE_STAGE_2_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_KHR
VK_PIPELINE_STAGE_2_EARLY_FRAGMENT_TESTS_BIT
VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT
VK_PIPELINE_STAGE_2_LATE_FRAGMENT_TESTS_BIT
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT

The graphics mesh pipeline executes the following stages, with the logical ordering of the stages matching the order specified here:

VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT
VK_PIPELINE_STAGE_2_TASK_SHADER_BIT_EXT
VK_PIPELINE_STAGE_2_MESH_SHADER_BIT_EXT
VK_PIPELINE_STAGE_2_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_KHR
VK_PIPELINE_STAGE_2_EARLY_FRAGMENT_TESTS_BIT
VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT
VK_PIPELINE_STAGE_2_LATE_FRAGMENT_TESTS_BIT
VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT

For the compute pipeline, the following stages occur in this order:

VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT
VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT

For the subpass shading pipeline, the following stages occur in this order:

VK_PIPELINE_STAGE_2_SUBPASS_SHADER_BIT_HUAWEI

For graphics pipeline commands executing in a render pass with a fragment density map attachment, the following pipeline stage where the fragment density map read happens has no particular order relative to the other stages, except that it is logically earlier than VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT:

VK_PIPELINE_STAGE_FRAGMENT_DENSITY_PROCESS_BIT_EXT
VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT

The conditional rendering stage is formally part of both the graphics, and the compute pipeline. The pipeline stage where the predicate read happens has unspecified order relative to other stages of these pipelines:

VK_PIPELINE_STAGE_CONDITIONAL_RENDERING_BIT_EXT

For the transfer pipeline, the following stages occur in this order:

VK_PIPELINE_STAGE_2_TRANSFER_BIT

For host operations, only one pipeline stage occurs, so no order is guaranteed:

VK_PIPELINE_STAGE_2_HOST_BIT

For the command preprocessing pipeline, the following stages occur in this order:

VK_PIPELINE_STAGE_2_COMMAND_PREPROCESS_BIT_EXT

For acceleration structure build operations, only one pipeline stage occurs, so no order is guaranteed:

VK_PIPELINE_STAGE_2_ACCELERATION_STRUCTURE_BUILD_BIT_KHR

For acceleration structure copy operations, only one pipeline stage occurs, so no order is guaranteed:

VK_PIPELINE_STAGE_2_ACCELERATION_STRUCTURE_COPY_BIT_KHR

For opacity micromap build operations, only one pipeline stage occurs, so no order is guaranteed:

VK_PIPELINE_STAGE_2_MICROMAP_BUILD_BIT_EXT

For the ray tracing pipeline, the following stages occur in this order:

VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT
VK_PIPELINE_STAGE_2_RAY_TRACING_SHADER_BIT_KHR

For the video decode pipeline, the following stages occur in this order:

VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR

For the video encode pipeline, the following stages occur in this order:

VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR

Access Types

Memory in Vulkan can be accessed from within shader invocations and via some fixed-function stages of the pipeline. The access type is a function of the descriptor type used, or how a fixed-function stage accesses memory.

Some synchronization commands take sets of access types as parameters to define the access scopes of a memory dependency. If a synchronization command includes a source access mask, its first access scope only includes accesses via the access types specified in that mask. Similarly, if a synchronization command includes a destination access mask, its second access scope only includes accesses via the access types specified in that mask.

An application can link a VkMemoryBarrierAccessFlags3KHR structure in the pNext chain of VkMemoryBarrier2, VkBufferMemoryBarrier2, or VkImageMemoryBarrier2 to provide additional access flags beyond those available in VkAccessFlagBits2.

When a VkMemoryBarrierAccessFlags3KHR structure is linked in the pNext field of VkMemoryBarrier2, VkBufferMemoryBarrier2, or VkImageMemoryBarrier2, the flags specified in the srcAccessMask3 and dstAccessMask3 fields are considered in addition to the flags in the srcAccessMask and dstAccessMask fields, respectively, to allow up to 128 total access types to be specified for the first or second access scope.

When VkAccessFlagBits3KHR and VkAccessFlagBits2 are used together, the two sets of 64 flags bits are combined together into 128 flag bits (effectively OR’ing them together). This is different from VkAccessFlagBits2 and VkAccessFlagBits, where the 64 bit VkAccessFlagBits2 extends and replaces the 32 bit VkAccessFlagBits.

If a memory object does not have the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT property, then vkFlushMappedMemoryRanges must be called in order to guarantee that writes to the memory object from the host are made available to the host domain, where they can be further made available to the device domain via a domain operation. Similarly, vkInvalidateMappedMemoryRanges must be called to guarantee that writes which are available to the host domain are made visible to host operations.

If the memory object does have the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT property flag, writes to the memory object from the host are automatically made available to the host domain. Similarly, writes made available to the host domain are automatically made visible to the host.

Queue submission commands automatically perform a domain operation from host to device for all writes performed before the command executes, so in most cases an explicit memory barrier is not needed for this case. In the few circumstances where a submit does not occur between the host write and the device read access, writes can be made available by using an explicit memory barrier.

Framebuffer Region Dependencies

Pipeline stages that operate on, or with respect to, the framebuffer are collectively the framebuffer-space pipeline stages. These stages are:

VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT
VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT

For commands recorded where the per-tile execution model is enabled, the VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT and VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT are included as framebuffer-space pipeline stages.

For these pipeline stages, an execution or memory dependency from the first set of operations to the second set can either be a single framebuffer-global dependency, or split into multiple framebuffer-local dependencies. A dependency with non-framebuffer-space pipeline stages is neither framebuffer-global nor framebuffer-local.

A framebuffer region is a subset of the entire framebuffer, and can either be:

A sample region, which is set of sample (x, y, layer, sample) coordinates that is a subset of the entire framebuffer, or
A fragment region, which is a set of fragment (x, y, layer) coordinates that is a subset of the entire framebuffer.

Both synchronization scopes of a framebuffer-local dependency include only the operations performed within corresponding framebuffer regions (as defined below). No ordering guarantees are made between different framebuffer regions for a framebuffer-local dependency.

Both synchronization scopes of a framebuffer-global dependency include operations on all framebuffer-regions.

If the first synchronization scope includes operations on pixels/fragments with N samples and the second synchronization scope includes operations on pixels/fragments with M samples, where N does not equal M, then a framebuffer region containing all samples at a given (x, y, layer) coordinate in the first synchronization scope corresponds to a region containing all samples at the same coordinate in the second synchronization scope. In other words, the framebuffer region is a fragment region and it is a pixel granularity dependency. If N equals M, and if the VkSubpassDescription::flags does not specify the VK_SUBPASS_DESCRIPTION_FRAGMENT_REGION_BIT_QCOM flag, then a framebuffer region containing a single (x, y, layer, sample) coordinate in the first synchronization scope corresponds to a region containing the same sample at the same coordinate in the second synchronization scope. In other words, the framebuffer region is a sample region and it is a sample granularity dependency.

If the pipeline performing the operation was created with VK_PIPELINE_COLOR_BLEND_STATE_CREATE_RASTERIZATION_ORDER_ATTACHMENT_ACCESS_BIT_EXT, VK_PIPELINE_DEPTH_STENCIL_STATE_CREATE_RASTERIZATION_ORDER_ATTACHMENT_DEPTH_ACCESS_BIT_EXT, or VK_PIPELINE_DEPTH_STENCIL_STATE_CREATE_RASTERIZATION_ORDER_ATTACHMENT_STENCIL_ACCESS_BIT_EXT, the framebuffer region is a fragment region and it is a pixel granularity dependency.

For commands recorded within a render pass that enables tile shading, the framebuffer region is a tile region and it is a tile granularity dependency.

For commands recorded within a render pass instance that enables tile shading, the fragment shader invocations for a given tile will be grouped together, but the tiles do not run in any particular order, and the fragment shader invocations within a tile do not run in any particular order. Otherwise, fragment shader invocations are not specified to run in any particular groupings, the size of a framebuffer region is implementation-dependent, not known to the application, and must be assumed to be no larger than specified above.

Practically, the pixel vs. sample granularity dependency means that if an input attachment has a different number of samples than the pipeline’s rasterizationSamples, then a fragment can access any sample in the input attachment’s pixel even if it only uses framebuffer-local dependencies. If the input attachment has the same number of samples, then the fragment can only access the covered samples in its input SampleMask (i.e. the fragment operations happen-after a framebuffer-local dependency for each sample the fragment covers). To access samples that are not covered, either the VkSubpassDescription::flags

VK_SUBPASS_DESCRIPTION_FRAGMENT_REGION_BIT_QCOM flag is required, or a framebuffer-global dependency is required.

For a tile granularity dependency, a fragment shader can use tile attachment variables to access any pixel or sample within the active tile for color, depth/stencil or input attachments even if only framebuffer-local dependencies are used.

If a synchronization command includes a dependencyFlags parameter, and specifies the VK_DEPENDENCY_BY_REGION_BIT flag, then it defines framebuffer-local dependencies for the framebuffer-space pipeline stages in that synchronization command, for all framebuffer regions. If no dependencyFlags parameter is included, or the VK_DEPENDENCY_BY_REGION_BIT flag is not specified, then a framebuffer-global dependency is specified for those stages. The VK_DEPENDENCY_BY_REGION_BIT flag does not affect the dependencies between non-framebuffer-space pipeline stages, nor does it affect the dependencies between framebuffer-space and non-framebuffer-space pipeline stages.

Framebuffer-local dependencies are more efficient for most architectures; particularly tile-based architectures - which can keep framebuffer-regions entirely in on-chip registers and thus avoid external bandwidth across such a dependency. Including a framebuffer-global dependency in your rendering will usually force all implementations to flush data to memory, or to a higher level cache, breaking any potential locality optimizations.

View-Local Dependencies

In a render pass instance that has multiview enabled, dependencies can be either view-local or view-global.

A view-local dependency only includes operations from a single source view from the source subpass in the first synchronization scope, and only includes operations from a single destination view from the destination subpass in the second synchronization scope. A view-global dependency includes all views in the view mask of the source and destination subpasses in the corresponding synchronization scopes.

If a synchronization command includes a dependencyFlags parameter and specifies the VK_DEPENDENCY_VIEW_LOCAL_BIT flag, then it defines view-local dependencies for that synchronization command, for all views. If no dependencyFlags parameter is included or the VK_DEPENDENCY_VIEW_LOCAL_BIT flag is not specified, then a view-global dependency is specified.

Device-Local Dependencies

Dependencies can be either device-local or non-device-local. A device-local dependency acts as multiple separate dependencies, one for each physical device that executes the synchronization command, where each dependency only includes operations from that physical device in both synchronization scopes. A non-device-local dependency is a single dependency where both synchronization scopes include operations from all physical devices that participate in the synchronization command. For subpass dependencies, all physical devices in the VkDeviceGroupRenderPassBeginInfo::deviceMask participate in the dependency, and for pipeline barriers all physical devices that are set in the command buffer’s current device mask participate in the dependency.

If a synchronization command includes a dependencyFlags parameter and specifies the VK_DEPENDENCY_DEVICE_GROUP_BIT flag, then it defines a non-device-local dependency for that synchronization command. If no dependencyFlags parameter is included or the VK_DEPENDENCY_DEVICE_GROUP_BIT flag is not specified, then it defines device-local dependencies for that synchronization command, for all participating physical devices.

Semaphore and event dependencies are device-local and only execute on the one physical device that performs the dependency.

Implicit Synchronization Guarantees

A small number of implicit ordering guarantees are provided by Vulkan, ensuring that the order in which commands are submitted is meaningful, and avoiding unnecessary complexity in common operations.

Submission order is a fundamental ordering in Vulkan, giving meaning to the order in which action and synchronization commands are recorded and submitted to a single queue. Explicit and implicit ordering guarantees between commands in Vulkan all work on the premise that this ordering is meaningful. This order does not itself define any execution or memory dependencies; synchronization commands and other orderings within the API use this ordering to define their scopes.

Submission order for any given set of commands is based on the order in which they were recorded to command buffers and then submitted. This order is determined as follows:

The initial order is determined by the order in which vkQueueSubmit and vkQueueSubmit2 commands are executed on the host, for a single queue, from first to last.
The order in which VkSubmitInfo structures are specified in the pSubmits parameter of vkQueueSubmit, or in which VkSubmitInfo2 structures are specified in the pSubmits parameter of vkQueueSubmit2, from lowest index to highest.
The order in which command buffers are specified in the pCommandBuffers member of VkSubmitInfo or VkSubmitInfo2 from lowest index to highest.
The order in which commands outside of a render pass were recorded to a command buffer on the host, from first to last.
The order in which commands inside a single subpass were recorded to a command buffer on the host, from first to last.

When using a render pass object with multiple subpasses, commands in different subpasses have no defined submission order relative to each other, regardless of the order in which the subpasses were recorded. Commands within a subpass are still ordered relative to other commands in the same subpass, and those outside of the render pass.

State commands do not execute any operations on the device, instead they set the state of the command buffer when they execute on the host, in the order that they are recorded. Action commands consume the current state of the command buffer when they are recorded, and will execute state changes on the device as required to match the recorded state.

The order of primitives passing through the graphics pipeline and image layout transitions as part of an image memory barrier provide additional guarantees based on submission order.

Execution of pipeline stages within a given command also has a loose ordering, dependent only on a single command.

Signal operation order is a fundamental ordering in Vulkan, giving meaning to the order in which semaphore and fence signal operations occur when submitted to a single queue. The signal operation order for queue operations is determined as follows:

The initial order is determined by the order in which vkQueueSubmit and vkQueueSubmit2 commands are executed on the host, for a single queue, from first to last.
The order in which VkSubmitInfo structures are specified in the pSubmits parameter of vkQueueSubmit, or in which VkSubmitInfo2 structures are specified in the pSubmits parameter of vkQueueSubmit2, from lowest index to highest.
The fence signal operation defined by the fence parameter of a vkQueueSubmit or vkQueueSubmit2 or vkQueueBindSparse command is ordered after all semaphore signal operations defined by that command.

Semaphore signal operations defined by a single VkSubmitInfo or VkSubmitInfo2 or VkBindSparseInfo structure are unordered with respect to other semaphore signal operations defined within the same structure.

The vkSignalSemaphore command does not execute on a queue but instead performs the signal operation from the host. The semaphore signal operation defined by executing a vkSignalSemaphore command happens-after the vkSignalSemaphore command is invoked and happens-before the command returns.

When signaling timeline semaphores, it is the responsibility of the application to ensure that they are ordered such that the semaphore value is strictly increasing. Because the first synchronization scope for a semaphore signal operation contains all semaphore signal operations which occur earlier in submission order, all semaphore signal operations contained in any given batch are guaranteed to happen-after all semaphore signal operations contained in any previous batches. However, no ordering guarantee is provided between the semaphore signal operations defined within a single batch. This, combined with the requirement that timeline semaphore values strictly increase, means that it is invalid to signal the same timeline semaphore twice within a single batch.

If an application wishes to ensure that some semaphore signal operation happens-after some other semaphore signal operation, it can submit a separate batch containing only semaphore signal operations, which will happen-after the semaphore signal operations in any earlier batches.

When signaling a semaphore from the host, the only ordering guarantee is that the signal operation happens-after when vkSignalSemaphore is called and happens-before it returns. Therefore, it is invalid to call vkSignalSemaphore while there are any outstanding signal operations on that semaphore from any queue submissions unless those queue submissions have some dependency which ensures that they happen-after the host signal operation. One example of this would be if the pending signal operation is, itself, waiting on the same semaphore at a lower value and the call to vkSignalSemaphore signals that lower value. Furthermore, if there are two or more processes or threads signaling the same timeline semaphore from the host, the application must ensure that the vkSignalSemaphore with the lower semaphore value returns before vkSignalSemaphore is called with the higher value.

Fences

When a fence is submitted to a queue as part of a queue submission command, it defines a memory dependency on the batches that were submitted as part of that command, and defines a fence signal operation which sets the fence to the signaled state.

The first synchronization scope includes every batch submitted in the same queue submission command. Fence signal operations that are defined by vkQueueSubmit or vkQueueSubmit2 additionally include in the first synchronization scope all commands that occur earlier in submission order. Fence signal operations that are defined by vkQueueSubmit or vkQueueSubmit2 or vkQueueBindSparse additionally include in the first synchronization scope any semaphore and fence signal operations that occur earlier in signal operation order.

The second synchronization scope only includes the fence signal operation.

The first access scope includes all memory access performed by the device.

The second access scope is empty.

An execution dependency is defined by waiting for a fence to become signaled, either via vkWaitForFences or by polling on vkGetFenceStatus.

The first synchronization scope includes only the fence signal operation.

The second synchronization scope includes the host operations of vkWaitForFences or vkGetFenceStatus indicating that the fence has become signaled.

Signaling a fence and waiting on the host does not guarantee that the results of memory accesses will be visible to the host, as the access scope of a memory dependency defined by a fence only includes device access. A memory barrier or other memory dependency must be used to guarantee this. See the description of host access types for more information.

Alternate Methods to Signal Fences

Besides submitting a fence to a queue as part of a queue submission command, a fence may also be signaled when a particular event occurs on a device or display.

Importing Fence Payloads

Applications can import a fence payload into an existing fence using an external fence handle. The effects of the import operation will be either temporary or permanent, as specified by the application. If the import is temporary, the fence will be restored to its permanent state the next time that fence is passed to vkResetFences.

Restoring a fence to its prior permanent payload is a distinct operation from resetting a fence payload. See vkResetFences for more detail.

Performing a subsequent temporary import on a fence before resetting it has no effect on this requirement; the next unsignal of the fence must still restore its last permanent state. A permanent payload import behaves as if the target fence was destroyed, and a new fence was created with the same handle but the imported payload. Because importing a fence payload temporarily or permanently detaches the existing payload from a fence, similar usage restrictions to those applied to vkDestroyFence are applied to any command that imports a fence payload. Which of these import types is used is referred to as the import operation’s permanence. Each handle type supports either one or both types of permanence.

The implementation must perform the import operation by either referencing or copying the payload referred to by the specified external fence handle, depending on the handle’s type. The import method used is referred to as the handle type’s transference. When using handle types with reference transference, importing a payload to a fence adds the fence to the set of all fences sharing that payload. This set includes the fence from which the payload was exported. Fence signaling, waiting, and resetting operations performed on any fence in the set must behave as if the set were a single fence. Importing a payload using handle types with copy transference creates a duplicate copy of the payload at the time of import, but makes no further reference to it. Fence signaling, waiting, and resetting operations performed on the target of copy imports must not affect any other fence or payload.

Export operations have the same transference as the specified handle type’s import operations. Additionally, exporting a fence payload to a handle with copy transference has the same side effects on the source fence’s payload as executing a fence reset operation. If the fence was using a temporarily imported payload, the fence’s prior permanent payload will be restored.

The tables Handle Types Supported by VkImportFenceWin32HandleInfoKHR and Handle Types Supported by VkImportFenceFdInfoKHR define the permanence and transference of each handle type.

External synchronization allows implementations to modify an object’s internal state, i.e. payload, without internal synchronization. However, for fences sharing a payload across processes, satisfying the external synchronization requirements of VkFence parameters as if all fences in the set were the same object is sometimes infeasible. Satisfying valid usage constraints on the state of a fence would similarly require impractical coordination or levels of trust between processes. Therefore, these constraints only apply to a specific fence handle, not to its payload. For distinct fence objects which share a payload:

If multiple commands which queue a signal operation, or which unsignal a fence, are called concurrently, behavior will be as if the commands were called in an arbitrary sequential order.
If a queue submission command is called with a fence that is sharing a payload, and the payload is already associated with another queue command that has not yet completed execution, either one or both of the commands will cause the fence to become signaled when they complete execution.
If a fence payload is reset while it is associated with a queue command that has not yet completed execution, the payload will become unsignaled, but may become signaled again when the command completes execution.
In the preceding cases, any of the devices associated with the fences sharing the payload may be lost, or any of the queue submission or fence reset commands may return VK_ERROR_INITIALIZATION_FAILED.

Other than these non-deterministic results, behavior is well defined. In particular:

The implementation must not crash or enter an internally inconsistent state where future valid Vulkan commands might cause undefined: results,
Timeouts on future wait commands on fences sharing the payload must be effective.

These rules allow processes to synchronize access to shared memory without trusting each other. However, such processes must still be cautious not to use the shared fence for more than synchronizing access to the shared memory. For example, a process should not use a fence with shared payload to tell when commands it submitted to a queue have completed and objects used by those commands may be destroyed, since the other process can accidentally or maliciously cause the fence to signal before the commands actually complete.

When a fence is using an imported payload, its VkExportFenceCreateInfo::handleTypes value is specified when creating the fence from which the payload was exported, rather than specified when creating the fence. Additionally, VkExternalFenceProperties::exportFromImportedHandleTypes restricts which handle types can be exported from such a fence based on the specific handle type used to import the current payload. Passing a fence to vkAcquireNextImageKHR is equivalent to temporarily importing a fence payload to that fence.

Because the exportable handle types of an imported fence correspond to its current imported payload, and vkAcquireNextImageKHR behaves the same as a temporary import operation for which the source fence is opaque to the application, applications have no way of determining whether any external handle types can be exported from a fence in this state. Therefore, applications must not attempt to export handles from fences using a temporarily imported payload from vkAcquireNextImageKHR.

When importing a fence payload, it is the responsibility of the application to ensure the external handles meet all valid usage requirements. However, implementations must perform sufficient validation of external handles to ensure that the operation results in a valid fence which will not cause program termination, device loss, queue stalls, host thread stalls, or corruption of other resources when used as allowed according to its import parameters. If the external handle provided does not meet these requirements, the implementation must fail the fence payload import operation with the error code VK_ERROR_INVALID_EXTERNAL_HANDLE.

Semaphores

Semaphore Signaling

When a batch is submitted to a queue via a queue submission, and it includes semaphores to be signaled, it defines a memory dependency on the batch, and defines semaphore signal operations which set the semaphores to the signaled state.

In case of semaphores created with a VkSemaphoreType of VK_SEMAPHORE_TYPE_TIMELINE the semaphore is considered signaled with respect to the counter value set to be signaled as specified in VkTimelineSemaphoreSubmitInfo or VkSemaphoreSignalInfo.

The first synchronization scope includes every command submitted in the same batch. In the case of vkQueueSubmit2, the first synchronization scope is limited to the pipeline stage specified by VkSemaphoreSubmitInfo::stageMask. Semaphore signal operations that are defined by vkQueueSubmit or vkQueueSubmit2 additionally include all commands that occur earlier in submission order. Semaphore signal operations that are defined by vkQueueSubmit or vkQueueSubmit2 or vkQueueBindSparse additionally include in the first synchronization scope any semaphore and fence signal operations that occur earlier in signal operation order.

The second synchronization scope includes only the semaphore signal operation.

The first access scope includes all memory access performed by the device.

The second access scope is empty.

Semaphore Waiting

When a batch is submitted to a queue via a queue submission, and it includes semaphores to be waited on, it defines a memory dependency between prior semaphore signal operations and the batch, and defines semaphore wait operations.

Such semaphore wait operations set the semaphores created with a VkSemaphoreType of VK_SEMAPHORE_TYPE_BINARY to the unsignaled state. In case of semaphores created with a VkSemaphoreType of VK_SEMAPHORE_TYPE_TIMELINE a prior semaphore signal operation defines a memory dependency with a semaphore wait operation if the value the semaphore is signaled with is greater than or equal to the value the semaphore is waited with, thus the semaphore will continue to be considered signaled with respect to the counter value waited on as specified in VkTimelineSemaphoreSubmitInfo.

The first synchronization scope includes one semaphore signal operation for each semaphore waited on by this batch. The specific signal operation waited on for each semaphore must meet the following criteria:

for binary semaphores, the signal operation is either earlier in submission order on the same queue, or is submitted by a command whose host operation happens-before this batch is submitted on the host
for binary semaphores, no wait operation exists that happens-after the signal operation and happens-before this wait operation
the signal operation is not guaranteed to happen-after the semaphore wait operation in this batch
for timeline semaphores, the signal value is greater than or equal to the wait value

If multiple semaphore signal operations meet these criteria, any of those operations may be included in the first synchronization scope. When waiting on a binary semaphore, applications must ensure that exactly one semaphore signal operation meets these criteria.

The second synchronization scope includes every command submitted in the same batch. In the case of vkQueueSubmit, the second synchronization scope is limited to operations on the pipeline stages determined by the destination stage mask specified by the corresponding element of pWaitDstStageMask. In the case of vkQueueSubmit2, the second synchronization scope is limited to the pipeline stage specified by VkSemaphoreSubmitInfo::stageMask. Also, in the case of either vkQueueSubmit2 or vkQueueSubmit, the second synchronization scope additionally includes all commands that occur later in submission order.

The first access scope is empty.

The second access scope includes all memory access performed by the device.

The semaphore wait operation happens-after the first set of operations in the execution dependency, and happens-before the second set of operations in the execution dependency.

Unlike timeline semaphores, fences or events, waiting for a binary semaphore also unsignals that semaphore when the wait completes. Applications must ensure that between two such wait operations, the semaphore is signaled again, with execution dependencies used to ensure these occur in order. Binary semaphore waits and signals should thus occur in discrete 1:1 pairs.

A common scenario for using pWaitDstStageMask with values other than VK_PIPELINE_STAGE_ALL_COMMANDS_BIT is when synchronizing a window system presentation operation against subsequent command buffers which render the next frame. In this case, a presentation image must not be overwritten until the presentation operation completes, but other pipeline stages can execute without waiting. A mask of VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT prevents subsequent color attachment writes from executing until the semaphore signals. Some implementations may be able to execute transfer operations and/or pre-rasterization work before the semaphore is signaled.

If an image layout transition needs to be performed on a presentable image before it is used in a framebuffer, that can be performed as the first operation submitted to the queue after acquiring the image, and should not prevent other work from overlapping with the presentation operation. For example, a VkImageMemoryBarrier could use:

srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
srcAccessMask = 0
dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT.
oldLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL

Alternatively, oldLayout can be VK_IMAGE_LAYOUT_UNDEFINED, if the image’s contents need not be preserved.

This barrier accomplishes a dependency chain between previous presentation operations and subsequent color attachment output operations, with the layout transition performed in between, and does not introduce a dependency between previous work and any pre-rasterization shader stages. More precisely, the semaphore signals after the presentation operation completes, the semaphore wait stalls the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT stage, and there is a dependency from that same stage to itself with the layout transition performed in between.

Semaphore State Requirements for Wait Operations

Before waiting on a semaphore, the application must ensure the semaphore is in a valid state for a wait operation. Specifically, when a semaphore wait operation is submitted to a queue:

A binary semaphore must be signaled, or have an associated semaphore signal operation that is pending execution.
Any semaphore signal operations on which the pending binary semaphore signal operation depends must also be completed or pending execution.
There must be no other queue waiting on the same binary semaphore when the operation executes.

Host Operations on Semaphores

In addition to semaphore signal operations and semaphore wait operations submitted to device queues, timeline semaphores support the following host operations:

Query the current counter value of the semaphore using the vkGetSemaphoreCounterValue command.
Wait for a set of semaphores to reach particular counter values using the vkWaitSemaphores command.
Signal the semaphore with a particular counter value from the host using the vkSignalSemaphore command.

Importing Semaphore Payloads

Applications can import a semaphore payload into an existing semaphore using an external semaphore handle. The effects of the import operation will be either temporary or permanent, as specified by the application. If the import is temporary, the implementation must restore the semaphore to its prior permanent state after submitting the next semaphore wait operation. Performing a subsequent temporary import on a semaphore before performing a semaphore wait has no effect on this requirement; the next wait submitted on the semaphore must still restore its last permanent state. A permanent payload import behaves as if the target semaphore was destroyed, and a new semaphore was created with the same handle but the imported payload. Because importing a semaphore payload temporarily or permanently detaches the existing payload from a semaphore, similar usage restrictions to those applied to vkDestroySemaphore are applied to any command that imports a semaphore payload. Which of these import types is used is referred to as the import operation’s permanence. Each handle type supports either one or both types of permanence.

The implementation must perform the import operation by either referencing or copying the payload referred to by the specified external semaphore handle, depending on the handle’s type. The import method used is referred to as the handle type’s transference. When using handle types with reference transference, importing a payload to a semaphore adds the semaphore to the set of all semaphores sharing that payload. This set includes the semaphore from which the payload was exported. Semaphore signaling and waiting operations performed on any semaphore in the set must behave as if the set were a single semaphore. Importing a payload using handle types with copy transference creates a duplicate copy of the payload at the time of import, but makes no further reference to it. Semaphore signaling and waiting operations performed on the target of copy imports must not affect any other semaphore or payload.

Export operations have the same transference as the specified handle type’s import operations. Additionally, exporting a semaphore payload to a handle with copy transference has the same side effects on the source semaphore’s payload as executing a semaphore wait operation. If the semaphore was using a temporarily imported payload, the semaphore’s prior permanent payload will be restored.

The permanence and transference of handle types can be found in:

External synchronization allows implementations to modify an object’s internal state, i.e. payload, without internal synchronization. However, for semaphores sharing a payload across processes, satisfying the external synchronization requirements of VkSemaphore parameters as if all semaphores in the set were the same object is sometimes infeasible. Satisfying the wait operation state requirements would similarly require impractical coordination or levels of trust between processes. Therefore, these constraints only apply to a specific semaphore handle, not to its payload. For distinct semaphore objects which share a payload, if the semaphores are passed to separate queue submission commands concurrently, behavior will be as if the commands were called in an arbitrary sequential order. If the wait operation state requirements are violated for the shared payload by a queue submission command, or if a signal operation is queued for a shared payload that is already signaled or has a pending signal operation, effects must be limited to one or more of the following:

Returning VK_ERROR_INITIALIZATION_FAILED from the command which resulted in the violation.
Losing the logical device on which the violation occurred immediately or at a future time, resulting in a VK_ERROR_DEVICE_LOST error from subsequent commands, including the one causing the violation.
Continuing execution of the violating command or operation as if the semaphore wait completed successfully after an implementation-dependent timeout. In this case, the state of the payload becomes undefined:, and future operations on semaphores sharing the payload will be subject to these same rules. The semaphore must be destroyed or have its payload replaced by an import operation to again have a well-defined state.

These rules allow processes to synchronize access to shared memory without trusting each other. However, such processes must still be cautious not to use the shared semaphore for more than synchronizing access to the shared memory. For example, a process should not use a shared semaphore as part of an execution dependency chain that, when complete, leads to objects being destroyed, if it does not trust other processes sharing the semaphore payload.

When a semaphore is using an imported payload, its VkExportSemaphoreCreateInfo::handleTypes value is specified when creating the semaphore from which the payload was exported, rather than specified when creating the semaphore. Additionally, VkExternalSemaphoreProperties::exportFromImportedHandleTypes restricts which handle types can be exported from such a semaphore based on the specific handle type used to import the current payload. Passing a semaphore to vkAcquireNextImageKHR is equivalent to temporarily importing a semaphore payload to that semaphore.

Because the exportable handle types of an imported semaphore correspond to its current imported payload, and vkAcquireNextImageKHR behaves the same as a temporary import operation for which the source semaphore is opaque to the application, applications have no way of determining whether any external handle types can be exported from a semaphore in this state. Therefore, applications must not attempt to export external handles from semaphores using a temporarily imported payload from vkAcquireNextImageKHR.

When importing a semaphore payload, it is the responsibility of the application to ensure the external handles meet all valid usage requirements. However, implementations must perform sufficient validation of external handles to ensure that the operation results in a valid semaphore which will not cause program termination, device loss, queue stalls, or corruption of other resources when used as allowed according to its import parameters, and excepting those side effects allowed for violations of the valid semaphore state for wait operations rules. If the external handle provided does not meet these requirements, the implementation must fail the semaphore payload import operation with the error code VK_ERROR_INVALID_EXTERNAL_HANDLE.

In addition, when importing a semaphore payload that is not compatible with the payload type corresponding to the VkSemaphoreType the semaphore was created with, the implementation may fail the semaphore payload import operation with the error code VK_ERROR_INVALID_EXTERNAL_HANDLE.

As the introduction of the external semaphore handle type VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_D3D12_FENCE_BIT predates that of timeline semaphores, support for importing semaphore payloads from external handles of that type into semaphores created (implicitly or explicitly) with a VkSemaphoreType of VK_SEMAPHORE_TYPE_BINARY is preserved for backwards compatibility. However, applications should prefer importing such handle types into semaphores created with a VkSemaphoreType of VK_SEMAPHORE_TYPE_TIMELINE.

Events

The state of an event can also be updated on the device by commands inserted in command buffers.

Pipeline Barriers

Memory Barriers

Memory barriers are used to explicitly control access to buffer and image subresource ranges. Memory barriers are used to transfer ownership between queue families, change image layouts, and define availability and visibility operations. They explicitly define the access types and buffer and image subresource ranges that are included in the access scopes of a memory dependency that is created by a synchronization command that includes them.

Global Memory Barriers

Global memory barriers apply to memory accesses involving all memory objects that exist at the time of its execution.

Buffer Memory Barriers

Buffer memory barriers only apply to memory accesses involving a specific buffer range. That is, a memory dependency formed from a buffer memory barrier is scoped to access via the specified buffer range. Buffer memory barriers can also be used to define a queue family ownership transfer for the specified buffer range.

Image Memory Barriers

Image memory barriers only apply to memory accesses involving a specific image subresource range. That is, a memory dependency formed from an image memory barrier is scoped to access via the specified image subresource range. Image memory barriers can also be used to define image layout transitions or a queue family ownership transfer for the specified image subresource range.

To facilitate usage of images whose memory is initialized on the host, Vulkan allows image layout transitions to be performed by the host as well, albeit supporting limited layouts.

Queue Family Ownership Transfer

Resources created with a VkSharingMode of VK_SHARING_MODE_EXCLUSIVE must have their ownership explicitly transferred from one queue family to another in order to access their content in a well-defined manner on a queue in a different queue family.

Resources shared with external APIs or instances using external memory must also explicitly manage ownership transfers between local and external queues (or equivalent constructs in external APIs) regardless of the VkSharingMode specified when creating them.

If memory dependencies are correctly expressed between uses of such a resource between two queues in different families, but no ownership transfer is defined, the contents of that resource are undefined: for any read accesses performed by the second queue family.

If an application does not need the contents of a resource to remain valid when transferring from one queue family to another, then the ownership transfer should be skipped.

Applications should expect transfers to/from VK_QUEUE_FAMILY_FOREIGN_EXT to be more expensive than transfers to/from VK_QUEUE_FAMILY_EXTERNAL_KHR.

A queue family ownership transfer consists of two distinct parts:

Release exclusive ownership from the source queue family
Acquire exclusive ownership for the destination queue family

An application must ensure that these operations occur in the correct order by defining an execution dependency between them, e.g. using a semaphore.

A release operation is used to release exclusive ownership of a range of a buffer or image subresource range. A release operation is defined by executing a buffer memory barrier (for a buffer range) or an image memory barrier (for an image subresource range) using a pipeline barrier command, on a queue from the source queue family. The srcQueueFamilyIndex parameter of the barrier must be the source queue family index, and the dstQueueFamilyIndex parameter to the destination queue family index. The destination access mask is ignored for such a barrier, such that no visibility operation is executed - the value of this mask does not affect the validity of the barrier. The release operation happens-after the availability operation. If dependencyFlags does not include VK_DEPENDENCY_QUEUE_FAMILY_OWNERSHIP_TRANSFER_USE_ALL_STAGES_BIT_KHR, dstStageMask is also ignored for such a barrier as defined by buffer memory ownership transfer and image memory ownership transfer.

An acquire operation is used to acquire exclusive ownership of a range of a buffer or image subresource range. An acquire operation is defined by executing a buffer memory barrier (for a buffer range) or an image memory barrier (for an image subresource range) using a pipeline barrier command, on a queue from the destination queue family. The buffer range or image subresource range specified in an acquire operation must match exactly that of a previous release operation. The srcQueueFamilyIndex parameter of the barrier must be the source queue family index, and the dstQueueFamilyIndex parameter to the destination queue family index. The source access mask is ignored for such a barrier, such that no availability operation is executed - the value of this mask does not affect the validity of the barrier. The acquire operation happens-before the visibility operation. If dependencyFlags does not include VK_DEPENDENCY_QUEUE_FAMILY_OWNERSHIP_TRANSFER_USE_ALL_STAGES_BIT_KHR, srcStageMask is also ignored for such a barrier as defined by buffer memory ownership transfer and image memory ownership transfer.

Whilst it is not invalid to provide destination or source access masks for memory barriers used for release or acquire operations, respectively, they have no practical effect. Access after a release operation has undefined: results, and so visibility for those accesses has no practical effect. Similarly, write access before an acquire operation will produce undefined: results for future access, so availability of those writes has no practical use. In an earlier version of the specification, these were required to match on both sides - but this was subsequently relaxed. These masks should be set to 0.

To ensure that an acquire and release operation are valid, the release operation must happen-before the acquire operation. Often, semaphores are used for this directly, with the semaphore signaling after a release and then waiting before an acquire. Prior to the introduction of VK_DEPENDENCY_QUEUE_FAMILY_OWNERSHIP_TRANSFER_USE_ALL_STAGES_BIT_KHR, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT is the only valid stage to wait on or wait for these operations, as the acquire and release operations do not occur in a defined stage. When VK_DEPENDENCY_QUEUE_FAMILY_OWNERSHIP_TRANSFER_USE_ALL_STAGES_BIT_KHR is specified however, these can be synchronized with the stages which would otherwise be ignored, as these stages now synchronize the acquire and release operations, providing a way to avoid full pipeline stalls.

If the transfer is via an image memory barrier, and an image layout transition is desired, then the values of oldLayout and newLayout in the release operation's memory barrier must be equal to values of oldLayout and newLayout in the acquire operation's memory barrier. Although the image layout transition is submitted twice, it will only be executed once. A layout transition specified in this way happens-after the release operation and happens-before the acquire operation.

If the values of srcQueueFamilyIndex and dstQueueFamilyIndex are equal, no ownership transfer is performed, and the barrier operates as if they were both set to VK_QUEUE_FAMILY_IGNORED.

Queue family ownership transfers may perform read and write accesses on all memory bound to the image subresource or buffer range, so applications must ensure that all memory writes have been made available before a queue family ownership transfer is executed. Available memory is automatically made visible to queue family release and acquire operations, and writes performed by those operations are automatically made available.

Once a queue family has acquired ownership of a buffer range or image subresource range of a VK_SHARING_MODE_EXCLUSIVE resource, its contents are undefined: to other queue families unless ownership is transferred. The contents of any portion of another resource which aliases memory that is bound to the transferred buffer or image subresource range are undefined: after a release or acquire operation.

Because events cannot be used directly for inter-queue synchronization, and because vkCmdSetEvent does not have the queue family index or memory barrier parameters needed by a release operation, the release and acquire operations of a queue family ownership transfer can only be performed using vkCmdPipelineBarrier.

Wait Idle Operations

Host Write Ordering Guarantees

When batches of command buffers are submitted to a queue via a queue submission command, it defines a memory dependency with prior host operations, and execution of command buffers submitted to the queue.

The first synchronization scope includes execution of vkQueueSubmit on the host and anything that happened-before it, as defined by the host memory model.

Some systems allow writes that do not directly integrate with the host memory model; these have to be synchronized by the application manually. One example of this is non-temporal store instructions on x86; to ensure these happen-before submission, applications should call _mm_sfence().

The second synchronization scope includes all commands submitted in the same queue submission, and all commands that occur later in submission order.

The first access scope includes all host writes to mappable device memory that are available to the host memory domain.

The second access scope includes all memory access performed by the device.

Synchronization and Multiple Physical Devices

If a logical device includes more than one physical device, then fences, semaphores, and events all still have a single instance of the signaled state.

A fence becomes signaled when all physical devices complete the necessary queue operations.

Semaphore wait and signal operations all include a device index that is the sole physical device that performs the operation. These indices are provided in the VkDeviceGroupSubmitInfo and VkDeviceGroupBindSparseInfo structures. Semaphores are not exclusively owned by any physical device. For example, a semaphore can be signaled by one physical device and then waited on by a different physical device.

An event can only be waited on by the same physical device that signaled it (or the host).