Queries

Queries provide a mechanism to return information about the processing of a sequence of Vulkan commands. Query operations are asynchronous, and as such, their results are not returned immediately. Instead, their results and their availability status are stored in a Query Pool. The state of these queries can be read back on the host, or copied to a buffer object on the device.

The supported query types are Occlusion Queries, Pipeline Statistics Queries, Result Status Queries, Video Encode Feedback Queries and Timestamp Queries. Performance Queries, Transform Feedback Queries, Intel Performance Queries, and Mesh Shader Queries are each supported if the associated extension is available.

Several additional queries with specific purposes associated with ray tracing are available if the corresponding extensions are supported, as described for VkQueryType.

Query Pools

VkQueryPool: Opaque handle to a query pool object
vkCreateQueryPool: Create a new query pool object
VkQueryPoolCreateInfo: Structure specifying parameters of a newly created query pool
VkQueryPoolCreateFlags: Reserved for future use
VkQueryPoolPerformanceCreateInfoKHR: Structure specifying parameters of a newly created performance query pool
vkGetPhysicalDeviceQueueFamilyPerformanceQueryPassesKHR: Reports the number of passes required for a performance query pool type
vkDestroyQueryPool: Destroy a query pool object
VkQueryType: Specify the type of queries managed by a query pool

Query Operation

The operation of queries is controlled by the commands vkCmdBeginQuery, vkCmdEndQuery, vkCmdBeginQueryIndexedEXT, vkCmdEndQueryIndexedEXT, vkCmdResetQueryPool, vkCmdCopyQueryPoolResults, vkCmdWriteTimestamp2, and vkCmdWriteTimestamp.

In order for a VkCommandBuffer to record query management commands, the queue family for which its VkCommandPool was created must support the appropriate type of operations (graphics, compute) suitable for the query type of a given query pool.

Each query in a query pool has a status that is either unavailable or available, and also has state to store the numerical results of a query operation of the type requested when the query pool was created. Resetting a query via vkCmdResetQueryPool or vkResetQueryPool sets the status to unavailable and makes the numerical results undefined. A query is made available by the operation of vkCmdEndQuery, vkCmdEndQueryIndexedEXT, vkCmdWriteTimestamp2, or vkCmdWriteTimestamp. Both the availability status and numerical results can be retrieved by calling either vkGetQueryPoolResults or vkCmdCopyQueryPoolResults.

After query pool creation, each query is in an uninitialized state and must be reset before it is used. Queries must also be reset between uses.
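
The lifecycle described above can be pictured as a small state machine. The following is only a host-side sketch for reasoning about the rules, not part of the Vulkan API: the `QueryState` enum and the `query_reset`, `query_begin`, and `query_end` helpers are hypothetical names. The rule that an active query is not reset comes from the valid usage of the reset commands rather than from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side model of one query's lifecycle; not a Vulkan type. */
typedef enum {
    QUERY_UNINITIALIZED, /* state right after pool creation */
    QUERY_UNAVAILABLE,   /* reset and ready for use */
    QUERY_ACTIVE,        /* between begin and end */
    QUERY_AVAILABLE      /* numerical results written, status available */
} QueryState;

/* Models a reset. Assumption: an active query is never reset. */
bool query_reset(QueryState *state)
{
    assert(state != NULL);
    if (*state == QUERY_ACTIVE) {
        return false;
    }
    *state = QUERY_UNAVAILABLE;
    return true;
}

/* Models beginning a query: only legal on a freshly reset query. */
bool query_begin(QueryState *state)
{
    assert(state != NULL);
    if (*state != QUERY_UNAVAILABLE) {
        return false; /* uninitialized, already active, or not reset since last use */
    }
    *state = QUERY_ACTIVE;
    return true;
}

/* Models ending a query: the query becomes available. */
bool query_end(QueryState *state)
{
    assert(state != NULL);
    if (*state != QUERY_ACTIVE) {
        return false;
    }
    *state = QUERY_AVAILABLE;
    return true;
}
```

In this model, a begin on a query that was never reset, or that was used once and not reset again, is rejected, which mirrors the two reset requirements stated above.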

If a logical device includes multiple physical devices, then each command that writes a query must execute on a single physical device, and any call to vkCmdBeginQuery must execute the corresponding vkCmdEndQuery command on the same physical device.

vkCmdResetQueryPool: Reset queries in a query pool
vkResetQueryPool: Reset queries in a query pool

Once queries are reset and ready for use, query commands can be issued to a command buffer. Occlusion queries and pipeline statistics queries count events - drawn samples and pipeline stage invocations, respectively - resulting from commands that are recorded between a vkCmdBeginQuery command and a vkCmdEndQuery command within a specified command buffer, effectively scoping a set of drawing and/or dispatching commands. Timestamp queries write timestamps to a query pool. Performance queries record performance counters to a query pool.

A query must begin and end in the same command buffer, although if it is a primary command buffer, and the inheritedQueries feature is enabled, it can execute secondary command buffers during the query operation. For a secondary command buffer to be executed while a query is active, it must set the occlusionQueryEnable, queryFlags, and/or pipelineStatistics members of VkCommandBufferInheritanceInfo to conservative values, as described in the Command Buffer Recording section. A query must either begin and end inside the same subpass of a render pass instance, or must both begin and end outside of a render pass instance (i.e. contain entire render pass instances).

If queries are used while executing a render pass instance that has multiview enabled, the query uses N consecutive query indices in the query pool (starting at query) where N is the number of bits set in the view mask in the subpass the query is used in. How the numerical results of the query are distributed among the queries is implementation-dependent. For example, some implementations may write each view’s results to a distinct query, while other implementations may write the total result to the first query and write zero to the other queries. However, the sum of the results in all the queries must accurately reflect the total result of the query summed over all views. Applications can sum the results from all the queries to compute the total result.
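
Because the per-view distribution is implementation-dependent, a portable application sums the N consecutive results. A minimal sketch follows, assuming the results were already read back as 64-bit values into a plain array; `view_count` and `sum_multiview_results` are hypothetical helper names, not Vulkan functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the bits set in a view mask, i.e. the number N of queries consumed. */
uint32_t view_count(uint32_t view_mask)
{
    uint32_t n = 0;
    while (view_mask != 0) {
        view_mask &= view_mask - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Sums the N consecutive results starting at first_query.
 * `results` is assumed to hold one 64-bit value per query in the pool. */
uint64_t sum_multiview_results(const uint64_t *results,
                               uint32_t first_query,
                               uint32_t view_mask)
{
    assert(results != NULL);
    uint64_t total = 0;
    uint32_t n = view_count(view_mask);
    for (uint32_t i = 0; i < n; ++i) {
        total += results[first_query + i];
    }
    return total;
}
```

The sum is the same whether an implementation writes each view's result to its own query or writes the total to the first query and zero to the rest.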

Queries used with multiview rendering must not span subpasses, i.e. they must begin and end in the same subpass.

A query must either begin and end inside the same video coding scope, or must both begin and end outside of a video coding scope and must not contain entire video coding scopes.

vkCmdBeginQuery: Begin a query
vkCmdBeginQueryIndexedEXT: Begin an indexed query
VkQueryControlFlagBits: Bitmask specifying constraints on a query
VkQueryControlFlags: Bitmask of VkQueryControlFlagBits
vkCmdEndQuery: Ends a query
vkCmdEndQueryIndexedEXT: Ends a query

An application can retrieve results either by requesting they be written into application-provided memory, or by requesting they be copied into a VkBuffer. In either case, the layout in memory is defined as follows:

  • The first query’s result is written starting at the first byte requested by the command, and each subsequent query’s result begins stride bytes later.
  • Occlusion queries, pipeline statistics queries, transform feedback queries, primitives generated queries, mesh shader queries, video encode feedback queries, and timestamp queries store results in a tightly packed array of unsigned integers, either 32- or 64-bits as requested by the command, storing the numerical results and, if requested, the availability status.
  • Performance queries store results in a tightly packed array whose type is determined by the unit member of the corresponding VkPerformanceCounterKHR.
  • If VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is used, the final element of each query’s result is an integer indicating whether the query’s result is available, with any non-zero value indicating that it is available.
  • If VK_QUERY_RESULT_WITH_STATUS_BIT_KHR is used, the final element of each query’s result is an integer value indicating the status of the query result. Positive values indicate success, negative values indicate failure, and 0 indicates that the result is not yet available. Specific error codes are encoded in the VkQueryResultStatusKHR enumeration.
  • Occlusion queries write one integer value - the number of samples passed. Pipeline statistics queries write one integer value for each bit that is enabled in the pipelineStatistics when the pool is created, and the statistics values are written in bit order starting from the least significant bit. Timestamp queries write one integer value. Performance queries write one VkPerformanceCounterResultKHR value for each VkPerformanceCounterKHR in the query. Transform feedback queries write two integers; the first integer is the number of primitives successfully written to the corresponding transform feedback buffer and the second is the number of primitives output to the vertex stream, regardless of whether they were successfully captured or not. In other words, if the transform feedback buffer was sized too small for the number of primitives output by the vertex stream, the first integer represents the number of primitives actually written and the second is the number that would have been written if all the transform feedback buffers associated with that vertex stream were large enough. Primitives generated queries write the number of primitives output to the vertex stream, regardless of whether transform feedback is active or not, or whether they were successfully captured by transform feedback or not. This is identical to the second integer of the transform feedback queries if transform feedback is active. Mesh shader queries write a single integer. Video encode feedback queries write one or more integer values for each bit that is enabled in VkQueryPoolVideoEncodeFeedbackCreateInfoKHR::encodeFeedbackFlags when the pool is created, and the feedback values are written in bit order starting from the least significant bit.
  • If more than one query is retrieved and stride is not at least as large as the size of the array of values corresponding to a single query, the values written to memory are undefined.
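
The layout rules above can be illustrated with a small host-side reader. This is a sketch under stated assumptions: results were requested as 64-bit integers together with the availability status, and `min_stride_u64` and `read_query_u64` are hypothetical helpers operating on an already-filled memory block, not Vulkan entry points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Smallest stride, in bytes, that holds one query's 64-bit values
 * plus an optional trailing availability element. */
size_t min_stride_u64(uint32_t values_per_query, bool with_availability)
{
    size_t elements = (size_t)values_per_query + (with_availability ? 1u : 0u);
    return elements * sizeof(uint64_t);
}

/* Copies the numerical values of query `index` into out_values and
 * returns whether the trailing availability element is non-zero.
 * Assumes the data was written with 64-bit results and availability. */
bool read_query_u64(const void *data, size_t stride, uint32_t index,
                    uint32_t values_per_query, uint64_t *out_values)
{
    assert(data != NULL && out_values != NULL);
    assert(stride >= min_stride_u64(values_per_query, true));

    /* Each query's result begins `stride` bytes after the previous one. */
    const unsigned char *base = (const unsigned char *)data + (size_t)index * stride;
    size_t value_bytes = (size_t)values_per_query * sizeof(uint64_t);

    memcpy(out_values, base, value_bytes);

    uint64_t availability = 0;
    memcpy(&availability, base + value_bytes, sizeof availability);
    return availability != 0; /* any non-zero value means available */
}
```

A stride larger than the minimum simply leaves padding between consecutive queries; a smaller one is the case the last bullet above calls undefined.
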
vkGetQueryPoolResults: Copy results of queries in a query pool to a host memory region
VkQueryResultFlagBits: Bitmask specifying how and when query results are returned
VkQueryResultFlags: Bitmask of VkQueryResultFlagBits
VkQueryResultStatusKHR: Specific status codes for operations
vkCmdCopyQueryPoolResults: Copy the results of queries in a query pool to a buffer object

Rendering operations such as clears, MSAA resolves, attachment load/store operations, and blits may count towards the results of queries. This behavior is implementation-dependent and may vary depending on the path used within an implementation. For example, some implementations have several types of clears, some of which may include vertices and some not.

Occlusion Queries

Occlusion queries track the number of samples that pass the per-fragment tests for a set of drawing commands. As such, occlusion queries are only available on queue families supporting graphics operations. The application can then use these results to inform future rendering decisions. An occlusion query is begun and ended by calling vkCmdBeginQuery and vkCmdEndQuery, respectively. When an occlusion query begins, the count of passing samples always starts at zero. For each drawing command, the count is incremented as described in Sample Counting. If flags does not contain VK_QUERY_CONTROL_PRECISE_BIT, an implementation may generate any non-zero result value for the query if the count of passing samples is non-zero.

Leaving VK_QUERY_CONTROL_PRECISE_BIT unset may be more efficient on some implementations, and should be preferred where it is sufficient to know a boolean result on whether any samples passed the per-fragment tests. In this case, some implementations may only return zero or one, regardless of the actual number of samples passing the per-fragment tests.
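
When the precise bit is not set, the only portable interpretation of the result is a comparison against zero. The sketch below assumes the result was read back as two 64-bit elements, the sample count followed by the availability status; `should_draw_object` is a hypothetical helper, and drawing when the result is not yet available is an application policy assumed here, not something the specification mandates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* result[0]: sample count (any non-zero value means "some samples passed").
 * result[1]: availability status (non-zero means available). */
bool should_draw_object(const uint64_t result[2])
{
    assert(result != NULL);
    if (result[1] == 0) {
        return true; /* assumption: draw conservatively while visibility is unknown */
    }
    /* Never compare against a specific count without the precise bit. */
    return result[0] != 0;
}
```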

Setting VK_QUERY_CONTROL_PRECISE_BIT does not guarantee that different implementations return the same number of samples in an occlusion query. Some implementations may kill fragments in the pre-rasterization shader stage, and these killed fragments do not contribute to the final result of the query. It is possible that some implementations generate a zero result value for the query, while others generate a non-zero value.

When an occlusion query finishes, the result for that query is marked as available. The application can then either copy the result to a buffer (via vkCmdCopyQueryPoolResults) or request it be put into host memory (via vkGetQueryPoolResults).

If occluding geometry is not drawn first, samples can pass the depth test, but still not be visible in a final image.

Pipeline Statistics Queries

Pipeline statistics queries allow the application to sample a specified set of VkPipeline counters. These counters are accumulated by Vulkan for a set of either drawing or dispatching commands while a pipeline statistics query is active. As such, pipeline statistics queries are available on queue families supporting either graphics or compute operations. The availability of pipeline statistics queries is indicated by the pipelineStatisticsQuery member of the VkPhysicalDeviceFeatures object (see vkGetPhysicalDeviceFeatures and vkCreateDevice for detecting and requesting this query type on a VkDevice).

A pipeline statistics query is begun and ended by calling vkCmdBeginQuery and vkCmdEndQuery, respectively. When a pipeline statistics query begins, all statistics counters are set to zero. While the query is active, the pipeline type determines which set of statistics are available, but these must be configured on the query pool when it is created. If a statistic counter is issued on a command buffer that does not support the corresponding operation, or the counter corresponds to a shading stage which is missing from any of the pipelines used while the query is active, the value of that counter is undefined after the query has been made available. At least one statistic counter relevant to the operations supported on the recording command buffer must be enabled.
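
Since one value is written per enabled statistic, in bit order from the least significant bit, the position of a given counter in the result array follows from the mask configured on the pool. A minimal sketch of that arithmetic, using hypothetical helpers `statistic_count` and `statistic_index` on plain 32-bit masks:

```c
#include <assert.h>
#include <stdint.h>

/* Number of values written per query: one per enabled statistic bit. */
uint32_t statistic_count(uint32_t pipeline_statistics)
{
    uint32_t n = 0;
    while (pipeline_statistics != 0) {
        pipeline_statistics &= pipeline_statistics - 1; /* clear lowest set bit */
        ++n;
    }
    return n;
}

/* Index of `statistic_bit` within one query's result array, or -1 if the
 * statistic was not enabled when the pool was created. */
int32_t statistic_index(uint32_t pipeline_statistics, uint32_t statistic_bit)
{
    /* Precondition: exactly one bit is set in statistic_bit. */
    assert(statistic_bit != 0 && (statistic_bit & (statistic_bit - 1)) == 0);

    if ((pipeline_statistics & statistic_bit) == 0) {
        return -1;
    }
    /* Count the enabled statistics that sit below the requested bit. */
    return (int32_t)statistic_count(pipeline_statistics & (statistic_bit - 1));
}
```

For example, with a mask of 0x421 (three bits enabled, values chosen only for illustration), the statistic at bit 0x20 is the second value in the array, at index 1.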

VkQueryPipelineStatisticFlagBits: Bitmask specifying queried pipeline statistics
VkQueryPipelineStatisticFlags: Bitmask of VkQueryPipelineStatisticFlagBits

Timestamp Queries

Timestamps provide applications with a mechanism for timing the execution of commands. A timestamp is an integer value generated by the VkPhysicalDevice. Unlike other queries, timestamps do not operate over a range, and so do not use vkCmdBeginQuery or vkCmdEndQuery. The mechanism is built around a set of commands that allow the application to tell the VkPhysicalDevice to write timestamp values to a query pool and then either read timestamp values on the host (using vkGetQueryPoolResults) or copy timestamp values to a VkBuffer (using vkCmdCopyQueryPoolResults). The application can then compute differences between timestamps to determine execution time.

The number of valid bits in a timestamp value is determined by the VkQueueFamilyProperties::timestampValidBits property of the queue on which the timestamp is written. Timestamps are supported on any queue which reports a non-zero value for timestampValidBits via vkGetPhysicalDeviceQueueFamilyProperties. If the timestampComputeAndGraphics limit is VK_TRUE, timestamps are supported by every queue family that supports either graphics or compute operations (see VkQueueFamilyProperties).

The number of nanoseconds it takes for a timestamp value to be incremented by 1 can be obtained from VkPhysicalDeviceLimits::timestampPeriod after a call to vkGetPhysicalDeviceProperties.
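
Putting timestampValidBits and timestampPeriod together, an elapsed time can be derived from two timestamp values roughly as follows. This is a sketch: `timestamp_mask` and `timestamp_delta_ns` are hypothetical helpers, the inputs are assumed to have been read back as 64-bit values from the same queue, and the modular subtraction only tolerates a single counter wrap between the two samples.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the valid bits of a timestamp (1..64 valid bits). */
uint64_t timestamp_mask(uint32_t valid_bits)
{
    assert(valid_bits >= 1 && valid_bits <= 64);
    return valid_bits == 64 ? UINT64_MAX
                            : ((UINT64_C(1) << valid_bits) - 1);
}

/* Elapsed nanoseconds between two timestamps.
 * timestamp_period is the number of nanoseconds per timestamp increment. */
double timestamp_delta_ns(uint64_t start, uint64_t end,
                          uint32_t valid_bits, float timestamp_period)
{
    uint64_t mask = timestamp_mask(valid_bits);
    /* Unsigned subtraction followed by masking handles one wraparound. */
    uint64_t ticks = (end - start) & mask;
    return (double)ticks * (double)timestamp_period;
}
```

A queue family reporting zero valid bits does not support timestamps at all, which is why the helper rejects that value.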

vkCmdWriteTimestamp2: Write a device timestamp into a query object
vkCmdWriteTimestamp: Write a device timestamp into a query object

Performance Queries

Performance queries provide applications with a mechanism for getting performance counter information about the execution of command buffers, render passes, and commands.

Each queue family advertises the performance counters that can be queried on a queue of that family via a call to vkEnumeratePhysicalDeviceQueueFamilyPerformanceQueryCountersKHR. Implementations may limit access to performance counters based on platform requirements or only to specialized drivers for development purposes.

This may include no performance counters being enumerated, or a reduced set. Please refer to platform-specific documentation for guidance on any such restrictions.

Performance queries use the existing vkCmdBeginQuery and vkCmdEndQuery to control what command buffers, render passes, or commands to get performance information for.

Implementations may require multiple passes where the command buffer, render passes, or commands being recorded are the same and are executed on the same queue to record performance counter data. This is achieved by submitting the same batch and providing a VkPerformanceQuerySubmitInfoKHR structure containing a counter pass index. The number of passes required for a given performance query pool can be queried via a call to vkGetPhysicalDeviceQueueFamilyPerformanceQueryPassesKHR.
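
The multi-pass submission pattern amounts to a loop over counter pass indices. The sketch below abstracts the actual submission behind a callback so that it stays self-contained: `SubmitPassFn`, `submit_all_counter_passes`, `PassLog`, and `record_pass_index` are all hypothetical. In a real application the callback would submit the same batch with a VkPerformanceQuerySubmitInfoKHR whose counterPassIndex is set to the given index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: submits the batch for one counter pass. */
typedef bool (*SubmitPassFn)(uint32_t counter_pass_index, void *user_data);

/* Submits the same work once per required pass, stopping at the first
 * failure. Returns the number of passes that were submitted successfully. */
uint32_t submit_all_counter_passes(uint32_t num_passes,
                                   SubmitPassFn submit, void *user_data)
{
    assert(submit != NULL);
    uint32_t done = 0;
    for (uint32_t pass = 0; pass < num_passes; ++pass) {
        if (!submit(pass, user_data)) {
            break;
        }
        ++done;
    }
    return done;
}

/* Illustration-only callback that records which pass indices were used. */
typedef struct {
    uint32_t indices[8];
    uint32_t count;
} PassLog;

bool record_pass_index(uint32_t counter_pass_index, void *user_data)
{
    PassLog *log = (PassLog *)user_data;
    if (log == NULL || log->count >= 8) {
        return false;
    }
    log->indices[log->count++] = counter_pass_index;
    return true;
}
```

Here `num_passes` stands for the value reported by vkGetPhysicalDeviceQueueFamilyPerformanceQueryPassesKHR for the pool in question.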

Command buffers created with VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT must not be re-submitted. Changing command buffer usage bits may affect performance. To avoid this, the application should re-record any command buffers with the VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT when multiple counter passes are required.

Performance counter results from a performance query pool can be obtained with the command vkGetQueryPoolResults.

VkPerformanceCounterResultKHR: Union containing a performance counter result

Profiling Lock

vkAcquireProfilingLockKHR: Acquires the profiling lock
VkAcquireProfilingLockInfoKHR: Structure specifying parameters to acquire the profiling lock
VkAcquireProfilingLockFlagBitsKHR: Reserved for future use
VkAcquireProfilingLockFlagsKHR: Reserved for future use
vkReleaseProfilingLockKHR: Releases the profiling lock

Transform Feedback Queries

Transform feedback queries track the number of primitives attempted to be written and actually written, by the vertex stream being captured, to a transform feedback buffer. This query is updated during drawing commands while transform feedback is active. The number of primitives actually written will be less than the number attempted to be written if the bound transform feedback buffer size was too small for the number of primitives actually drawn. Primitives are not written beyond the bound range of the transform feedback buffer. A transform feedback query is begun and ended by calling vkCmdBeginQuery and vkCmdEndQuery, respectively, to query for vertex stream zero. vkCmdBeginQueryIndexedEXT and vkCmdEndQueryIndexedEXT can be used to begin and end transform feedback queries for any supported vertex stream. When a transform feedback query begins, the counts of primitives written and primitives needed start from zero. For each drawing command, the counts are incremented as vertex attribute outputs are captured to the transform feedback buffers while transform feedback is active.
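
A common use of the two counts is detecting that a transform feedback buffer was too small. The following sketch assumes the query result was read back as two 64-bit integers, primitives written followed by primitives needed; `xfb_overflowed` and `xfb_primitives_dropped` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* result[0]: primitives actually written to the transform feedback buffer.
 * result[1]: primitives that were attempted to be written. */
bool xfb_overflowed(const uint64_t result[2])
{
    assert(result != NULL);
    return result[0] < result[1];
}

/* Number of primitives that did not fit in the bound buffer range. */
uint64_t xfb_primitives_dropped(const uint64_t result[2])
{
    assert(result != NULL);
    return result[1] > result[0] ? result[1] - result[0] : 0;
}
```

An application might respond to an overflow by growing the buffer by at least the dropped amount and capturing again.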

When a transform feedback query finishes, the result for that query is marked as available. The application can then either copy the result to a buffer (via vkCmdCopyQueryPoolResults) or request it be put into host memory (via vkGetQueryPoolResults).

Primitives Generated Queries

When a primitives generated query for a vertex stream is active, the primitives-generated count is incremented every time a primitive emitted to that stream reaches the transform feedback stage, whether or not transform feedback is active. A primitives generated query is begun and ended by calling vkCmdBeginQuery and vkCmdEndQuery, respectively, to query for vertex stream zero. vkCmdBeginQueryIndexedEXT and vkCmdEndQueryIndexedEXT can be used to begin and end primitives generated queries for any supported vertex stream. When a primitives generated query begins, the count of primitives generated starts from zero.

When a primitives generated query finishes, the result for that query is marked as available. The application can then either copy the result to a buffer (via vkCmdCopyQueryPoolResults) or request it be put into host memory (via vkGetQueryPoolResults).

The result of this query is typically identical to VK_QUERY_PIPELINE_STATISTIC_CLIPPING_INVOCATIONS_BIT, but the primitives generated query is deterministic, i.e. it must be identical to the number of primitives processed. VK_QUERY_PIPELINE_STATISTIC_CLIPPING_INVOCATIONS_BIT may vary for implementation-dependent reasons, e.g. the same primitive may be processed multiple times for purposes of clipping.

Mesh Shader Queries

When a generated mesh primitives query is active, the mesh-primitives-generated count is incremented every time a primitive emitted from the mesh shader stage reaches the fragment shader stage. When a generated mesh primitives query begins, the mesh-primitives-generated count starts from zero.

Mesh and task shader pipeline statistics queries function the same way that invocation queries work for other shader stages, counting the number of times the respective shader stage has been run. When the statistics query begins, the invocation counters start from zero.

Intel Performance Queries

Intel performance queries allow an application to capture performance data for a set of commands. Performance queries are used in a similar way to other types of queries. The main difference from existing queries is that the resulting data should be handed over to a library capable of producing human-readable results rather than being read directly by an application.

vkInitializePerformanceApiINTEL: Initialize a device for performance queries
VkInitializePerformanceApiInfoINTEL: Structure specifying parameters for initializing the device
vkUninitializePerformanceApiINTEL: Uninitialize a device for performance queries
vkGetPerformanceParameterINTEL: Query performance capabilities of the device
VkPerformanceParameterTypeINTEL: Parameters that can be queried
VkPerformanceValueINTEL: Container for value and types of parameters that can be queried
VkPerformanceValueTypeINTEL: Type of the parameters that can be queried
VkPerformanceValueDataINTEL: Values returned for the parameters
VkQueryPoolPerformanceQueryCreateInfoINTEL: Structure specifying parameters to create a pool of performance queries
VkQueryPoolSamplingModeINTEL: Enum specifying how performance queries should be captured
vkCmdSetPerformanceMarkerINTEL: Markers
VkPerformanceMarkerInfoINTEL: Structure specifying performance markers
vkCmdSetPerformanceStreamMarkerINTEL: Markers
VkPerformanceStreamMarkerInfoINTEL: Structure specifying stream performance markers
vkCmdSetPerformanceOverrideINTEL: Performance override settings
VkPerformanceOverrideInfoINTEL: Performance override information
VkPerformanceOverrideTypeINTEL: Performance override type
VkPerformanceConfigurationINTEL: Device configuration for performance queries
vkAcquirePerformanceConfigurationINTEL: Acquire the performance query capability
VkPerformanceConfigurationAcquireInfoINTEL: Acquire a configuration to capture performance data
VkPerformanceConfigurationTypeINTEL: Type of performance configuration
vkQueueSetPerformanceConfigurationINTEL: Set a performance query
vkReleasePerformanceConfigurationINTEL: Release a configuration to capture performance data

Result Status Queries

Result status queries serve a single purpose: allowing the application to determine whether a set of operations has completed successfully or not, as indicated by the VkQueryResultStatusKHR value written when retrieving the result of a query using the VK_QUERY_RESULT_WITH_STATUS_BIT_KHR flag.

Unlike other query types, result status queries do not track or maintain any other data beyond the completion status, thus no other data is written when retrieving their results.
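
Because the status is the only element written for a result status query, interpreting it reduces to a sign check. A minimal sketch, assuming the result was retrieved as a single 32-bit element; `StatusClass` and `classify_result_status` are hypothetical names and do not replace the specific codes in VkQueryResultStatusKHR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Coarse interpretation of a status value; hypothetical, not a Vulkan enum. */
typedef enum {
    STATUS_NOT_READY, /* 0: result not yet available */
    STATUS_SUCCESS,   /* positive: operations completed successfully */
    STATUS_FAILURE    /* negative: operations failed */
} StatusClass;

/* `result` points at the single status element written for one query. */
StatusClass classify_result_status(const int32_t *result)
{
    assert(result != NULL);
    if (*result > 0) {
        return STATUS_SUCCESS;
    }
    if (*result < 0) {
        return STATUS_FAILURE;
    }
    return STATUS_NOT_READY;
}
```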

Support for result status queries is indicated by VkQueueFamilyQueryResultStatusPropertiesKHR::queryResultStatusSupport, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 for the queue family in question.

Video Encode Feedback Queries

Video encode feedback queries allow the application to capture feedback values generated by video encode operations. As such, video encode feedback queries are available on queue families supporting video encode operations. The availability of individual video encode feedback values is indicated by the bits of VkVideoEncodeCapabilitiesKHR::supportedEncodeFeedbackFlags, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the video profile the queries are intended to be used with.

The set of enabled video encode feedback values must be configured on the query pool when it is created using the encodeFeedbackFlags member of the VkQueryPoolVideoEncodeFeedbackCreateInfoKHR included in the pNext chain of VkQueryPoolCreateInfo.

VkQueryPoolVideoEncodeFeedbackCreateInfoKHR: Structure specifying enabled video encode feedback values
VkVideoEncodeFeedbackFlagBitsKHR: Bits specifying queried video encode feedback values
VkVideoEncodeFeedbackFlagsKHR: Bitmask of VkVideoEncodeFeedbackFlagBitsKHR