Descriptorless Rendering in Vulkan

2021 September 17

Bindless Rendering

Classic Vulkan 1.0 style resource handling revolves around descriptors, which are grouped together in DescriptorSets. A pipeline has a layout consisting of several sets, which are bound during command buffer recording. Before encoding a draw or dispatch command, it is typically required to know which descriptors the shader will access.
Upcoming raytracing usage requires a more flexible model: geometry instances with different materials may be hit by rays within the same shader dispatch. To alleviate these restrictions, 'bindless' rendering has been around for a few years already. A detailed breakdown can be found in Ray Tracing Gems II, chapter 'Using Bindless Resources with DirectX Raytracing' by Matt Pettineo. Compared to DirectX, this is a bit more cumbersome in Vulkan, as it requires multiple unsized descriptor sets to be bound, one per descriptor type.
GLSL also lacks the SM6.6 dynamic resource access feature. In theory, this could be implemented as a frontend feature by inferring the descriptor type and specializing the implementation to substitute the correct descriptor table - this requires ad-hoc knowledge of the binding layout though.
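To make the specialization idea concrete, here is a rough host-side sketch of what such a frontend could lower a dynamic resource access into: a lookup in the table matching the inferred descriptor type. All names are hypothetical and do not come from any actual compiler; this only mirrors the idea.

```rust
// One unsized table per descriptor type, as Vulkan bindless requires.
// A frontend inferring the descriptor type of an access could then
// "specialize" the access by picking the matching table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DescriptorType {
    SampledImage,
    StorageBuffer,
    Sampler,
}

struct DescriptorTables {
    sampled_images: Vec<u64>,
    storage_buffers: Vec<u64>,
    samplers: Vec<u64>,
}

impl DescriptorTables {
    // The specialization step: substitute the correct descriptor table.
    fn resolve(&self, ty: DescriptorType, index: usize) -> u64 {
        match ty {
            DescriptorType::SampledImage => self.sampled_images[index],
            DescriptorType::StorageBuffer => self.storage_buffers[index],
            DescriptorType::Sampler => self.samplers[index],
        }
    }
}
```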

Descriptorless Rendering

Going one step further, why do we require the descriptor tables at all? In the bindless setup, we pass indices into descriptor tables, stored in buffers, to the shader. The tables are filled on the host side with the descriptor handles, which needs some additional bookkeeping and descriptor slot tracking. So, can we actually remove the descriptor table in between and just pass in the descriptor handles directly?
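The slot tracking in question is roughly a free-list allocator over table indices - exactly the kind of housekeeping the descriptorless approach removes. A minimal sketch (hypothetical names; a real renderer would additionally defer frees until the GPU has finished with the slot):

```rust
// Tracks which indices of a bindless descriptor table are in use.
struct DescriptorSlotAllocator {
    free_slots: Vec<u32>, // recycled indices
    next_slot: u32,       // high-water mark
    capacity: u32,        // table size
}

impl DescriptorSlotAllocator {
    fn new(capacity: u32) -> Self {
        Self { free_slots: Vec::new(), next_slot: 0, capacity }
    }

    // Hand out a table index; the caller writes the descriptor into it.
    fn allocate(&mut self) -> Option<u32> {
        if let Some(slot) = self.free_slots.pop() {
            Some(slot)
        } else if self.next_slot < self.capacity {
            let slot = self.next_slot;
            self.next_slot += 1;
            Some(slot)
        } else {
            None // table exhausted
        }
    }

    // Recycle a slot once its resource is destroyed.
    fn free(&mut self, slot: u32) {
        self.free_slots.push(slot);
    }
}
```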
Yes! .. but right now only on NVIDIA GPUs with experimental extensions. As I haven't seen it elsewhere, I will just call it 'descriptorless' as opposed to 'bindless' - even though the latter one actually fits better. Let's go through the different descriptor types and see how this can be implemented.
  • Storage buffers: Physical storage buffers provide an OpConvertUToPtr SPIR-V operation on the device side, and we can query the buffer device address on the host side. Additionally, this allows casting to any pointer type, along with some pointer arithmetic! In the case of bindless without physical storage buffers, we are more or less required to use untyped buffers; DirectXShaderCompiler added templated load/store operations to handle the type conversion. SPIR-V doesn't support wide load operations, so it's up to the compiler optimizer to merge the OpLoad instructions internally. Physical storage buffers provide more information regarding the structure and overall alignment (which needs to be respected!).
  • Acceleration structures: Thankfully, the raytracing extension comes with the corresponding OpConvertUToAccelerationStructureKHR operation along with host-side support. Pretty nice!
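The storage-buffer pattern can be mimicked on the CPU, where the equivalent of OpConvertUToPtr is a plain integer-to-pointer cast. The following sketch is host-side Rust, not device code, and the types are illustrative; on the GPU the address would come from vkGetBufferDeviceAddress rather than a heap reference:

```rust
// Illustrative material layout; `#[repr(C)]` keeps it predictable,
// matching the alignment rules a physical storage buffer expects.
#[repr(C)]
struct MaterialData {
    albedo_color: [f32; 4],
    albedo_map: u64,
}

// CPU analogue of OpConvertUToPtr + OpLoad: reinterpret a raw 64-bit
// address as a typed pointer and read through it.
fn read_albedo(handle: u64) -> [f32; 4] {
    let ptr = handle as *const MaterialData;
    unsafe { (*ptr).albedo_color }
}
```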
So far, so good! Now comes the tricky part with images and samplers: on the device side we only have the vendor-specific SPV_NV_bindless_texture extension, which provides the desired OpConvertUToX instructions as for the other descriptor types. Looking at the host side in Vulkan, the handles can be queried via the VK_NVX_image_view_handle extension, but only for sampled images, storage images and combined image samplers. I therefore merged all samplers with the image view - seems good enough! The biggest issue here is the lack of ecosystem support right now, beside the experimental nature: the extension basically exists on paper and is implemented only within the driver and NSight Graphics (!). It hasn't been integrated into SPIRV-Headers, which would be the key point to spread it over the rest of the libraries and toolsets: SPIRV-Tools (opt & val), Validation Layers, glslang, etc.
For my personal renderer, I hacked the pieces together in a fork of rust-gpu, in a similar fashion as the dynamic resource access, using an intrinsic to emit the requested OpConvertUToX operation from a u64:
fn resource_from_handle<T>(resource: u64) -> T { .. }
On the host side, things simplify a lot - DescriptorSets and -Tables can be removed completely. Pipeline layouts reduce to push constant ranges. There is no further housekeeping needed for descriptors at all, which feels like a big ergonomic win.
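With descriptor sets gone, the only shader interface left is a small push-constant block of raw u64 handles. A sketch of what such a block could look like (field names are purely illustrative); since Vulkan guarantees at least 128 bytes of push constants, a handful of handles fits comfortably:

```rust
// Hypothetical push-constant block: every resource is referenced by a
// raw 64-bit handle instead of a descriptor binding.
#[repr(C)] // predictable layout, matching the shader-side struct
struct PushConstants {
    materials: u64, // buffer device address
    instances: u64, // buffer device address
    sky_map: u64,   // image view handle (VK_NVX_image_view_handle)
}

// Size of the single push-constant range the pipeline layout needs.
fn push_constant_size() -> usize {
    std::mem::size_of::<PushConstants>()
}
```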
rust-gpu shader dummy example:
#[repr(C)]
#[derive(Copy, Clone)]
pub struct MeshConstants {
    material: Buffer,
}

#[spirv(fragment)]
pub fn mesh_fs(
    a_texcoord: f32x2,
    #[spirv(flat)] a_material_id: u32,
    output: &mut f32x4,
    #[spirv(push_constant)] constants: &MeshConstants,
) {
    // Buffer(u64) -> RuntimeArray<MaterialData> -> index @ material_id
    let material_data: &mut MaterialData =
        unsafe { constants.material.index_mut(a_material_id as _) };

    let mut diffuse = material_data.albedo_color;
    // '0' used as invalid handle for now
    if material_data.albedo_map != 0 {
        // u64 -> combined image sampler
        let albedo_tex: f32x4 = unsafe {
            resource_from_handle::<SampledImage<Image2d>>(material_data.albedo_map)
                .sample(a_texcoord)
        };
        diffuse *= albedo_tex;
    }
    *output = diffuse;
}

Conclusion

  • Does it work? Yes! - though some parts, like non-uniform access in this setup, are not clear to me.
  • Is it worthwhile/practical right now? Probably not: it's supported on a single vendor only, has barely any ecosystem support and might break or be removed down the line.
  • Useful in the future? I hope so, as it strips down the feature set - one can also emulate the bindless approach using indices and a custom table of u64 handles. Having a multi-vendor extension would be great to further increase support and stability.
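That emulation path can be sketched in a few lines: keep a single storage buffer holding the u64 handles and pass small u32 indices around, with the shader doing the table lookup followed by OpConvertUToX. A hypothetical host-side sketch of the table:

```rust
// Emulated bindless on top of descriptorless: one flat table of raw
// u64 handles; shaders receive indices instead of full handles.
struct HandleTable {
    handles: Vec<u64>, // contents of the GPU-visible table buffer
}

impl HandleTable {
    // Register a resource handle and return its table index.
    fn register(&mut self, handle: u64) -> u32 {
        self.handles.push(handle);
        (self.handles.len() - 1) as u32
    }

    // What the shader would do: index the table to recover the handle,
    // then convert it with the appropriate OpConvertUToX.
    fn resolve(&self, index: u32) -> u64 {
        self.handles[index as usize]
    }
}
```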