Halide supports the Khronos Vulkan framework as a compute API backend for GPU-like
devices, and compiles directly to a binary SPIR-V representation as part of its
code generation before submitting it to the Vulkan API. Both JIT and AOT usage
are supported via the `vulkan` target flag (e.g. `HL_JIT_TARGET=host-vulkan`).
Vulkan support is actively under development, and is considered BETA quality at this stage. Tests are passing, but performance tuning and user testing are needed to identify potential issues before rolling this into production.
See below for details.
You'll need to configure Halide with the CMake option TARGET_VULKAN enabled (which is now ON by default).
For example, on Linux & OSX:
% cmake -G Ninja -DTARGET_VULKAN=ON -DCMAKE_BUILD_TYPE=Release -DLLVM_DIR=$LLVM_ROOT/lib/cmake/llvm -S . -B build
% cmake --build build --config Release
On Windows, you may need to specify the location of the Vulkan SDK if the paths aren't resolved by CMake automatically. For example (assuming the Vulkan SDK is installed in the default path):
C:\> cmake -G Ninja -DTARGET_VULKAN=ON -DCMAKE_BUILD_TYPE=Release -DLLVM_DIR=%LLVM_ROOT%/lib/cmake/llvm -DVulkan_LIBRARY=C:\VulkanSDK\1.3.231.1\Lib\vulkan-1.lib -DVulkan_INCLUDE_DIR=C:\VulkanSDK\1.3.231.1\Include\vulkan -S . -B build
C:\> cmake --build build --config Release
Halide has no direct dependency on Vulkan for code generation, but the runtime requires a working Vulkan environment to run Halide-generated code. Any valid Vulkan v1.0+ device driver should work.
Specifically, you'll need:
- A vendor specific Vulkan device driver
- The generic Vulkan loader library
For AMD, NVIDIA, and Intel devices, download and install the latest graphics driver for your platform. Vulkan support should be included.
On Windows, to build Halide AOT generators you'll need the Vulkan SDK (specifically, the Vulkan loader library and headers): https://sdk.lunarg.com/sdk/download/latest/windows/vulkan-sdk.exe
For Vulkan device drivers, consult the appropriate hardware vendor for your device. A few common ones are listed below.
On Linux, the Vulkan SDK packages are maintained by LunarG. These include the Vulkan loader library, as well as the Vulkan tools packages. Instructions for installing them can be found in LunarG's Getting Started Guide.
Once the SDK has been installed, you'll need to install the appropriate driver for your device. Proprietary drivers can be installed via 'apt' using PPAs for each vendor. Examples for AMD and NVIDIA are provided below.
For AMD on Ubuntu v22.04:
$ sudo add-apt-repository ppa:oibaf/graphics-drivers
$ sudo apt update
$ sudo apt upgrade
$ sudo apt install libvulkan1 mesa-vulkan-drivers vulkan-tools
For NVIDIA on Ubuntu v22.04:
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ sudo apt upgrade
# - replace ### with latest driver release (e.g. 515)
$ sudo apt install nvidia-driver-### nvidia-settings libvulkan1 vulkan-tools
Note that you should only install valid drivers for your system, since there are
reports of the Vulkan loader segfaulting just by having a non-supported driver present.
Specifically, the seemingly generic `mesa-vulkan-drivers` package actually includes the
AMD graphics driver, which can cause problems if installed on an NVIDIA-only system.
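If you aren't sure which Vulkan drivers are present on a Linux system, one way to check (assuming the standard system manifest location, which may vary by distro) is to list the installed ICD manifests:
$ ls /usr/share/vulkan/icd.d/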
On a Mac, you're better off using Halide's Metal backend instead, but it is possible to run Vulkan apps via the MoltenVK library.
The easiest way to get the necessary dependencies is to use the official MoltenVK SDK installer provided by LunarG.
Alternatively, if you have the Homebrew package manager installed for MacOS, you can use it to install the Vulkan Loader and MoltenVK compatibility layer:
$ brew install vulkan-loader molten-vk
You can validate that everything is configured correctly by running the `vulkaninfo`
app (bundled in the `vulkan-tools` package) to make sure your device is detected, e.g.:
$ vulkaninfo
==========
VULKANINFO
==========
Vulkan Instance Version: 1.3.224
Instance Extensions: count = 19
===============================
...
Layers: count = 10
==================
VK_LAYER_KHRONOS_profiles (Khronos Profiles layer) Vulkan version 1.3.224, layer version 1:
Layer Extensions: count = 0
Devices: count = 1
GPU id = 0 (NVIDIA GeForce RTX 3070 Ti)
Layer-Device Extensions: count = 1
...
Make sure everything looks correct before continuing!
To generate Halide code for Vulkan, simply add the `vulkan` flag to your target, as well as any other optional device-specific features you wish to enable for Halide:
| Target Feature | Description |
|---|---|
| `vulkan` | Enables the Vulkan backend |
| `vk_int8` | Allows 8-bit integer storage types to be used |
| `vk_int16` | Allows 16-bit integer storage types to be used |
| `vk_int64` | Allows 64-bit integer storage types to be used |
| `vk_float16` | Allows 16-bit floating-point values to be used for computation |
| `vk_float64` | Allows 64-bit floating-point values to be used for computation |
| `vk_v10` | Generates code compatible with the Vulkan v1.0+ API |
| `vk_v12` | Generates code compatible with the Vulkan v1.2+ API |
| `vk_v13` | Generates code compatible with the Vulkan v1.3+ API |
Note that 32-bit integer and floating-point types are always available. All other optional device features are off by default (since they are not required by the Vulkan API, and thus must be explicitly enabled to ensure that the code being generated will be compatible with the device and API version being used for execution).
For AOT generators, add `vulkan` (and any other flags you wish to use) to the target command line option:
$ ./lesson_15_generate -g my_first_generator -o . target=host-vulkan-vk_int8-vk_int16
For JIT apps, use the `HL_JIT_TARGET` environment variable:
$ HL_JIT_TARGET=host-vulkan-vk_int8-vk_int16 ./tutorial/lesson_01_basics
To modify the default behavior of the runtime, the following environment variables can be used to adjust the configuration of the Vulkan backend at execution time:
`HL_VK_LAYERS=...` will tell Halide to choose a suitable Vulkan instance
that supports the given list of layers. If not set, `VK_INSTANCE_LAYERS=...`
will be used instead. If neither are present, Halide will use the first
Vulkan compute device it can find. Multiple layers can be specified using
the appropriate environment variable list delimiter (`:` on Linux/OSX/Posix,
or `;` on Windows).
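For example (the binary name here is just a placeholder), to request an instance that supports both the Khronos validation layer and the LunarG API dump layer on Linux:
$ HL_VK_LAYERS=VK_LAYER_KHRONOS_validation:VK_LAYER_LUNARG_api_dump ./my_halide_app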
`HL_VK_DEVICE_TYPE=...` will tell Halide which type of device to select
when creating the Vulkan instance. Valid options are 'gpu', 'discrete-gpu',
'integrated-gpu', 'virtual-gpu', or 'cpu'. If not set, Halide will search
for the first 'gpu'-like device it can find, or fall back to the first
compute device it can find.
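For example (again using a placeholder binary name), to prefer an integrated GPU even when a discrete GPU is available:
$ HL_VK_DEVICE_TYPE=integrated-gpu ./my_halide_app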
`HL_VK_ALLOC_CONFIG=...` will tell Halide to configure the Vulkan memory
allocator to use the given constraints, specified as five integer values
separated by the appropriate environment variable list delimiter
(e.g. `N:N:N:N:N` on Linux/OSX/Posix, or `N;N;N;N;N` on Windows). These values
correspond to `maximum_pool_size`, `minimum_block_size`, `maximum_block_size`,
`maximum_block_count` and `nearest_multiple`.
The `maximum_pool_size` constraint will tell Halide to configure the
Vulkan memory allocator to never request more than N megabytes for the
entire pool of allocations for the context. This includes all resource
blocks used for suballocations. Setting this to a non-zero value will
limit the amount of device memory used by Halide, which may be useful when
other applications and frameworks are competing for resources.
Default is 0 ... meaning no limit.
The `minimum_block_size` constraint will tell Halide to configure the
Vulkan memory allocator to always request a minimum of N megabytes for
a resource block, which will be used as a pool for suballocations.
Increasing this value may improve performance while sacrificing the amount
of available device memory. Default is 32MB.
The `maximum_block_size` constraint will tell Halide to configure the
Vulkan memory allocator to never exceed a maximum of N megabytes for a
resource block. Decreasing this value may free up more memory but may
impact performance, and/or restrict allocations to be unusably small.
Default is 0 ... meaning no limit.
The `maximum_block_count` constraint will tell Halide to configure the
Vulkan memory allocator to never exceed a total of N block allocations.
Decreasing this value may free up more memory but may impact performance,
and/or restrict allocations. Default is 0 ... meaning no limit.
The `nearest_multiple` constraint will tell Halide to configure the
Vulkan memory allocator to always round up the requested allocation sizes
to the given integer value. This is useful for architectures that
require specific alignments for subregions allocated within a block.
Default is 32 ... setting this to zero means no constraint.
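For example (using one of the JIT correctness tests as a stand-in for your own binary), the following configuration caps the entire pool at 1024 MB, uses resource blocks of at least 32 MB and at most 256 MB, places no limit on the block count, and rounds allocation sizes up to a multiple of 32:
$ HL_VK_ALLOC_CONFIG=1024:32:256:0:32 HL_JIT_TARGET=host-vulkan ./build/test/correctness/correctness_hello_gpu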
The following environment variables may be useful for tracking down potential issues related to Vulkan:
`HL_DEBUG_CODEGEN=3` will print out debug info that includes the SPIR-V
code generator used for Vulkan while it is compiling.
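For example, to watch the SPIR-V code generator while JIT-compiling one of the correctness tests:
$ HL_DEBUG_CODEGEN=3 HL_JIT_TARGET=host-vulkan ./build/test/correctness/correctness_hello_gpu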
`HL_SPIRV_DUMP_FILE=...` specifies a file to dump the binary SPIR-V generated
during compilation. Useful for debugging CodeGen issues. Can be inspected,
validated and disassembled via the SPIR-V tools:
https://github.com/KhronosGroup/SPIRV-Tools
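For example (assuming the SPIRV-Tools binaries are installed and on your PATH), you can dump the generated module from a JIT run, then validate and disassemble it:
$ HL_SPIRV_DUMP_FILE=dump.spv HL_JIT_TARGET=host-vulkan ./build/test/correctness/correctness_hello_gpu
$ spirv-val dump.spv
$ spirv-dis dump.spv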
In addition to the SPIR-V tools, you may also wish to install the Khronos validation
layers, which provide an exhaustive suite of runtime checks that can be injected
by adding `VK_LAYER_KHRONOS_validation` to the `VK_INSTANCE_LAYERS=` environment
variable.
To install the validation layers and the SPIR-V tools on Ubuntu v22.04:
$ sudo apt install vulkan-validationlayers vulkan-validationlayers-dev spirv-tools
To test the validation layer, you can prepend the appropriate environment settings to the shell command for any Vulkan-enabled binary. For example, you can run one of the JIT-enabled correctness tests with debug output and validation layers enabled like so:
$ VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation HL_JIT_TARGET=host-vulkan-vk_int8-vk_int16-vk_int64-vk_float16-vk_float64-vk_v13-debug ./build/test/correctness/correctness_hello_gpu
All correctness tests are now passing on tested configs for Linux & Windows using the
target `host-vulkan-vk_int8-vk_int16-vk_int64-vk_float16-vk_float64-vk_v13` on LLVM v14.x.
macOS passes most tests, but encounters internal MoltenVK code translation issues for wide vectors and ambiguous function calls.
Python apps, tutorials and correctness tests are now passing, but the AOT cases are skipped since the runtime environment needs to be customized to locate the platform-specific Vulkan loader library.
Android platform support is currently being worked on.
- Other than 32-bit floats and integers, every other data type is optional per the Vulkan spec
- 64-bit floating-point types can be enabled, but there aren't any native math functions available for them in SPIR-V
- Only one dynamically sized shared memory allocation can be used, but any number of fixed-size allocations are supported (up to the maximum amount allowed by the device)
- Performance tuning of CodeGen and Runtime
- More platform support (Android is work-in-progress, RISC-V, etc)
- Adapt unsupported types to supported types (if missing vk_int8 then promote to uint32_t)?
- Better debugging utilities using the Vulkan debug hooks.
- Allow debug symbols to be stripped from SPIR-V during codegen to reduce memory overhead for large kernels.
- Investigate floating point rounding and precision (v1.3 adds more controls)
- Investigate memory model usage (can Halide gain anything from these?)