Hemi 2: Simpler, More Portable CUDA C++
Hemi 2 simplifies writing portable CUDA C/C++ code. With Hemi,
- You can write parallel loops inline in your CPU code and run them on your GPU;
- You can easily write code that compiles and runs either on the CPU or GPU;
- You can easily launch C++ Lambda functions as GPU kernels;
- Launch configuration details like thread block size and grid size become an optimization rather than a requirement.
With Hemi, parallel code for the GPU can be as simple as the parallel_for loop in the following code, which can also be compiled and run on the CPU.
```cpp
void saxpy(int n, float a, const float *x, float *y)
{
  hemi::parallel_for(0, n, [=] HEMI_LAMBDA (int i) {
    y[i] = a * x[i] + y[i];
  });
}
```
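For reference, here is a minimal, hypothetical driver for the `saxpy()` above (it is not part of the release). Built with a plain host compiler, the loop runs serially on the CPU, so ordinary host allocations work; a GPU build would additionally require the pointers passed in to be device-accessible (for example, managed memory).

```cpp
// Hypothetical usage sketch, not taken from the Hemi release.
// When compiled with nvcc for the GPU, x and y would need to be
// device-accessible (e.g. allocated with cudaMallocManaged).
#include <cstdio>
#include <vector>

int main()
{
  const int n = 1 << 20;
  std::vector<float> x(n, 1.0f), y(n, 2.0f);

  saxpy(n, 2.0f, x.data(), y.data());  // y[i] = 2.0f * 1.0f + 2.0f

  std::printf("y[0] = %f\n", y[0]);    // expect 4.000000
  return 0;
}
```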
New Features
- `hemi::launch()` for launching portable functions either as parallel kernels on the device, or as serial functions on the host.
- `hemi::cudaLaunch()` for launching CUDA kernels (portable or otherwise).
- `hemi::parallel_for()` for expressing inline parallel loops that are launched as CUDA kernels (or run on the host).
- Support for GPU lambdas with `HEMI_LAMBDA`. GPU lambdas can be defined in host code and launched on the device using `hemi::launch()` or `hemi::parallel_for()`.
- Automatic parallel execution configuration with `hemi::launch()`, `hemi::cudaLaunch()`, and `hemi::parallel_for()`. This leaves the specification of thread block and grid size up to the runtime, so that execution configuration becomes an optimization rather than a requirement.
- Grid-stride range-based for loops with the `hemi::grid_stride_range()` helper (see the sketch after this list).
- Complete overhaul resulting in greater portability and improved simplicity.
- New and improved samples.
- Tests!
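To show how these pieces fit together, here is a minimal sketch that launches a GPU lambda with `hemi::launch()` and iterates with `hemi::grid_stride_range()`. It is illustrative only: the header paths are assumptions, so check the Hemi 2 headers and samples for the actual layout and signatures.

```cpp
// Illustrative sketch only; the header paths below are assumptions.
#include <hemi/hemi.h>
#include <hemi/launch.h>
#include <hemi/grid_stride_range.h>

void scale(int n, float s, float *data)  // data must be device-accessible on GPU builds
{
  // No block or grid size is specified: the execution configuration
  // is chosen automatically when the lambda is launched.
  hemi::launch([=] HEMI_LAMBDA () {
    // Grid-stride loop: each thread strides across the index range,
    // so the same code works for any grid size, and it degenerates
    // to a simple serial loop when compiled for the host.
    for (auto i : hemi::grid_stride_range(0, n))
      data[i] = s * data[i];
  });
}
```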
Enjoy Hemi 2. Please report any issues via the GitHub issue tracker.