Releases: icl-utk-edu/blaspp
Releases · icl-utk-edu/blaspp
2024.10.26
- Added PAPI SDE that counts flops.
- Use
to_blas_int
to convert int32 to int64.
2024.05.31
- Added shared library version (ABI version 1.0.0)
- Updated enum parameters to have
to_string
,from_string
;
deprecate<enum>2str
,str2<enum>
- Removed some deprecated functions. Deprecated MemcpyKind
- Added PAPI SDE counters (manually enable for now)
- Support ROCm 5.6.0 (rocBLAS 3.0), trmm with 3 matrices (A, B, C)
- Fixed bug in her2k with complex alpha and row-major matrix
- Fixed bug in example_gemm
- Use
sqrt
, etc. withoutstd::
to enable argument dependent lookup (ADL)
2023.11.05
- Fix Queue workspace
- Update Fortran strlen handling
- Fix CMake unity build
- Fix CMake library ordering
2023.08.25
- Use yyyy.mm.dd version scheme, instead of yyyy.mm.release
- Added oneAPI support to CMake
- Fixed int64 support
- More robust Makefile configure doesn't require CUDA or ROCm to be in
compiler search paths (CPATH, LIBRARY_PATH, etc.)
2023.06.00
- Revised Queue class to allow creating Queue from an existing
CUDA/HIP stream, cuBLAS/rocBLAS handle, or SYCL queue. Also
allocates streams and workspace on demand, to make Queue creation
much lighter weight. - Improved oneAPI support.
2023.01.00
- Added oneAPI port (currently Makefile only)
- Added queue argument to
device_malloc, device_free
, etc.;
deprecated old versions - Deprecated
set_device, get_device
- Renamed
device_malloc_pinned
tohost_malloc_pinned
- Added
device_copy_{matrix,vector}
;
deprecateddevice_{set,get}{matrix,vector}
- Added queue argument to
- Added more Level 1 BLAS on GPU device: axpy, dot, nrm2
- Moved main repo to https://github.com/icl-utk-edu/blaspp/
- Refactored routines for better maintainability
- Use python3
2022.07.00
- Added workspace in queue; used in LAPACK++
- Set device in memcpy, etc.
- Updated Schur gemm test with tile layout
2022.05.00
- Added Level 3 BLAS template implementations
- Added device copy, scal
- Added Schur gemm test, batched tile and LAPACK formats
- Fixed gbmm flops when rectangular
- Fixed CMake when BLAS_LIBRARIES is empty
2021.04.01
- Fixed bug in
test_trsm_device
for row-major
2021.04.00
- Added HIP/ROCm support
- Added include/blas/defines.h based on configuration
- Various bug and CMake fixes