rocm-4.2.0
rocBLAS-2.38.0 for ROCm 4.2.0
Added
Added option to install script to build only rocBLAS clients with a pre-built rocBLAS library
Added gemm ext support for the unpacked int8 input layout on gfx908 GPUs
Added a new flag, rocblas_gemm_flags::rocblas_gemm_flags_pack_int8x4, to specify that the packed int8x4 layout is used
Set rocblas_gemm_flags_pack_int8x4 when using packed int8x4. This flag should always be set on GPUs before gfx908.
On gfx908 GPUs, unpacked int8 is supported, so this flag does not need to be set.
Note that the default flags value of 0 uses unpacked int8, which changes the behaviour of int8 gemm relative to ROCm 4.1.0.
Added a query function, rocblas_query_int8_layout_flag, to get the preferred int8 layout for gemm on a given device
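The int8 layout rules above can be illustrated with a small C++ sketch. It is deliberately independent of rocblas.h: the enum is a stand-in whose names only mirror the rocBLAS flag (the value 1 for the pack flag is an assumption), choose_int8_flags is a hypothetical helper, and the packing loop reflects one reading of "packed int8x4" for a column-major, non-transposed A, in which every 4 consecutive k values of a row are stored contiguously. Check the rocBLAS gemm ext documentation for the authoritative layout; in real code the flag should come from rocblas_query_int8_layout_flag rather than from an architecture name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for the rocBLAS flag enum so the sketch builds without rocblas.h.
// The release notes state that 0 means "no flags"; the value of the pack
// flag here is an assumption made for illustration only.
enum gemm_flags_sketch : std::uint32_t
{
    gemm_flags_none_sketch        = 0x0,
    gemm_flags_pack_int8x4_sketch = 0x1
};

// Hypothetical helper encoding the rule from the notes: gfx908 accepts
// unpacked int8, every other architecture is treated as needing the packed
// int8x4 layout. Real code should ask rocblas_query_int8_layout_flag.
inline gemm_flags_sketch choose_int8_flags(const std::string& arch)
{
    return arch == "gfx908" ? gemm_flags_none_sketch
                            : gemm_flags_pack_int8x4_sketch;
}

// Pack a column-major, non-transposed m x k int8 matrix (leading dimension
// lda) so that groups of 4 consecutive k values of each row are contiguous.
// Assumes k is a multiple of 4 and lda >= m.
inline std::vector<std::int8_t> pack_int8x4(const std::vector<std::int8_t>& a,
                                            std::size_t m,
                                            std::size_t k,
                                            std::size_t lda)
{
    assert(m >= 1 && k >= 4 && k % 4 == 0 && lda >= m);
    assert(a.size() >= lda * (k - 1) + m);

    const std::size_t        nb = 4; // elements per packed group
    std::vector<std::int8_t> packed(lda * k, 0);

    for(std::size_t i_m = 0; i_m < m; ++i_m)
    {
        for(std::size_t i_k = 0; i_k < k; ++i_k)
        {
            // Source: element (i_m, i_k) in column-major storage.
            // Destination: slot i_k % 4 of the group for row i_m, block i_k / 4.
            packed[i_k % nb + (i_m + (i_k / nb) * lda) * nb] = a[i_m + i_k * lda];
        }
    }
    return packed;
}
```

For a 2 x 4 matrix with rows (1, 2, 3, 4) and (5, 6, 7, 8), the column-major input {1, 5, 2, 6, 3, 7, 4, 8} is rearranged so that each row's four k values sit next to each other.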
Optimizations
Improved performance of single precision copy, swap, and scal when incx == 1 and incy == 1.
Improved performance of single precision axpy when incx == 1, incy == 1 and batch_count <= 8192.
Improved performance of trmm.
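To make the incx == 1 and incy == 1 condition concrete: incx and incy are the element strides of the x and y vectors, so a value of 1 means the vector is contiguous in memory. The following is a plain CPU reference sketch of single precision axpy with strides, written only to show what the parameters mean. The name saxpy_reference is hypothetical, only positive strides are handled, and nothing here reflects how the rocBLAS kernels are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference axpy: y[i * incy] += alpha * x[i * incx] for i in [0, n).
// With incx == 1 and incy == 1 both vectors are read and written
// contiguously, which is the case the release notes call out.
inline void saxpy_reference(std::size_t               n,
                            float                     alpha,
                            const std::vector<float>& x,
                            std::size_t               incx,
                            std::vector<float>&       y,
                            std::size_t               incy)
{
    assert(incx >= 1 && incy >= 1);
    assert(n == 0 || (x.size() > (n - 1) * incx && y.size() > (n - 1) * incy));

    for(std::size_t i = 0; i < n; ++i)
    {
        y[i * incy] += alpha * x[i * incx];
    }
}
```

Calling it with incx == 2 skips every other element of x, which illustrates the non-contiguous access pattern that falls outside the unit-stride case described above.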
Changed
Changed cmake_minimum_required to VERSION 3.16.8