Releases · ROCm/rocFFT

15 Dec 18:30

rocm-ci

rocm-6.0.0

b9926b5

rocFFT 1.0.25 for ROCm 6.0.0

Added

Implemented experimental APIs to allow computing FFTs on data distributed across multiple devices in a single process.

rocfft_field is a new type that can be added to a plan description to describe the layout of the FFT input or output. rocfft_field_add_brick can be called one or more times to describe a brick decomposition of an FFT field, where each brick can be assigned to a different device.

These interfaces are still experimental and subject to change. We are interested to hear feedback on them. Questions and concerns may be raised by opening issues on the rocFFT issue tracker.

Note that at this time, multi-device FFTs have several limitations:
- Real-complex (forward or inverse) FFTs are not currently supported.
- Planar format fields are not currently supported.
- Batch (i.e. number_of_transforms provided to rocfft_plan_create) must be 1.
- The FFT input is gathered to the current device at execute time, so all of the FFT data must fit on that device.
We expect these limitations to be removed in future releases.
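
To make the idea of a brick decomposition concrete, here is a minimal sketch in plain Python. It does not call rocFFT. The helper name `split_into_bricks`, the dictionary layout, and the inclusive-lower/exclusive-upper coordinate convention are all assumptions made for illustration, not the library's API.

```python
def split_into_bricks(lengths, num_devices):
    """Split a field along its first dimension into contiguous bricks.

    Hypothetical helper: returns one brick per device, each described by
    a lower corner (inclusive) and an upper corner (exclusive).
    """
    first = lengths[0]
    if num_devices < 1 or num_devices > first:
        raise ValueError("need between 1 and lengths[0] devices")

    bricks = []
    start = 0
    for device in range(num_devices):
        # Spread any remainder over the first few bricks.
        size = first // num_devices + (1 if device < first % num_devices else 0)
        lower = (start,) + (0,) * (len(lengths) - 1)
        upper = (start + size,) + tuple(lengths[1:])
        bricks.append({"device": device, "lower": lower, "upper": upper})
        start += size
    return bricks


def brick_volume(brick):
    """Number of elements covered by a brick."""
    volume = 1
    for lo, hi in zip(brick["lower"], brick["upper"]):
        volume *= hi - lo
    return volume
```

Each brick in the result corresponds conceptually to one rocfft_field_add_brick call. Because the FFT input is currently gathered to one device at execute time, the sum of all brick volumes still has to fit on that device.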

Optimizations

Improved performance of some small 2D/3D real FFTs supported by the 2D_SINGLE kernel. gfx90a receives additional optimization through offline tuning.
Removed an extra kernel launch from even-length real-complex FFTs that use callbacks.

Changed

Kernels in the solution map are now built into the library kernel cache.
Real forward transforms (real-to-complex) no longer overwrite input. rocFFT may still overwrite real inverse (complex-to-real) input, as this allows for better performance.
rocfft-rider and dyna-rocfft-rider have been renamed to rocfft-bench and dyna-rocfft-bench, controlled by the BUILD_CLIENTS_BENCH CMake option. Links for the old file names are installed, and the old BUILD_CLIENTS_RIDER CMake option is accepted for compatibility, but both will be removed in a future release.
Binaries in debug builds no longer have a "-d" suffix.

Fixed

rocFFT now correctly handles load callbacks that convert data from a smaller data type (e.g. 16-bit integers to 32-bit floats).
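
As a conceptual illustration of a converting load callback, the sketch below reads 16-bit integer samples and widens them to floating point as they are loaded by a naive DFT. It is plain Python, not the rocFFT callback API. The function names, the callback signature, and the 1/32768 scale factor are assumptions chosen for the example.

```python
import cmath
from array import array


def load_int16_as_float(buffer, offset, scale=1.0 / 32768.0):
    """Hypothetical load callback: widen one 16-bit sample to a float."""
    return float(buffer[offset]) * scale


def dft_with_load_callback(buffer, n, load):
    """Naive O(n^2) DFT that fetches every input element through `load`."""
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += load(buffer, j) * cmath.exp(-2j * cmath.pi * j * k / n)
        out.append(acc)
    return out


# Input stays in its compact 16-bit form; conversion happens on load.
samples = array("h", [16384, 0, -16384, 0])
spectrum = dft_with_load_callback(samples, len(samples), load_int16_as_float)
```

The point of the pattern is that the transform never needs a separate pass to convert the input, because the stored element is smaller than the element the transform computes with.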

13 Oct 18:57

rocm-ci

rocm-5.7.1

7520fc6

rocFFT 1.0.24 for ROCm 5.7.1

rocFFT code for ROCm 5.7.1 did not change. The library was rebuilt for the updated ROCm 5.7.1 stack.

15 Sep 17:29

rocm-ci

rocm-5.7.0

7520fc6

rocFFT 1.0.24 for ROCm 5.7.0

Optimizations

Improved performance of complex forward/inverse 1D FFTs (2049 <= length <= 131071) that use Bluestein's algorithm.
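
For readers unfamiliar with Bluestein's algorithm, the following is a minimal pure-Python sketch of the technique. It rewrites a DFT of arbitrary length as a convolution with a chirp and evaluates that convolution with power-of-two FFTs. This is a textbook formulation written for illustration. It says nothing about how rocFFT's kernels are organized, and all function names are hypothetical.

```python
import cmath


def fft_pow2(x, inverse=False):
    """Recursive radix-2 FFT for power-of-two lengths (unnormalized)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], inverse)
    odd = fft_pow2(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """Forward DFT of any length n via chirp multiplication and convolution."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:  # padded length: power of two, at least 2n - 1
        m *= 2
    # chirp[k] = exp(-i*pi*k^2/n); k^2 is reduced mod 2n to limit rounding error.
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m
    if n > 0:
        b[0] = chirp[0].conjugate()
    for k in range(1, n):
        b[k] = b[m - k] = chirp[k].conjugate()  # symmetric, wrapped around
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], inverse=True)
    return [chirp[k] * conv[k] / m for k in range(n)]


def naive_dft(x):
    """Reference O(n^2) DFT used only to check the result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

The padded length is why lengths that are not products of small primes cost more than their size suggests: a length-n transform becomes three transforms of length at least 2n - 1.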

Added

Implemented a solution map version converter and finished the first conversion, from version 0 to version 1. Version 1 removes some incorrect kernels (sbrc/sbcr using half_lds).

Changed

Moved rocfft_rtc_helper executable to lib/rocFFT directory on Linux.
Moved library kernel cache to lib/rocFFT directory.

29 Aug 20:12

rocm-ci

rocm-5.6.1

9bd44ae

rocFFT 1.0.23 for ROCm 5.6.1

rocFFT code for ROCm 5.6.1 did not change. The library was rebuilt for the updated ROCm 5.6.1 stack.

28 Jun 23:17

rocm-ci

rocm-5.6.0

946a75d

rocFFT 1.0.23 for ROCm 5.6.0

Added

Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
Implemented a first version of the offline tuner to support tuning kernels for C2C/Z2Z problems.
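
When considering the half-precision transforms listed above, it helps to see how coarse 16-bit floating point is. The sketch below rounds values through IEEE 754 binary16 using Python's standard `struct` module. That half-precision data uses this format is an assumption of the example, and `round_to_half` is a hypothetical helper name.

```python
import struct


def round_to_half(value):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def half_rounding_error(value):
    """Absolute error introduced by storing `value` in half precision."""
    return abs(round_to_half(value) - value)
```

With an 11-bit significand, integers above 2048 are no longer all representable, so `round_to_half(2049.0)` does not return 2049.0. Inputs and outputs of long transforms should be checked against the accuracy the application actually needs.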

Changed

Replaced std::complex with hipComplex data types in the data generator.
FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
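
The dimension sorting described above can be pictured as reordering lengths by their strides. The following sketch is an illustration only. The helper name `sort_dims_by_stride` and the choice to list the smallest-stride (fastest-moving) dimension first are assumptions of the example, not a description of rocFFT's internal code.

```python
def sort_dims_by_stride(lengths, strides):
    """Reorder dimensions so that strides are ascending.

    Hypothetical helper: the dimension with the smallest stride, i.e. the
    one whose elements are closest together in memory, comes first.
    """
    if len(lengths) != len(strides):
        raise ValueError("lengths and strides must have the same size")
    order = sorted(range(len(lengths)), key=lambda i: strides[i])
    return [lengths[i] for i in order], [strides[i] for i in order]


# Dimensions given slowest-first: 4 planes of 8 rows of 16 contiguous elements.
lengths, strides = sort_dims_by_stride([4, 8, 16], [128, 16, 1])
```

Both orderings describe the same data in memory. Only the order in which the dimensions are listed changes, which is why a plan can be normalized this way without touching the data.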

Fixed

Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.

24 May 19:07

rocm-ci

rocm-5.5.1

e7d6273

rocFFT 1.0.22 for ROCm 5.5.1

rocFFT code for ROCm 5.5.1 did not change. The library was rebuilt for the updated ROCm 5.5.1 stack.

01 May 21:04

rocm-ci

rocm-5.5.0

e7d6273

rocFFT 1.0.22 for ROCm 5.5.0

Optimizations

Improved performance of 1D FFTs with lengths < 2048 that use Bluestein's algorithm.
Reduced time for generating code during plan creation.
Optimized 3D R2C/C2R lengths 32, 84, 128.
Optimized batched small 1D R2C/C2R cases.

Added

Added gfx1101 to default AMDGPU_TARGETS.

Changed

Moved client programs to C++17.
Moved planar kernels and infrequently used Stockham kernels to be runtime-compiled.
Moved transpose, real-complex, Bluestein, and Stockham kernels to library kernel cache.

Fixed

Removed zero-length twiddle table allocations, which fixes errors from hipMallocManaged.
Fixed incorrect freeing of HIP stream handles during twiddle computation when multiple devices are present.

22 Mar 20:47

rocm-ci

rocm-5.4.4

5687cd9

rocFFT 1.0.21 for ROCm 5.4.4

rocFFT code for ROCm 5.4.4 did not change. The library was rebuilt for the updated ROCm 5.4.4 stack.

07 Feb 17:34

rocm-ci

rocm-5.4.3

5687cd9

rocFFT 1.0.21 for ROCm 5.4.3

Fixed

Removed source directory from rocm_install_targets call to prevent installation of rocfft.h in an unintended location.

13 Jan 16:43

rocm-ci

rocm-5.4.2

9961827

rocFFT 1.0.20 for ROCm 5.4.2

rocFFT code for ROCm 5.4.2 did not change. The library was rebuilt for the updated ROCm 5.4.2 stack.
