Add ROCm support (AMDGPU) #572

Merged
merged 29 commits into master from lr/rocmaware-dev on Jun 3, 2022
Changes from 25 commits
29 commits
7a577a7
Add ROCm (AMDGPU) support
luraess Apr 18, 2022
3dd77fa
Fix tests
luraess Apr 18, 2022
cc46cde
Fix tests
luraess Apr 18, 2022
b9d5811
Fix tests
luraess Apr 18, 2022
24c4f48
Update doc
luraess Apr 18, 2022
4782cdf
Add doc update
luraess Apr 18, 2022
9545aa8
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess Apr 18, 2022
ef153a2
Update doc with link to rocm scripts
luraess Apr 19, 2022
971a78f
Add cleaner condition
luraess Apr 19, 2022
980ed51
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess Apr 20, 2022
0fb7f7b
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess Apr 26, 2022
5e66d2c
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess May 2, 2022
1ebb7dc
Add ROCm tests
luraess May 2, 2022
77e9f2c
Update pipeline.yml
luraess May 2, 2022
dc76404
Update buildkite ROCm MPI launch params
luraess May 2, 2022
79006dd
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess May 12, 2022
bb53453
Uncomment failing tests
luraess May 13, 2022
bd7d403
Update CI MPI wrapper
luraess May 17, 2022
2b85ac5
Merge branch 'master' of github.com:JuliaParallel/MPI.jl into JuliaPa…
luraess May 31, 2022
1bdcee2
Merge branch 'JuliaParallel-master' into lr/rocmaware-dev
luraess May 31, 2022
10b454c
Add AMDGPU support to test.
luraess May 31, 2022
83cfe02
add buildkite script
simonbyrne Jun 1, 2022
fa73ba2
use latest Open MPI
simonbyrne Jun 1, 2022
3426a3a
disable AMDGPU julia 1.6
simonbyrne Jun 1, 2022
27bb633
try UCX 1.13-rc1
simonbyrne Jun 2, 2022
83fe889
Add synchronize
simonbyrne Jun 2, 2022
28edee4
Update test/common.jl
simonbyrne Jun 2, 2022
5fd4180
add more synchronize()
simonbyrne Jun 2, 2022
727c8ea
modify conversion to MPIPtr
simonbyrne Jun 3, 2022
94 changes: 91 additions & 3 deletions .buildkite/pipeline.yml
@@ -7,8 +7,8 @@
queue: "juliagpu"
cuda: "11.0"
env:
OPENMPI_VER: "4.0"
OPENMPI_VER_FULL: "4.0.3"
OPENMPI_VER: "4.1"
OPENMPI_VER_FULL: "4.1.4"
UCX_VER: "1.12.1"
CCACHE_DIR: "/root/ccache"
commands: |
@@ -43,7 +43,7 @@
- "mpi-prefix.tar.gz"

- wait

- label: "Tests -- Julia 1.6"
plugins:
- JuliaCI/julia#v1:
@@ -135,3 +135,91 @@
import Pkg
Pkg.test("MPI")
'

- group: "ROCm"
key: "rocm"
steps:
- label: "Build OpenMPI"
key: "rocm-build-openmpi"
agents:
queue: "juliagpu"
rocm: "*" # todo fix ROCM version
env:
OPENMPI_VER: "4.1"
OPENMPI_VER_FULL: "4.1.4"
UCX_VER: "1.13-rc1"
CCACHE_DIR: "/root/ccache"
commands: |
echo "--- Install packages"
apt-get install --yes --no-install-recommends curl ccache
export PATH="/usr/lib/ccache/:$$PATH"
echo "--- Build UCX"
curl -L https://github.com/openucx/ucx/releases/download/v1.13.0-rc1/ucx-1.13.0.tar.gz --output ucx.tar.gz
tar -zxf ucx.tar.gz
pushd ucx-*
./configure --with-rocm --enable-mt --prefix=$$(realpath ../mpi-prefix)
make -j
make install
popd
echo "--- Build OpenMPI"
curl -L https://download.open-mpi.org/release/open-mpi/v$${OPENMPI_VER}/openmpi-$${OPENMPI_VER_FULL}.tar.gz --output openmpi.tar.gz
tar -zxf openmpi.tar.gz
pushd openmpi-*
./configure --with-ucx=$$(realpath ../mpi-prefix) --prefix=$$(realpath ../mpi-prefix)
make -j
make install
popd
echo "--- Package prefix"
tar -zcf mpi-prefix.tar.gz mpi-prefix/
echo "--- ccache stats"
ccache -s
artifact_paths:
- "mpi-prefix.tar.gz"

- wait

- label: "Tests -- Julia 1.7"
plugins:
- JuliaCI/julia#v1:
version: "1.7"
persist_depot_dirs: packages,artifacts,compiled
agents:
queue: "juliagpu"
rocm: "*" # todo fix ROCM version
if: build.message !~ /\[skip tests\]/
timeout_in_minutes: 60
env:
JULIA_MPI_TEST_ARRAYTYPE: ROCArray
JULIA_MPI_TEST_NPROCS: 2
JULIA_MPI_PATH: "${BUILDKITE_BUILD_CHECKOUT_PATH}/openmpi"
OMPI_ALLOW_RUN_AS_ROOT: 1
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM: 1
OMPI_MCA_btl_vader_single_copy_mechanism: 'none' # https://github.com/open-mpi/ompi/issues/4948
OPAL_PREFIX: "${BUILDKITE_BUILD_CHECKOUT_PATH}/openmpi" # Should we set this for the user?
JULIA_CUDA_MEMORY_POOL: "none"
commands: |
echo "--- Configure MPI"
buildkite-agent artifact download --step "rocm-build-openmpi" mpi-prefix.tar.gz .
mkdir -p $${JULIA_MPI_PATH}
tar -zxf mpi-prefix.tar.gz --strip-components 1 -C $${JULIA_MPI_PATH}
export PATH=$${JULIA_MPI_PATH}/bin:$${PATH}
export LD_LIBRARY_PATH=$${JULIA_MPI_PATH}/lib:$${LD_LIBRARY_PATH}

echo "--- Setup Julia packages"
julia --color=yes --project=. -e '
import Pkg
Pkg.develop(; path = joinpath(pwd(), "lib", "MPIPreferences"))
'
julia --color=yes --project=test -e '
using Pkg
Pkg.develop(path="lib/MPIPreferences")
using MPIPreferences
MPIPreferences.use_system_binary(export_prefs=true)
rm("test/Manifest.toml")
'

echo "+++ Run tests"
julia --color=yes --project=. -e '
import Pkg
Pkg.test("MPI")
'
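
Outside CI, the MPIPreferences step that the job above performs can be reproduced interactively. A minimal sketch, assuming MPIPreferences is already installed (e.g. developed from `lib/MPIPreferences` as in the job) and that the ROCm-aware Open MPI built in the previous step is first on `PATH` and `LD_LIBRARY_PATH`:

```julia
# Sketch: point MPI.jl at the system (ROCm-aware) MPI and confirm it is picked up.
using MPIPreferences
MPIPreferences.use_system_binary()   # records the system libmpi in LocalPreferences.toml

# ...restart Julia so MPI.jl loads against the new preference, then:
using MPI
MPI.identify_implementation()        # should report the Open MPI build from above
```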
7 changes: 4 additions & 3 deletions docs/src/configuration.md
@@ -7,7 +7,7 @@ By default, MPI.jl will download and link against the following MPI implementati
This is suitable for most single-node use cases, but for larger systems, such as HPC
clusters or multi-GPU machines, you will probably want to configure against a
system-provided MPI implementation in order to exploit features such as fast network
interfaces and CUDA-aware MPI interfaces.
interfaces and CUDA-aware or ROCm-aware MPI interfaces.

The MPIPreferences.jl package allows the user to choose which MPI implementation to use in MPI.jl. It uses [Preferences.jl](https://github.com/JuliaPackaging/Preferences.jl) to
configure the MPI backend for each project separately. This provides
@@ -134,8 +134,9 @@ julia> MPIPreferences.use_system_binary()
The test suite can also be modified by the following variables:

- `JULIA_MPI_TEST_NPROCS`: How many ranks to use within the tests
- `JULIA_MPI_TEST_ARRAYTYPE`: Set to `CuArray` to test the CUDA-aware interface with
[`CUDA.CuArray](https://github.com/JuliaGPU/CUDA.jl) buffers.
- `JULIA_MPI_TEST_ARRAYTYPE`: Set to `CuArray` or `ROCArray` to test the CUDA-aware interface with
[`CUDA.CuArray`](https://github.com/JuliaGPU/CUDA.jl) or the ROCm-aware interface with
[`AMDGPU.ROCArray`](https://github.com/JuliaGPU/AMDGPU.jl) buffers (a usage sketch follows this list).
- `JULIA_MPI_TEST_BINARY`: Check that the specified MPI binary is used for the tests
- `JULIA_MPI_TEST_ABI`: Check that the specified MPI ABI is used for the tests
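
For example, a sketch of driving the test suite against the ROCm-aware interface from within Julia; the variable names are the ones listed above, and `withenv` is just one way to set them:

```julia
# Run the MPI.jl test suite with ROCArray buffers on 2 ranks (sketch).
import Pkg
withenv("JULIA_MPI_TEST_ARRAYTYPE" => "ROCArray",
        "JULIA_MPI_TEST_NPROCS"    => "2") do
    Pkg.test("MPI")
end
```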

18 changes: 17 additions & 1 deletion docs/src/knownissues.md
@@ -97,7 +97,7 @@ _More about CUDA.jl [memory environment-variables](https://cuda.juliagpu.org/sta

Make sure to:
- Have MPI and CUDA on path (or module loaded) that were used to build the CUDA-aware MPI
- Make sure to have:
- Set the following environment variables:
```
export JULIA_CUDA_MEMORY_POOL=none
export JULIA_CUDA_USE_BINARYBUILDER=false
@@ -114,6 +114,22 @@ Make sure to:

After that, it may be preferred to run the Julia MPI script (as suggested [here](https://discourse.julialang.org/t/cuda-aware-mpi-works-on-system-but-not-for-julia/75060/11)) launching it from a shell script (as suggested [here](https://discourse.julialang.org/t/cuda-aware-mpi-works-on-system-but-not-for-julia/75060/4)).

## ROCm-aware MPI

### Hints to ensure ROCm-aware MPI is functional

Make sure to:
- Have the MPI and ROCm installations used to build the ROCm-aware MPI on your path (or load the corresponding modules)
- Add the AMDGPU and MPI packages in Julia:
```
julia -e 'using Pkg; pkg"add AMDGPU"; pkg"add MPI"; using MPI; MPI.use_system_binary()'
```
- Then in Julia, after loading the MPI and AMDGPU modules, you can check:
  - the AMDGPU version: `AMDGPU.versioninfo()`
  - whether you are using the correct MPI implementation: `MPI.identify_implementation()`

After that, [this script](https://gist.github.com/luraess/c228ec08629737888a18c6a1e397643c) can be used to verify that ROCm-aware MPI is functional (adapted from the CUDA-aware version [here](https://discourse.julialang.org/t/cuda-aware-mpi-works-on-system-but-not-for-julia/75060/11)). As in the CUDA case, it may be preferable to launch the Julia ROCm-aware MPI script from a shell script (as suggested [here](https://discourse.julialang.org/t/cuda-aware-mpi-works-on-system-but-not-for-julia/75060/4)).
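
For reference, a minimal self-contained check in the same spirit as the linked gist (a sketch, not the gist itself; it assumes a ROCm-aware MPI has already been configured via MPIPreferences and that AMDGPU.jl is installed):

```julia
# Broadcast a device buffer across ranks and verify it on the host (sketch).
using MPI
using AMDGPU

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

n = 1024
A = rank == 0 ? AMDGPU.ROCArray(fill(1.0, n)) : AMDGPU.ROCArray(zeros(n))
MPI.Bcast!(A, 0, comm)        # the device pointer is handed directly to MPI

@assert all(Array(A) .== 1.0) # copy back to host to check the payload arrived
println("rank $rank: ROCm-aware Bcast! OK")

MPI.Finalize()
```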

## Microsoft MPI

### Custom operators on 32-bit Windows
12 changes: 10 additions & 2 deletions docs/src/usage.md
@@ -72,8 +72,16 @@ $ mpiexecjl --project=/path/to/project -n 20 julia script.jl

If your MPI implementation has been compiled with CUDA support, then `CUDA.CuArray`s (from the
[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) package) can be passed directly as
send and receive buffers for point-to-point and collective operations (they may also work
with one-sided operations, but these are not often supported).
send and receive buffers for point-to-point and collective operations (they may also work with one-sided operations, but these are not often supported).

If using Open MPI, the status of CUDA support can be checked via the
[`MPI.has_cuda()`](@ref) function.

## ROCm-aware MPI support

If your MPI implementation has been compiled with ROCm support (AMDGPU), then `AMDGPU.ROCArray`s (from the
[AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl) package) can be passed directly as send and receive buffers for point-to-point and collective operations (they may also work with one-sided operations, but these are not often supported).

Successfully running the [alltoall_test_rocm.jl](https://gist.github.com/luraess/c228ec08629737888a18c6a1e397643c) script should confirm that your MPI implementation has ROCm (AMDGPU) support enabled. Moreover, successfully running the [alltoall_test_rocm_mulitgpu.jl](https://gist.github.com/luraess/d478b3f98eae984931fd39a7158f4b9e) script should confirm that your ROCm-aware MPI implementation can use multiple AMD GPUs (one GPU per rank).

The status of ROCm (AMDGPU) support cannot currently be queried.
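
For illustration, a short sketch of a collective operating directly on device memory, assuming MPI.jl is configured against a ROCm-aware implementation (since support cannot be queried, a failed run here is itself the diagnostic):

```julia
# In-place Allreduce on a ROCArray: no staging through host memory (sketch).
using MPI
using AMDGPU

MPI.Init()
comm = MPI.COMM_WORLD

send = AMDGPU.ROCArray(fill(Float64(MPI.Comm_rank(comm)), 4))
MPI.Allreduce!(send, +, comm)   # every rank ends up with the sum of all ranks

MPI.Finalize()
```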
1 change: 1 addition & 0 deletions src/MPI.jl
@@ -134,6 +134,7 @@ function __init__()

run_load_time_hooks()

@require AMDGPU="21141c5a-9bdb-4563-92ae-f87d6854732e" include("rocm.jl")
@require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" include("cuda.jl")
end

6 changes: 4 additions & 2 deletions src/buffers.jl
@@ -44,6 +44,7 @@ Currently supported are:
- `Array`
- `SubArray`
- `CUDA.CuArray` if CUDA.jl is loaded.
- `AMDGPU.ROCArray` if AMDGPU.jl is loaded.

Additionally, certain sentinel values can be used, e.g. `MPI_IN_PLACE` or `MPI_BOTTOM`.
"""
@@ -102,8 +103,9 @@ and `datatype`. Methods are provided for

- `Ref`
- `Array`
- `CUDA.CuArray` if CUDA.jl is loaded
- `SubArray`s of an `Array` or `CUDA.CuArray` where the layout is contiguous, sequential or
- `CUDA.CuArray` if CUDA.jl is loaded.
- `AMDGPU.ROCArray` if AMDGPU.jl is loaded.
- `SubArray`s of an `Array`, `CUDA.CuArray` or `AMDGPU.ROCArray` where the layout is contiguous, sequential or
blocked.

# See also
21 changes: 21 additions & 0 deletions src/rocm.jl
@@ -0,0 +1,21 @@
import .AMDGPU

function Base.cconvert(::Type{MPIPtr}, A::AMDGPU.ROCArray{T}) where T
Base.cconvert(Ptr{T}, A.buf.ptr) # returns DeviceBuffer
end

function Base.unsafe_convert(::Type{MPIPtr}, X::AMDGPU.ROCArray{T}) where T
reinterpret(MPIPtr, Base.unsafe_convert(Ptr{T}, X.buf.ptr))
end

# only need to define this for strided arrays: all others can be handled by generic machinery
function Base.unsafe_convert(::Type{MPIPtr}, V::SubArray{T,N,P,I,true}) where {T,N,P<:AMDGPU.ROCArray,I}
X = parent(V)
pX = Base.unsafe_convert(Ptr{T}, X)
pV = pX + ((V.offset1 + V.stride1) - first(LinearIndices(X)))*sizeof(T)
return reinterpret(MPIPtr, pV)
end

function Buffer(arr::AMDGPU.ROCArray)
Buffer(arr, Cint(length(arr)), Datatype(eltype(arr)))
end
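
These methods are what let a `ROCArray` flow through MPI.jl's generic buffer machinery. A quick sanity check of the added `Buffer` method might look like the following sketch (it assumes AMDGPU.jl is installed and MPI has been initialized):

```julia
using MPI, AMDGPU
MPI.Init()

A = AMDGPU.ROCArray(zeros(Float64, 16))
buf = MPI.Buffer(A)            # dispatches to the Buffer method added above

@assert buf.count == length(A) # element count (not bytes); datatype follows eltype(A)
```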
1 change: 1 addition & 0 deletions test/Project.toml
@@ -1,4 +1,5 @@
[deps]
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
DoubleFloats = "497a8b3b-efae-58df-a0af-a86822472b78"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
4 changes: 4 additions & 0 deletions test/common.jl
@@ -5,6 +5,10 @@ if get(ENV,"JULIA_MPI_TEST_ARRAYTYPE","") == "CuArray"
import CUDA
ArrayType = CUDA.CuArray
synchronize() = CUDA.synchronize()
elseif get(ENV,"JULIA_MPI_TEST_ARRAYTYPE","") == "ROCArray"
import AMDGPU
ArrayType = AMDGPU.ROCArray
synchronize() = nothing
else
ArrayType = Array
synchronize() = nothing
5 changes: 5 additions & 0 deletions test/runtests.jl
@@ -9,6 +9,11 @@ if get(ENV, "JULIA_MPI_TEST_ARRAYTYPE", "") == "CuArray"
CUDA.version()
CUDA.precompile_runtime()
ArrayType = CUDA.CuArray
elseif get(ENV,"JULIA_MPI_TEST_ARRAYTYPE","") == "ROCArray"
import AMDGPU
AMDGPU.versioninfo()
# DEBUG: currently no `precompile_runtime()` functionality is implemented in AMDGPU.jl. If needed, it could be added by analogy with CUDA; AMDGPU.jl does not use device capabilities, but https://github.com/JuliaGPU/AMDGPU.jl/blob/cfaade146977594bf18e14b285ee3a9c84fbc7f2/src/execution.jl#L351-L357 shows how to construct a CompilerJob for a given agent.
ArrayType = AMDGPU.ROCArray
else
ArrayType = Array
end
2 changes: 1 addition & 1 deletion test/test_basic.jl
@@ -8,7 +8,7 @@ MPI.Init()

@test MPI.has_cuda() isa Bool

if ArrayType != Array
if get(ENV,"JULIA_MPI_TEST_ARRAYTYPE","") == "CuArray"
@test MPI.has_cuda()
end

2 changes: 0 additions & 2 deletions test/test_bcast.jl
@@ -3,9 +3,7 @@ using Random

MPI.Init()


comm = MPI.COMM_WORLD

root = 0
matsize = (17,17)

1 change: 1 addition & 0 deletions test/test_io.jl
@@ -2,6 +2,7 @@ include("common.jl")
using Random

MPI.Init()

comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
sz = MPI.Comm_size(comm)
1 change: 1 addition & 0 deletions test/test_io_shared.jl
@@ -1,6 +1,7 @@
include("common.jl")

MPI.Init()

comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
sz = MPI.Comm_size(comm)
2 changes: 1 addition & 1 deletion test/test_io_subarray.jl
@@ -1,8 +1,8 @@
include("common.jl")

using Random

MPI.Init()

comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
sz = MPI.Comm_size(comm)
2 changes: 1 addition & 1 deletion test/test_onesided.jl
@@ -1,7 +1,7 @@
using Test
using MPI

# TODO: enable CUDA tests once OpenMPI has full support
# TODO: enable CUDA and AMDGPU tests once OpenMPI has full support
ArrayType = Array

MPI.Init()
Expand Down