Add ROCm support (AMDGPU) #572

Merged on Jun 3, 2022 (29 commits)
Commits
7a577a7
Add ROCm (AMDGPU) support
luraess Apr 18, 2022
3dd77fa
Fix tests
luraess Apr 18, 2022
cc46cde
Fix tests
luraess Apr 18, 2022
b9d5811
Fix tests
luraess Apr 18, 2022
24c4f48
Update doc
luraess Apr 18, 2022
4782cdf
Add doc update
luraess Apr 18, 2022
9545aa8
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess Apr 18, 2022
ef153a2
Update doc with link to rocm scripts
luraess Apr 19, 2022
971a78f
Add cleaner condition
luraess Apr 19, 2022
980ed51
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess Apr 20, 2022
0fb7f7b
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess Apr 26, 2022
5e66d2c
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess May 2, 2022
1ebb7dc
Add ROCm tests
luraess May 2, 2022
77e9f2c
Update pipeline.yml
luraess May 2, 2022
dc76404
Update buildkite ROCm MPI launch params
luraess May 2, 2022
79006dd
Merge branch 'JuliaParallel:master' into lr/rocmaware-dev
luraess May 12, 2022
bb53453
Uncomment failing tests
luraess May 13, 2022
bd7d403
Update CI MPI wrapper
luraess May 17, 2022
2b85ac5
Merge branch 'master' of github.com:JuliaParallel/MPI.jl into JuliaPa…
luraess May 31, 2022
1bdcee2
Merge branch 'JuliaParallel-master' into lr/rocmaware-dev
luraess May 31, 2022
10b454c
Add AMDGPU support to test.
luraess May 31, 2022
83cfe02
add buildkite script
simonbyrne Jun 1, 2022
fa73ba2
use latest Open MPI
simonbyrne Jun 1, 2022
3426a3a
disable AMDGPU julia 1.6
simonbyrne Jun 1, 2022
27bb633
try UCX 1.13-rc1
simonbyrne Jun 2, 2022
83fe889
Add synchronize
simonbyrne Jun 2, 2022
28edee4
Update test/common.jl
simonbyrne Jun 2, 2022
5fd4180
add more synchronize()
simonbyrne Jun 2, 2022
727c8ea
modify conversion to MPIPtr
simonbyrne Jun 3, 2022
4 changes: 2 additions & 2 deletions src/rocm.jl
@@ -1,11 +1,11 @@
import .AMDGPU

function Base.cconvert(::Type{MPIPtr}, A::AMDGPU.ROCArray{T}) where T
-    Base.cconvert(Ptr{T}, A.buf.ptr) # returns DeviceBuffer
+    A
Contributor Author

What was wrong here, @simonbyrne?

Member

`A.buf.ptr` is a raw pointer, so nothing keeps the array rooted and the GC could clean it up while the call is in progress. It also drops the offset that views require.

Contributor Author

Thanks! TIL

end

function Base.unsafe_convert(::Type{MPIPtr}, X::AMDGPU.ROCArray{T}) where T
-    reinterpret(MPIPtr, Base.unsafe_convert(Ptr{T}, X.buf.ptr))
+    reinterpret(MPIPtr, Base.unsafe_convert(Ptr{T}, X.buf.ptr+X.offset))
end

# only need to define this for strided arrays: all others can be handled by generic machinery
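
The point made in the review thread (return the array itself from `cconvert`, and apply the offset only in `unsafe_convert`) can be tried out on the CPU without MPI or AMDGPU. The following is a minimal sketch, not the MPI.jl implementation: `ToyArray` is a hypothetical stand-in for a device array, its `offset` field is assumed to be in bytes, and `memcpy` plays the role of the foreign call.

```julia
# Hypothetical stand-in for a device array: a parent allocation plus a byte
# offset, roughly how a view into a larger buffer might be represented.
struct ToyArray{T}
    parent::Vector{T}
    offset::Int   # assumed to be in bytes from the start of `parent`
end

# Phase 1: ccall keeps whatever cconvert returns rooted for the duration of
# the call. Returning the array itself (not a raw pointer) keeps it alive.
Base.cconvert(::Type{Ptr{T}}, A::ToyArray{T}) where {T} = A

# Phase 2: extract the raw address only now, and include the offset so that
# a view points at its own first element rather than the parent's.
function Base.unsafe_convert(::Type{Ptr{T}}, A::ToyArray{T}) where {T}
    return pointer(A.parent) + A.offset   # Ptr arithmetic is in bytes
end

data = collect(Int32, 1:8)
view_like = ToyArray{Int32}(data, 2 * sizeof(Int32))  # starts at the 3rd element

# Copy three elements out through a C call; ccall runs both phases above.
dest = zeros(Int32, 3)
ccall(:memcpy, Ptr{Cvoid}, (Ptr{Int32}, Ptr{Int32}, Csize_t),
      dest, view_like, 3 * sizeof(Int32))

@assert dest == Int32[3, 4, 5]
```

If `cconvert` returned `pointer(A.parent)` instead, the `ccall` would root only that pointer value, so the parent array could be collected mid-call, and the offset would have nowhere to be applied.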