GPU implementation of hamming distance #541

felixpetschko · 2024-08-19T18:53:03Z

Hamming distance implementation with numba.cuda for GPU support.
This is built on top of the changes in Hamming distance implementation with Numba #512


grst · 2024-11-04T18:59:06Z

Hi @felixpetschko, what's the status here? Do you need anything from me or Severin?

I see you've switched to CuPy. Could you elaborate on how that compares to the numba implementation?

felixpetschko · 2024-11-05T08:16:34Z

Hi @grst! I am mostly done with my implementation here. Currently, the speedup on my laptop for the ir_dist function with hamming on 1 million cells is around 10x (45 vs. 480 seconds) compared to the new fast numba CPU implementation (and probably >100x compared to the original CPU implementation). I think this is also the maximum speedup I would aim for at the moment, because there are currently some sequential (single-CPU) parts in the ir_dist function besides the hamming GPU kernel, and the upstream processing for reading and preparing the data already takes longer anyway. So further optimization of the hamming kernel wouldn't be very effective.
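To make the arithmetic behind this argument concrete, here is a rough sketch. The 480 s and 45 s timings are the ones quoted above; the share of the GPU run spent inside the hamming kernel (`kernel_fraction`) is a made-up value for illustration only, since no profile is given in this thread, and both function names are hypothetical.

```python
# Timings quoted above for ir_dist with hamming on 1 million cells.
NUMBA_CPU_SECONDS = 480.0
GPU_SECONDS = 45.0


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    return before / after


def runtime_with_faster_kernel(total: float, kernel_fraction: float, factor: float) -> float:
    """Amdahl-style estimate: only the kernel share of `total` shrinks by `factor`."""
    sequential = total * (1.0 - kernel_fraction)
    kernel = total * kernel_fraction / factor
    return sequential + kernel


if __name__ == "__main__":
    print(f"measured speedup: {speedup(NUMBA_CPU_SECONDS, GPU_SECONDS):.1f}x")
    # HYPOTHETICAL: assume the kernel accounts for 20% of the 45 s GPU run.
    estimate = runtime_with_faster_kernel(GPU_SECONDS, kernel_fraction=0.2, factor=10.0)
    print(f"runtime if the kernel alone got 10x faster: {estimate:.1f} s")
```

Under that assumed 20% share, even a 10x faster kernel would only bring the run from 45 s to roughly 37 s, which is why the remaining sequential parts matter more than the kernel itself.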

My plan would be to prepare a pull request that is ready for review over the next days.

The reasons for switching to CuPy were the following:
Numba CUDA only provides a limited set of CUDA features, and you never really know how the numba code is mapped to those features internally. CuPy, however, allows you to write the CUDA kernels directly in C++ and offers all CUDA features (that I have seen so far). Also, most online resources about GPU programming cover CUDA kernels written in C++, whereas numba CUDA is very niche. Other programmers who look at the code in the future will probably only know C++ CUDA kernels (if they have done GPU programming at all). When doing GPU programming, I use profiling tools to find out what is actually going on in the hardware and what the compiler did, so the additional abstraction layer that numba adds can actually be a nightmare.
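For readers who have not seen the CuPy approach, the following is a minimal sketch of what "writing the CUDA kernel directly in C++" looks like with `cupy.RawKernel`. It is not the kernel from this PR: the kernel name `hamming_kernel`, the helpers `hamming_reference` and `hamming_gpu`, the dense all-vs-all output, and the restriction to equal-length ASCII sequences are all assumptions made for illustration. CuPy and NumPy are imported inside the GPU function, so the module itself loads on machines without a GPU.

```python
# CUDA C++ source, compiled by CuPy at first use. One thread per (i, j) pair.
_KERNEL_SRC = r"""
extern "C" __global__
void hamming_kernel(const signed char* seqs, int n, int length, int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= n || j >= n) return;
    int d = 0;
    for (int k = 0; k < length; ++k) {
        d += seqs[(long long)i * length + k] != seqs[(long long)j * length + k];
    }
    out[(long long)i * n + j] = d;
}
"""


def _check_equal_length(seqs):
    """This sketch only handles a non-empty list of equal-length sequences."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected a non-empty list of equal-length sequences")


def hamming_reference(seqs):
    """Pure-Python all-vs-all Hamming distances, used to check the GPU result."""
    _check_equal_length(seqs)
    return [[sum(a != b for a, b in zip(s, t)) for t in seqs] for s in seqs]


def hamming_gpu(seqs):
    """All-vs-all Hamming distances on the GPU (requires CuPy and a CUDA device)."""
    _check_equal_length(seqs)
    import cupy as cp  # lazy import: keeps the module usable without a GPU
    import numpy as np

    # Encode each sequence as a row of int8 ASCII codes.
    mat = cp.asarray(np.array([list(s.encode("ascii")) for s in seqs], dtype=np.int8))
    n, length = mat.shape
    out = cp.empty((n, n), dtype=cp.int32)

    kernel = cp.RawKernel(_KERNEL_SRC, "hamming_kernel")
    block = (16, 16)
    grid = ((n + block[0] - 1) // block[0], (n + block[1] - 1) // block[1])
    # Scalars are passed as NumPy types so they match the `int` kernel parameters.
    kernel(grid, block, (mat, np.int32(n), np.int32(length), out))
    return cp.asnumpy(out).tolist()
```

On a machine with a CUDA device, `hamming_gpu(seqs)` should agree with `hamming_reference(seqs)`. Because the kernel is plain CUDA C++, it can be inspected with the usual CUDA profiling tools without an extra translation layer in between.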

grst · 2024-11-05T10:23:43Z

Makes sense, thanks!
Still not sure whether this should be merged into scirpy or into rapids-singlecell, then. @Intron7, I'll also bring up our long-term strategy here at the next core dev meeting.

Intron7 · 2024-11-08T10:22:24Z

Hey @felixpetschko, can you send me a larger dataset to test this? I have some ideas and want to see if this works.


grst · 2024-11-16T19:14:09Z

> Still not sure if this should be merged in to scirpy or into rapids-singlecell then. @Intron7, I'll also bring that up at the next core dev meeting what's our long-term strategy here.

@felixpetschko, the outcome of this discussion was that the function stays here, and we'll set up a GPU CI for scirpy. @ilan-gold or @flying-sheep can help with that once this PR is ready.


codecov · 2024-12-05T15:13:15Z

Codecov Report

Attention: Patch coverage is 53.71622% with 137 lines in your changes missing coverage. Please review.

Project coverage is 79.44%. Comparing base (08e0cc3) to head (8315e2f).
Report is 22 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/scirpy/ir_dist/metrics.py | 4.44% | 129 Missing ⚠️ |
| src/scirpy/ir_dist/_clonotype_neighbors.py | 95.52% | 6 Missing ⚠️ |
| src/scirpy/ir_dist/__init__.py | 50.00% | 1 Missing ⚠️ |
| src/scirpy/ir_dist/_util.py | 95.83% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #541      +/-   ##
==========================================
- Coverage   81.43%   79.44%   -2.00%     
==========================================
  Files          49       50       +1     
  Lines        4213     4525     +312     
==========================================
+ Hits         3431     3595     +164     
- Misses        782      930     +148


grst · 2024-12-10T14:04:36Z

@Intron7, it appears Phil created documentation for the GPU CI: https://github.com/scverse/governance/blob/main/developer/gpu_ci.md

Maybe this helps?

Intron7 · 2024-12-10T14:43:31Z

@grst I talked to @Zethson and he adjusted the setting in cirun. The test is now running but failing.

felixpetschko and others added 30 commits April 29, 2024 13:28

take static methods out of tcrdist

bad62d8

made _tcrdist_mat a normal class method

72565bf

parent method NumbaDistanceCalculator extracted

add8e7f

numba version of hamming distance implemented

e9c0642

hamming numba tests passed and reference test added

68e0493

hamming numba distance calculator implemented and tested

ef0fa7d

n_jobs parameter handling done in NumbaDistanceCalculator superclass

0b15f8b

documentation adapted

46bfc14

removed unnecessary import

e339e14

[pre-commit.ci] auto fixes from pre-commit.com hooks

7da4519

for more information, see https://pre-commit.ci

hamming distance with numba parallelization implemented

82b0259

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy…

b2d28d3

… into numba_hamming

[pre-commit.ci] auto fixes from pre-commit.com hooks

249e626

for more information, see https://pre-commit.ci

imports fixed

2fccc6a

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy…

9ee1a2b

… into numba_hamming

[pre-commit.ci] auto fixes from pre-commit.com hooks

a68ab53

for more information, see https://pre-commit.ci

implemented parallelization with n_jobs and n_blocks for hamming and …

d68a10b

…tcrdist distance metrics

performance optimization for hamming and tcrdist

0005e63

more documentation added

6f16a3e

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy…

6b32311

… into numba_hamming

[pre-commit.ci] auto fixes from pre-commit.com hooks

ad13f52

for more information, see https://pre-commit.ci

documentation adapted

08ad838

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy…

a8d9846

… into numba_hamming

[pre-commit.ci] auto fixes from pre-commit.com hooks

b86030c

for more information, see https://pre-commit.ci

documentation adapted

2fb8254

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy…

bb0f430

… into numba_hamming

signature of _calc_dist_mat_block changed

80ae271

the alphabet for the hamming distance is now the unique characters oc…

91c1dea

…curing in all sequences

[pre-commit.ci] auto fixes from pre-commit.com hooks

899e2eb

for more information, see https://pre-commit.ci

Merge branch 'main' into numba_hamming

a0627b4

felixpetschko and others added 9 commits October 19, 2024 10:33

experiments, run 1 million cells with global memory

a39dbeb

run 1 million cells with only global memory

972836f

refactoring and time measurements

8d0c2e4

optimized seqs2mat

6bc496c

increased result matrix stacking speed

2d4756f

changed data dtype to int8

c2a290f

scaled to 1 million cells

459c2a2

Merge branch 'gpu_hamming' of https://github.com/felixpetschko/scirpy …

1fb02b1

…into gpu_hamming

[pre-commit.ci] auto fixes from pre-commit.com hooks

f54dc7e

for more information, see https://pre-commit.ci

felixpetschko and others added 4 commits November 14, 2024 16:38

sort indices of result csr matrix

38f1fea

refactoring

c646651

Merge branch 'gpu_hamming' of https://github.com/felixpetschko/scirpy …

897c17b

…into gpu_hamming

[pre-commit.ci] auto fixes from pre-commit.com hooks

f6668c4

for more information, see https://pre-commit.ci

grst changed the title ~~Hamming distance implementation with numba.cuda (GPU)~~ GPU implementation of hamming distance Nov 16, 2024

Intron7 and others added 4 commits December 5, 2024 15:54

remove test from ci

f7a4a03

[pre-commit.ci] auto fixes from pre-commit.com hooks

9e021fd

for more information, see https://pre-commit.ci

move cupy import to func

44d7b34

[pre-commit.ci] auto fixes from pre-commit.com hooks

8315e2f

for more information, see https://pre-commit.ci

add GPU test

8044b28

Intron7 added the run-gpu-ci runs GPU CI label Dec 5, 2024

Intron7 and others added 2 commits December 5, 2024 16:28

rename

d638cfa

Rename .cirun.yaml to .cirun.yml

a644fef
