GPU implementation of hamming distance #541

Open
wants to merge 118 commits into
base: main
Commits
bad62d8
take static methods out of tcrdist
felixpetschko Apr 29, 2024
72565bf
made _tcrdist_mat a normal class method
felixpetschko Apr 29, 2024
add8e7f
parent method NumbaDistanceCalculator extracted
felixpetschko Apr 29, 2024
e9c0642
numba version of hamming distance implemented
felixpetschko Apr 29, 2024
68e0493
hamming numba tests passed and reference test added
felixpetschko Apr 29, 2024
ef0fa7d
hamming numba distance calculator implemented and tested
felixpetschko Apr 29, 2024
0b15f8b
n_jobs parameter handling done in NumbaDistanceCalculator superclass
felixpetschko Apr 29, 2024
46bfc14
documentation adapted
felixpetschko Apr 29, 2024
e339e14
removed unnecessary import
felixpetschko Apr 29, 2024
7da4519
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2024
82b0259
hamming distance with numba parallelization implemented
felixpetschko May 2, 2024
b2d28d3
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko May 2, 2024
249e626
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 2, 2024
2fccc6a
imports fixed
felixpetschko May 2, 2024
9ee1a2b
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko May 2, 2024
a68ab53
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 2, 2024
d68a10b
implemented parallelization with n_jobs and n_blocks for hamming and …
felixpetschko May 6, 2024
0005e63
performance optimization for hamming and tcrdist
felixpetschko May 6, 2024
6f16a3e
more documentation added
felixpetschko May 6, 2024
6b32311
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko May 6, 2024
ad13f52
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 6, 2024
08ad838
documentation adapted
felixpetschko May 7, 2024
a8d9846
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko May 7, 2024
b86030c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2024
2fb8254
documentation adapted
felixpetschko May 7, 2024
bb0f430
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko May 7, 2024
80ae271
signature of _calc_dist_mat_block changed
felixpetschko Aug 7, 2024
91c1dea
the alphabet for the hamming distance is now the unique characters oc…
felixpetschko Aug 7, 2024
899e2eb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 7, 2024
a0627b4
Merge branch 'main' into numba_hamming
grst Aug 8, 2024
d5dbe8e
normalized hamming distance added
felixpetschko Aug 9, 2024
5a6ef24
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 9, 2024
3060314
renaming test
felixpetschko Aug 9, 2024
6b2b025
histogram creation for hamming distance added
felixpetschko Aug 9, 2024
fc24ac5
merge
felixpetschko Aug 9, 2024
5670a84
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 9, 2024
53d542e
refactored
felixpetschko Aug 9, 2024
fe31be7
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 9, 2024
59a3c9d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 9, 2024
c1d9d51
hamming histogram adjustments
felixpetschko Aug 9, 2024
ce4f3f3
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 9, 2024
5419d0b
reference test cases added for normalized hamming and hamming histogram
felixpetschko Aug 11, 2024
effee43
Merge branch 'main' into numba_hamming
grst Aug 13, 2024
da07aff
Update src/scirpy/ir_dist/metrics.py
felixpetschko Aug 15, 2024
a934743
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 15, 2024
c0e0381
test cases for normalized hamming and hamming histogram adapted
felixpetschko Aug 15, 2024
6a220a7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2024
8683ab6
docstring for normalized hamming distance and tcrdist distance added
felixpetschko Aug 15, 2024
bfe35fd
adapted default parameters and tests for n_jobs and n_blocks
felixpetschko Aug 15, 2024
a91130a
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 15, 2024
15c04bd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2024
d45ab64
test_sequence_dist_all_metrics adaptions
felixpetschko Aug 15, 2024
1730b3f
n_jobs default value set to -1
felixpetschko Aug 15, 2024
8f18210
docstring of ir_dist for n_jobs adapted
felixpetschko Aug 15, 2024
d62c1c8
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 15, 2024
8cfc1c6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2024
efbc37b
docstring change to test cicd pipeline
felixpetschko Aug 15, 2024
ae3121a
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 15, 2024
9398d7c
docstring for n_jobs of _ir_dist changed
felixpetschko Aug 15, 2024
66d6f70
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2024
afc03ba
docstring for n_jobs of _ir_dist changed
felixpetschko Aug 15, 2024
def8b1e
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 15, 2024
dc8dae4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2024
758feed
moved histogram creation to parent class of hamming distance calculator
felixpetschko Aug 16, 2024
04d0db7
histogram computation adaptions
felixpetschko Aug 16, 2024
b7ed4ca
test case test_tcrdist_histogram_not_implemented added
felixpetschko Aug 16, 2024
d40e193
documentation for histogram adapted
felixpetschko Aug 16, 2024
f2c32af
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 16, 2024
54c0cc2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 16, 2024
b16c705
reformatted doc string
felixpetschko Aug 16, 2024
3aab445
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 16, 2024
2419bfb
handling of symmetric matrices with respect to histogram variable cha…
felixpetschko Aug 16, 2024
e516a86
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 16, 2024
20c14eb
retrieval of usable cpus for numba adapted
felixpetschko Aug 16, 2024
7c2cc06
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 16, 2024
f566211
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 16, 2024
f68dd70
more documentation for histogram and (hamming) normalize added
felixpetschko Aug 19, 2024
3955eb1
Merge branch 'numba_hamming' of https://github.com/felixpetschko/scir…
felixpetschko Aug 19, 2024
d9dd20e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 19, 2024
636b8e0
added GPUHammingDistanceCalculator
felixpetschko Aug 19, 2024
3e4f0d3
added test case for gpu hamming distance
felixpetschko Aug 19, 2024
30f6947
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 19, 2024
57e31a6
Merge branch 'main' of https://github.com/scverse/scirpy into gpu_ham…
felixpetschko Aug 29, 2024
2f3bfb3
documentation for GPUHammingDistanceCalculator adapted
felixpetschko Aug 29, 2024
3b49d16
adapted documentation of _tcrdist_mat
felixpetschko Aug 29, 2024
51c6721
Merge branch 'main' into gpu_hamming
grst Aug 30, 2024
f50d082
Merge branch 'main' of https://github.com/scverse/scirpy into gpu_ham…
felixpetschko Sep 27, 2024
03afdd3
cuda numba experiments
felixpetschko Sep 29, 2024
0b0c5fb
cupy experiments
felixpetschko Sep 29, 2024
babdf9a
cupy experiments
felixpetschko Sep 30, 2024
f22ae38
scaled cupy to 1 million cells
felixpetschko Sep 30, 2024
730cb80
sorted sequences by length
felixpetschko Oct 2, 2024
5e4776c
textures used for seqs_mat1 and seqs_mat2
felixpetschko Oct 16, 2024
e6bf393
texture with up to 100k cells
felixpetschko Oct 16, 2024
da251d0
sorted seqs with multiple blocks
felixpetschko Oct 16, 2024
60ec651
scaled textures to 1 million cells
felixpetschko Oct 16, 2024
6f9d6bd
use char for sequences
felixpetschko Oct 17, 2024
adbd239
shared memory used
felixpetschko Oct 17, 2024
a39dbeb
experiments, run 1 million cells with global memory
felixpetschko Oct 19, 2024
972836f
run 1 million cells with only global memory
felixpetschko Oct 19, 2024
8d0c2e4
refactoring and time measurements
felixpetschko Oct 19, 2024
6bc496c
optimized seqs2mat
felixpetschko Oct 19, 2024
2d4756f
increased result matrix stacking speed
felixpetschko Oct 21, 2024
c2a290f
changed data dtype to int8
felixpetschko Oct 21, 2024
459c2a2
scaled to 1 million cells
felixpetschko Oct 21, 2024
1fb02b1
Merge branch 'gpu_hamming' of https://github.com/felixpetschko/scirpy…
felixpetschko Oct 21, 2024
f54dc7e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 21, 2024
38f1fea
sort indices of result csr matrix
felixpetschko Nov 14, 2024
c646651
refactoring
felixpetschko Nov 14, 2024
897c17b
Merge branch 'gpu_hamming' of https://github.com/felixpetschko/scirpy…
felixpetschko Nov 14, 2024
f6668c4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 14, 2024
f7a4a03
remove test from ci
Intron7 Dec 5, 2024
9e021fd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
44d7b34
move cupy import to func
Intron7 Dec 5, 2024
8315e2f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
8044b28
add GPU test
Intron7 Dec 5, 2024
d638cfa
rename
Intron7 Dec 5, 2024
a644fef
Rename .cirun.yaml to .cirun.yml
Intron7 Dec 6, 2024
11 changes: 11 additions & 0 deletions .cirun.yml
@@ -0,0 +1,11 @@
runners:
  - name: aws-gpu-runner
    cloud: aws
    instance_type: g4dn.xlarge
    machine_image: ami-067a4ba2816407ee9
    region: eu-north-1
    preemptible:
      - true
      - false
    labels:
      - cirun-aws-gpu
65 changes: 65 additions & 0 deletions .github/workflows/test-gpu.yaml
@@ -0,0 +1,65 @@
name: GPU-CI

on:
  push:
    branches: [main]
  pull_request:
    types:
      - labeled
      - opened
      - synchronize

# Cancel the job if new commits are pushed
# https://stackoverflow.com/questions/66335225/how-to-cancel-previous-runs-in-the-pr-when-you-push-new-commitsupdate-the-curre
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: flying-sheep/check@v1
        with:
          success: ${{ github.event_name == 'push' || contains(github.event.pull_request.labels.*.name, 'run-gpu-ci') }}
  test:
    name: GPU Tests
    needs: check
    runs-on: "cirun-aws-gpu--${{ github.run_id }}"
    timeout-minutes: 30

    defaults:
      run:
        shell: bash -el {0}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Nvidia SMI sanity check
        run: nvidia-smi

      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install UV
        uses: hynek/setup-cached-uv@v2
        with:
          cache-dependency-path: pyproject.toml

      - name: Install scirpy
        run: uv pip install --system -e ".[dev,test,rpack,dandelion,diversity,parasail,cupy]"
      - name: Pip list
        run: pip list

      - name: Run test
        run: pytest -m gpu

      - name: Remove 'run-gpu-ci' Label
        if: always()
        uses: actions-ecosystem/action-remove-labels@v1
        with:
          labels: "run-gpu-ci"
          github_token: ${{ secrets.GITHUB_TOKEN }}
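The `check` job above gates the GPU job: its `success:` expression is true for pushes, or for pull requests that carry the `run-gpu-ci` label. The following is a minimal Python sketch of that condition, for illustration only. The function name `should_run_gpu_ci` and its arguments are hypothetical and not part of the workflow; the lower-casing reflects that GitHub documents `contains()` as case-insensitive.

```python
def should_run_gpu_ci(event_name: str, pr_labels: list) -> bool:
    """Mirror the `success:` expression of the `check` job (illustrative only)."""
    # Pushes always pass the gate; the workflow itself only triggers on pushes to main.
    if event_name == "push":
        return True
    # Pull requests pass only when one of their labels is 'run-gpu-ci'.
    # Comparison is lower-cased on the assumption that contains() ignores case.
    return any(label.lower() == "run-gpu-ci" for label in pr_labels)


print(should_run_gpu_ci("pull_request", ["enhancement"]))
print(should_run_gpu_ci("pull_request", ["run-gpu-ci"]))
```

Because the final workflow step removes the label under `if: always()`, the label apparently has to be re-applied each time a GPU run is wanted on a pull request.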
2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
@@ -59,7 +59,7 @@ jobs:
           PLATFORM: ${{ matrix.os }}
           DISPLAY: :42
         run: |
-          coverage run -m pytest -v --color=yes
+          coverage run -m pytest -v --color=yes -m "not gpu"
       - name: Report coverage
         run: |
           coverage report
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -2,8 +2,8 @@ fail_fast: false
 default_language_version:
   python: python3
 default_stages:
-  - commit
-  - push
+  - pre-commit
+  - pre-push
 minimum_pre_commit_version: 2.16.0
 repos:
   - repo: https://github.com/pre-commit/mirrors-prettier
6 changes: 5 additions & 1 deletion pyproject.toml
@@ -76,6 +76,9 @@ test = [
     'coverage',
     'black',
 ]
+cupy = [
+    'cupy-cuda12x',
+]
 dandelion = [
     'sc-dandelion>=0.3.5',
 ]
@@ -107,7 +110,8 @@ xfail_strict = true
 # ]
 markers = [
     "conda: marks a subset of tests to be ran on the Bioconda CI.",
-    "extra: marks tests that require extra dependencies."
+    "extra: marks tests that require extra dependencies.",
+    "gpu: mark test to run on GPU",
 ]
 minversion = 6.0
 norecursedirs = [ '.*', 'build', 'dist', '*.egg', 'data', '__pycache__']
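The `gpu` marker registered above is what lets the regular CI deselect GPU tests with `-m "not gpu"` while the GPU workflow selects them with `-m gpu`. As a rough sketch of how a test might opt in: the helper `gpu_available` and the test name below are hypothetical and are not taken from this PR; only `pytest.mark.gpu` and the optional `cupy` dependency come from the diff.

```python
import pytest


def gpu_available() -> bool:
    """Return True if CuPy imports and reports at least one CUDA device."""
    try:
        # Imported lazily so that CPU-only environments can still collect the test.
        import cupy

        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # Missing CuPy, missing driver, or no device: treat all as "no GPU".
        return False


@pytest.mark.gpu
def test_gpu_marker_example():
    # Hypothetical test body: skip cleanly when no CUDA device is present.
    if not gpu_available():
        pytest.skip("no CUDA device available")
    import cupy as cp

    assert int(cp.arange(4).sum()) == 6
```

Running `pytest -m gpu` would collect only tests decorated like this one, and `pytest -m "not gpu"` would leave them out.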
6 changes: 6 additions & 0 deletions src/scirpy/ir_dist/__init__.py
@@ -60,6 +60,10 @@ def IrNeighbors(*args, **kwargs):
       BLOSUM62 matrix. Faster implementation of `alignment` with some loss.
       This option is incompatible with nucleotide sequences.
       See :class:`~scirpy.ir_dist.metrics.FastAlignmentDistanceCalculator`.
+    * `normalized_hamming` -- Normalized Hamming distance (in percent) for CDR3 sequences of equal length.
+      See :class:`~scirpy.ir_dist.metrics.HammingDistanceCalculator`.
+    * `tcrdist` -- Distance based on pairwise sequence alignments between TCR CDR3 sequences based on the tcrdist metric.
+      See :class:`~scirpy.ir_dist.metrics.TCRdistDistanceCalculator`.

Comment on lines +63 to +66 (Collaborator):

Suggested change
-    * `normalized_hamming` -- Normalized Hamming distance (in percent) for CDR3 sequences of equal length.
-      See :class:`~scirpy.ir_dist.metrics.HammingDistanceCalculator`.
-    * `tcrdist` -- Distance based on pairwise sequence alignments between TCR CDR3 sequences based on the tcrdist metric.
-      See :class:`~scirpy.ir_dist.metrics.TCRdistDistanceCalculator`.

this seems duplicated now

     * any instance of :class:`~scirpy.ir_dist.metrics.DistanceCalculator`.
 """

@@ -105,6 +109,8 @@ def _get_distance_calculator(metric: MetricType, cutoff: int | None, *, n_jobs=-
         dist_calc = metrics.HammingDistanceCalculator(n_jobs=n_jobs, **kwargs)
     elif metric == "normalized_hamming":
         dist_calc = metrics.HammingDistanceCalculator(n_jobs=n_jobs, normalize=True, **kwargs)
+    elif metric == "gpu_hamming":
+        dist_calc = metrics.GPUHammingDistanceCalculator(**kwargs)
     elif metric == "tcrdist":
         dist_calc = metrics.TCRdistDistanceCalculator(n_jobs=n_jobs, **kwargs)
     else:
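The new `gpu_hamming` branch dispatches to `GPUHammingDistanceCalculator`, whose implementation is not part of the diff shown here. For orientation, the following is a plain-Python reference sketch of the quantity the Hamming metrics describe: the number of mismatching positions between two equal-length sequences, optionally expressed in percent, with pairs above a cutoff dropped. The function name, its signature, and the choice to return `None` for dropped pairs are assumptions for illustration; the PR's actual rounding, cutoff handling, and sparse-matrix encoding may differ.

```python
from typing import Optional


def hamming_distance(
    seq1: str, seq2: str, *, cutoff: float, normalize: bool = False
) -> Optional[float]:
    """Reference Hamming distance for two sequences (illustrative sketch only).

    Returns None when the sequences differ in length, are empty, or when the
    distance exceeds `cutoff`.
    """
    length = len(seq1)
    # The Hamming distance is only defined for sequences of equal length.
    if length != len(seq2) or length == 0:
        return None
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    # Assumption: "normalized" means the share of mismatching positions in percent.
    dist = 100.0 * mismatches / length if normalize else float(mismatches)
    return dist if dist <= cutoff else None


print(hamming_distance("CASSL", "CASSF", cutoff=2))
print(hamming_distance("CASSL", "CASSF", cutoff=20, normalize=True))
```

A GPU implementation would evaluate the same comparison for many sequence pairs in parallel; this sketch only pins down the per-pair definition.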