Cannot install apex on the machine of CUDA 12.2 #1761
Comments
same issue |
Similar issue: my machine also reports CUDA 12.2. Installing apex directly results in the same error as mentioned above. I then switched to a conda virtual environment with CUDA 11.3 and the matching PyTorch build (PyTorch 1.10). After that, using
Traceback (most recent call last):
File ".../VALOR/./train.py", line 88, in <module>
main(args)
File ".../VALOR/./train.py", line 55, in main
model = VALOR.from_pretrained(opts,checkpoint)
File ".../VALOR/model/modeling.py", line 109, in from_pretrained
model = cls(opts, *inputs, **kwargs)
File ".../VALOR/model/pretrain.py", line 67, in __init__
super().__init__(opts)
File ".../VALOR/model/modeling.py", line 328, in __init__
self.load_ast_model(base_cfg,config)
File ".../VALOR/model/modeling.py", line 609, in load_ast_model
self.audio_encoder = TransformerEncoder(model_cfg_audio, mode='prenorm')
File ".../VALOR/model/transformer.py", line 149, in __init__
layer = TransformerLayer(config, mode)
File ".../VALOR/model/transformer.py", line 62, in __init__
self.layernorm1 = LayerNorm(config.hidden_size, eps=1e-12)
File ".../anaconda3/envs/valor1/lib/python3.9/site-packages/apex/normalization/fused_layer_norm.py", line 268, in __init__
fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
File ".../anaconda3/envs/valor1/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'
'Readme' shows that we can use the option |
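The traceback above ends in `importlib.import_module("fused_layer_norm_cuda")`, which suggests apex itself was installed but its compiled CUDA extension was not built. As a rough sketch (the helper name `extension_available` is hypothetical and not part of apex), one way to check for the compiled module before constructing the model:

```python
import importlib


def extension_available(name="fused_layer_norm_cuda"):
    """Return True if the named module can be imported, False otherwise.

    ModuleNotFoundError is a subclass of ImportError, so this also covers
    the exact error shown in the traceback.
    """
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if extension_available():
        print("fused_layer_norm_cuda is importable")
    else:
        print("fused_layer_norm_cuda is missing; apex was likely built without its CUDA extensions")
```

If the check fails, a caller could fall back to `torch.nn.LayerNorm` instead of the fused apex layer. That is an assumption about what the calling code can tolerate, not something the VALOR code does.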
same issue: |
I think you can remove the check code in setup.py, then use |
I've encountered the same issue. @Zhangwq76 could you tell us which part of the check code we should remove? |
line 39, in |
+1 |
I've got a quick fix for this, based on @Zhangwq76's solution: https://github.com/googio/apex
|
I'm hitting the same issue. Did you solve it? I use CUDA 12.2 with torch 2.1, and I modified the version check code in setup.py and use |
Describe the Bug
Minimal Steps/Code to Reproduce the Bug
running script:
"python setup.py install --cpp_ext --cuda_ext"
The reporting log:
"torch.version = 2.1.2+cu121
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
from /usr/bin
Traceback (most recent call last):
File "/home/hwq/ray/adversarial_examples/apex/setup.py", line 178, in
check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)
File "/home/hwq/ray/adversarial_examples/apex/setup.py", line 40, in check_cuda_torch_binary_vs_bare_metal
raise RuntimeError(
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.
In some cases, a minor-version mismatch will not cause later errors: #323 (comment). You can try commenting out this check (at your own risk)."
CUDA Version is 12.2.
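Note that the log shows three different CUDA numbers: `nvidia-smi` reports 12.2 (the highest CUDA version the driver supports), the `nvcc` found in /usr/bin is release 11.5, and the PyTorch binary was built with 12.1. The error message compares the last two. A minimal sketch of that comparison follows, using the strings from the log as sample input. The function names are made up for illustration and are not apex's own code:

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version):
    """Return (major, minor) from a string such as torch.version.cuda ("12.1")."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def describe_mismatch(nvcc_output, torch_cuda):
    """Classify how the toolkit version relates to the PyTorch build version."""
    toolkit = parse_nvcc_release(nvcc_output)
    if toolkit is None:
        return "no release number found in nvcc output"
    binary = parse_torch_cuda(torch_cuda)
    if toolkit == binary:
        return "match"
    if toolkit[0] == binary[0]:
        return "minor mismatch"
    return "major mismatch"


# Sample line copied from the log above; "12.1" comes from "2.1.2+cu121".
SAMPLE = "Cuda compilation tools, release 11.5, V11.5.119"
print(describe_mismatch(SAMPLE, "12.1"))  # prints "major mismatch"
```

Under this reading, the mismatch here is 11.5 versus 12.1, a major-version difference rather than the minor-version case the error message describes as sometimes harmless. Checking which `nvcc` is first on `PATH` may therefore matter more than the 12.2 shown by `nvidia-smi`.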
Expected Behavior
Install apex successfully
Environment
uname -a
Linux ps 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 9 15:34:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
nvidia-smi
Fri Dec 22 00:15:43 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |