RuntimeError: CUDA error: invalid device function #982
You might be using an older …
@ptrblck Thank you for your answer. I followed the installation guide. So where can I get a newer version of …
You should get an error if you are trying to compile …
@ptrblck My server was installed with …
I am trying to run this GitHub project and encountered a CUDA error with apex:
```
Traceback (most recent call last):
  File "train_AEI.py", line 132, in <module>
    scaled_loss.backward()
  File "/home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 128, in post_backward_models_are_masters
    scale_override=grads_have_scale/out_scale)
  File "/home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/apex/amp/scaler.py", line 117, in unscale
    1./scale)
  File "/home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/apex/multi_tensor_apply/multi_tensor_apply.py", line 30, in __call__
    *args)
RuntimeError: CUDA error: invalid device function (multi_tensor_apply at csrc/multi_tensor_apply.cuh:111)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f7679444193 in /home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: void multi_tensor_apply<2, ScaleFunctor<float, float>, float>(int, int, at::Tensor const&, std::vector<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor> > > > const&, ScaleFunctor<float, float>, float) + 0x1270 (0x7f7668c39ce0 in /home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: multi_tensor_scale_cuda(int, at::Tensor, std::vector<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor> > > >, float) + 0x829 (0x7f7668c37c99 in /home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x25e5a (0x7f7668c27e5a in /home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x1f641 (0x7f7668c21641 in /home/ivdai/anaconda3/envs/ccx_test0/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
frame #35: __libc_start_main + 0xf0 (0x7f767da69840 in /lib/x86_64-linux-gnu/libc.so.6)
Segmentation fault (core dumped)
```
What could be the problem?
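For context when triaging: CUDA reports `invalid device function` (`cudaErrorInvalidDeviceFunction`) when a kernel has no compiled code usable on the current GPU. One way this can happen is when an extension such as `amp_C` was built for a different compute capability than the one it runs on; this thread does not establish that this is the cause here. The helper below is a hypothetical sketch (`arch_covered` is not part of apex or PyTorch) that checks whether an architecture list written in the `TORCH_CUDA_ARCH_LIST` style, such as `"6.0;6.1;7.0+PTX"`, covers a given compute capability. It assumes numeric entries only and the usual CUDA compatibility rules: a cubin built for `X.Y` runs on devices with the same major version and a minor version of at least `Y`, and a `+PTX` entry can be JIT-compiled for any device with an equal or higher compute capability.

```python
from typing import Tuple


def arch_covered(device_cc: Tuple[int, int], arch_list: str) -> bool:
    """Return True if `arch_list` (TORCH_CUDA_ARCH_LIST-style, numeric
    entries only, e.g. "6.0;6.1;7.0+PTX") contains code that can run on a
    GPU with compute capability `device_cc` = (major, minor).

    Hypothetical helper for illustration; not part of apex or PyTorch.
    """
    major, minor = device_cc
    # Entries may be separated by semicolons or whitespace.
    for entry in arch_list.replace(";", " ").split():
        has_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if has_ptx else entry
        built_major, built_minor = (int(part) for part in version.split("."))
        # Cubin (SASS): same major version, device minor >= built minor.
        if built_major == major and built_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer compute capability.
        if has_ptx and (built_major, built_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    print(arch_covered((7, 5), "6.0;6.1;7.0+PTX"))  # True (7.0 entry, same major)
    print(arch_covered((6, 1), "7.0;7.5"))  # False (no entry for major 6)
```

On the affected machine, the device's `(major, minor)` tuple can be read with `torch.cuda.get_device_capability()`. The list to compare against is whatever architecture list was in effect when apex was compiled, so a `False` result is only a hint that the extension may need rebuilding, not a diagnosis.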