I get the following error when I use OmniQuant to quantize TinyLlama-1.1B-Chat-v1.0 with PyTorch and CUDA:
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [487,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
0%| | 0/166 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/OmniQuant/main.py", line 378, in <module>
    main()
  File "/root/autodl-tmp/OmniQuant/main.py", line 373, in main
    evaluate(lm, args,logger)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/OmniQuant/main.py", line 124, in evaluate
    outputs = lm.model.model(batch)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1070, in forward
    layer_outputs = decoder_layer(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/OmniQuant/models/int_llama_layer.py", line 237, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/OmniQuant/quantize/omni_norm.py", line 54, in forward
    variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
How can I resolve this problem?
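A possible way to narrow this down, offered as a guess rather than a confirmed diagnosis: the `srcIndex < srcSelectDimSize` assertion in `indexSelectLargeIndex` commonly fires when an index passed to a lookup (for example a token ID passed to the embedding layer) is not smaller than the size of the table being indexed. The sketch below is plain Python and checks a batch of token IDs against a vocabulary size. The helper name `find_out_of_range_ids` and the sample values are hypothetical and only for illustration; they are not taken from OmniQuant.

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at the
# call that actually failed. This must be set before CUDA is initialized,
# ideally before torch is imported.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def find_out_of_range_ids(batch, vocab_size):
    """Return (row, col, token_id) for every ID outside [0, vocab_size)."""
    bad = []
    for row, ids in enumerate(batch):
        for col, token_id in enumerate(ids):
            if not 0 <= token_id < vocab_size:
                bad.append((row, col, token_id))
    return bad


# Hypothetical values for illustration only.
vocab_size = 32000
batch = [[1, 15043, 31999], [1, 32000, 2]]
print(find_out_of_range_ids(batch, vocab_size))  # [(1, 1, 32000)]
```

With real tensors, one could pass `batch.tolist()` as the batch and the `num_embeddings` attribute of the model's input embedding as `vocab_size`. If any ID is reported, a tokenizer whose vocabulary is larger than the model's embedding table would be one thing to check.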