training error #60

squirreljj · 2023-10-20T16:19:00Z

Environment:
I built a Docker environment from the Voxel-RCNN image (docker pull djiajun1206/pcdet-pytorch1.5).
My GPU is an RTX 3080 Ti.
Command:
My training command is: python train.py --cfg_file cfgs/kitti_models/sfd.yaml
with batch_size = 1.
Error:
2023-10-20 12:14:19,784 INFO Start training kitti_models/sfd(default)
epochs: 0%| | 0/12 [00:10<?, ?it/s]
| 0/3712 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 200, in <module>
    main()
  File "train.py", line 155, in main
    train_model(
  File "/home/SFD/tools/train_utils/train_utils.py", line 86, in train_model
    accumulated_iter = train_one_epoch(
  File "/home/SFD/tools/train_utils/train_utils.py", line 38, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/home/SFD/pcdet/models/__init__.py", line 30, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/SFD/pcdet/models/detectors/sfd.py", line 11, in forward
    batch_dict = cur_module(batch_dict)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/SFD/pcdet/models/backbones_3d/spconv_backbone.py", line 148, in forward
    x = self.conv_input(input_sp_tensor)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/modules.py", line 134, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/conv.py", line 196, in forward
    out_features = Fsp.indice_subm_conv(features, self.weight,
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/functional.py", line 87, in forward
    return ops.indice_conv(features,
  File "/usr/local/lib/python3.8/dist-packages/spconv-1.2.1-py3.8-linux-x86_64.egg/spconv/ops.py", line 118, in indice_conv
    return torch.ops.spconv.indice_conv(features, filters, indice_pairs,
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
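The failing call is cublasSgemm inside spconv's indice_conv. A quick way to check whether cuBLAS works at all in this container, independent of spconv and SFD, is a single float32 matrix multiply on the GPU compared against a CPU result. This is a minimal sketch that assumes only PyTorch; the function name `cublas_sgemm_smoke_test` is hypothetical. If it also fails, the problem is more likely in the PyTorch/CUDA build for this GPU than in spconv itself.

```python
import importlib.util


def cublas_sgemm_smoke_test(n=64):
    """Run one float32 GPU matmul (normally handled by cuBLAS) and check it.

    Returns True if the GPU result matches a CPU reference, False if CUDA or
    cuBLAS raised an error or the result is wrong, and None if PyTorch or a
    CUDA device is not available in this environment.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed here
    import torch

    if not torch.cuda.is_available():
        return None  # no visible CUDA device

    # Create inputs on the CPU so only the transfer and the matmul touch the GPU
    a = torch.randn(n, n)
    b = torch.randn(n, n)
    try:
        c = a.cuda() @ b.cuda()  # float32 matmul on the GPU
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as err:
        print("GPU matmul failed:", err)
        return False
    # Loose tolerances: newer PyTorch may use TF32 for float32 matmul on Ampere
    return torch.allclose(c.cpu(), a @ b, rtol=1e-2, atol=1e-2)


if __name__ == "__main__":
    print("cuBLAS smoke test:", cublas_sgemm_smoke_test())
```

Run it with the same Python interpreter inside the container that is used for train.py.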


squirreljj · 2023-10-20T16:22:58Z

Because I wanted the build to succeed, I ran export TORCH_CUDA_ARCH_LIST="7.5" before running python setup.py develop. The compilation then succeeded, but training fails with the error shown in my previous comment.
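For context on the arch setting: an RTX 3080 Ti reports compute capability 8.6 (Ampere), while "7.5" is the Turing architecture. Compiled GPU code (SASS) only runs on a GPU of the same major version and an equal or higher minor version, and PTX is only embedded when an entry carries "+PTX", so a list of just "7.5" produces nothing the 3080 Ti can execute. Also, CUDA toolkits older than 11.1 cannot target 8.6 at all; whether the pcdet-pytorch1.5 image ships such an older toolkit is an assumption not checked here. The sketch below checks a TORCH_CUDA_ARCH_LIST value against a device's capability. The helper names are hypothetical, and only numeric entries such as "8.6" or "7.5+PTX" are handled, not named ones like "Ampere".

```python
import os


def arch_string(capability):
    """Format a (major, minor) compute capability as an arch-list entry."""
    major, minor = capability
    return f"{major}.{minor}"


def parse_arch(entry):
    """Split an entry such as '8.6' or '7.5+PTX' into ((major, minor), has_ptx).

    Named entries like 'Ampere' are not handled and raise ValueError.
    """
    base, _, suffix = entry.partition("+")
    major_str, minor_str = base.split(".")
    return (int(major_str), int(minor_str)), suffix.upper() == "PTX"


def arch_list_covers(arch_list, capability):
    """Return True if code built for `arch_list` can run on a GPU of `capability`."""
    major, minor = capability
    # Entries may be separated by semicolons or spaces
    for entry in arch_list.replace(";", " ").split():
        (a_major, a_minor), has_ptx = parse_arch(entry)
        # SASS built for X.y runs on X.z when z >= y (same major version only)
        if a_major == major and a_minor <= minor:
            return True
        # Embedded PTX for an older or equal arch can be JIT-compiled by the driver
        if has_ptx and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
        capability = torch.cuda.get_device_capability(0)
    except Exception:  # no PyTorch or no GPU: assume an RTX 3080 Ti
        capability = (8, 6)
    current = os.environ.get("TORCH_CUDA_ARCH_LIST", "")
    print("device arch:", arch_string(capability))
    print("TORCH_CUDA_ARCH_LIST =", repr(current),
          "-> covers device:", arch_list_covers(current, capability))
```

For example, arch_list_covers("7.5", (8, 6)) returns False, while "8.6", "8.0" or "7.5+PTX" would cover a compute capability 8.6 device.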
