How to fix this bug when I try to train resnet20 for cifar-10 #29

BodongDu opened this issue May 31, 2024 · 1 comment

```
(flashatt) yangyk@yyk-s1:~/yangyk/NN_CUDA/lsq/lsq-net$ python main.py ./examples/lsq/resnet20_a2w2_cifar10.yaml
/home/yangyk/yangyk/NN_CUDA/lsq/lsq-net
<class 'pathlib.PosixPath'>
INFO - Log file for this run: /home/yangyk/yangyk/NN_CUDA/lsq/lsq-net/out/resnet20_a2w2_cifar10_20240531-171941/resnet20_a2w2_cifar10_20240531-171941.log
INFO - TensorBoard data directory: /home/yangyk/yangyk/NN_CUDA/lsq/lsq-net/out/resnet20_a2w2_cifar10_20240531-171941/tb_runs
Files already downloaded and verified
Files already downloaded and verified
/home/yangyk/anaconda3/envs/flashatt/lib/python3.8/site-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 32 worker processes in total. Our suggested max number of worker in current system is 16, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
INFO - Dataset cifar10 size:
    Training Set = 50000 (196)
    Validation Set = 10000 (40)
    Test Set = 10000 (40)
INFO - Created resnet20 model for cifar10 dataset
    Use pre-trained model = True
tensor(8)
Traceback (most recent call last):
  File "main.py", line 120, in <module>
    main()
  File "main.py", line 59, in main
    tbmonitor.writer.add_graph(model, input_to_model=train_loader.dataset[0][0].unsqueeze(0))
  File "/home/yangyk/anaconda3/envs/flashatt/lib/python3.8/site-packages/torch/utils/tensorboard/writer.py", line 841, in add_graph
    graph(model, input_to_model, verbose, use_strict_trace)
  File "/home/yangyk/anaconda3/envs/flashatt/lib/python3.8/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 337, in graph
    trace = torch.jit.trace(model, args, strict=use_strict_trace)
  File "/home/yangyk/anaconda3/envs/flashatt/lib/python3.8/site-packages/torch/jit/_trace.py", line 794, in trace
    return trace_module(
  File "/home/yangyk/anaconda3/envs/flashatt/lib/python3.8/site-packages/torch/jit/_trace.py", line 1056, in trace_module
    module._c._create_method_from_trace(
  File "/home/yangyk/anaconda3/envs/flashatt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yangyk/anaconda3/envs/flashatt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/yangyk/yangyk/NN_CUDA/lsq/lsq-net/model/resnet_cifar.py", line 125, in forward
    out = F.avg_pool2d(out, kernel_size=out.size()[3])
TypeError: avg_pool2d(): argument 'kernel_size' must be tuple of ints, not Tensor
```

How can I fix this error when I try to train resnet20 for cifar-10?
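
For reference, a minimal sketch of one possible workaround, not a confirmed fix for this repo: `add_graph` traces the model with `torch.jit.trace`, and under tracing `out.size()[3]` is handed to `avg_pool2d` as a Tensor rather than a plain int, which produces exactly this TypeError. Casting the size to `int`, or switching to adaptive pooling, avoids it. The `global_pool` helper name and the shapes below are illustrative only.

```python
import torch
import torch.nn.functional as F

def global_pool(out: torch.Tensor) -> torch.Tensor:
    # Sketch of the failing line in model/resnet_cifar.py:
    #   out = F.avg_pool2d(out, kernel_size=out.size()[3])
    # Casting the size to a plain int keeps avg_pool2d happy both in eager
    # mode and while TensorBoard's add_graph traces the model.
    return F.avg_pool2d(out, kernel_size=int(out.size(3)))

# An alternative that needs no size lookup at all:
#   out = F.adaptive_avg_pool2d(out, 1)

# Quick check with an 8x8 feature map (the usual final size for resnet20 on CIFAR-10).
x = torch.randn(1, 64, 8, 8)
print(global_pool(x).shape)  # torch.Size([1, 64, 1, 1])
```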

BodongDu (Author) commented:

```
INFO - >>>>>>>> Epoch -1 (pre-trained model evaluation)
INFO - Validation: 10000 samples (256 per mini-batch)
8
torch.Size([256, 64])
torch.Size([256, 10])
Traceback (most recent call last):
  File "main.py", line 120, in <module>
    main()
  File "main.py", line 94, in main
    top1, top5, _ = process.validate(val_loader, model, criterion,
  File "/home/yangyk/yangyk/NN_CUDA/lsq/lsq-net/process.py", line 104, in validate
    acc1, acc5 = accuracy(outputs.data, targets.data, topk=(1, 5))
  File "/home/yangyk/yangyk/NN_CUDA/lsq/lsq-net/process.py", line 27, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```
I also hit this error.
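
A minimal sketch of the change the RuntimeError itself suggests: the `correct[:k]` slice inherits a non-contiguous (transposed) memory layout, so `.view(-1)` raises and `.reshape(-1)` should be used instead. The function below is an illustrative top-k accuracy helper in the common torchvision style, not the exact code from process.py.

```python
import torch

def accuracy(outputs: torch.Tensor, targets: torch.Tensor, topk=(1, 5)):
    # Illustrative top-k accuracy helper; process.py may differ in details.
    maxk = max(topk)
    batch_size = targets.size(0)

    _, pred = outputs.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()  # transposed, hence non-contiguous
    # eq() preserves that layout, so slices of `correct` cannot be .view()ed.
    correct = pred.eq(targets.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        # .reshape() copies when the slice is non-contiguous; .view() raises instead.
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

# Quick check with random logits for a 256-sample, 10-class batch.
logits = torch.randn(256, 10)
labels = torch.randint(0, 10, (256,))
top1, top5 = accuracy(logits, labels)
print(top1.item(), top5.item())
```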
