
Train: 0% #53

Open
760677482 opened this issue Dec 27, 2021 · 9 comments

@760677482

In the beginning, I noticed that I didn't have "pytorch3d", so I ran "pip install pytorch3d", but it showed an error. Then I ran "pip uninstall pytorch3d" and downloaded it from https://anaconda.org/pytorch3d/pytorch3d/files. But now, when training, it always shows "Epoch: 1/25. Loop: Train: 0% 0/11579 [03:37<?, ?it/s]". I found that the program stops at this line: loss.backward().

What could be the problem? I am using CUDA 9.2 because my driver version is outdated. Looking forward to your help, thanks!
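Since the report involves a failed pip install followed by a manual download, it may help to first confirm which package versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library (Python 3.8+); the helper name `installed_versions` is made up for illustration and is not part of AdaBins or pytorch3d.

```python
from importlib import metadata


def installed_versions(packages=("torch", "pytorch3d")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not present in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If torch is installed, `torch.version.cuda` additionally reports the CUDA version that the PyTorch build was compiled against, which can then be compared with the CUDA version of the pytorch3d package that was downloaded.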

@eugenelyj

You can try unsetting the args.distributed flag.

@shariqfarooq123
Owner

Please refer to the instructions provided here to install pytorch3d.

If you can't install pytorch3d for your driver version, you may also try pytorch3d-nightly.

As @eugenelyj pointed out, try unsetting the distributed flag. You may get a better traceback.
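To illustrate why turning off the distributed flag can give a clearer traceback, here is a minimal sketch. It assumes the launch pattern visible in the traceback posted later in this thread (`mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))`); the `launch` helper and its `spawn` parameter are hypothetical and are not taken from the repository's train.py.

```python
from argparse import Namespace


def launch(main_worker, args, ngpus_per_node=1, spawn=None):
    """Run the worker in spawned processes, or directly in this process."""
    if getattr(args, "distributed", False) and spawn is not None:
        # Distributed path: errors raised in a child process are re-raised
        # by the parent wrapped in a generic "Process N terminated" exception.
        spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
    else:
        # Single-process path: the worker's own exception and traceback
        # propagate unchanged, which is easier to read and to debug.
        main_worker(0, ngpus_per_node, args)


def main_worker(gpu, ngpus_per_node, args):
    # Stand-in for the real training worker; it fails on purpose.
    raise RuntimeError(f"example failure on GPU {gpu}")


try:
    launch(main_worker, Namespace(distributed=False))
except RuntimeError as err:
    print("caught directly:", err)
```

With the flag unset, the worker runs in the main process, so a debugger or a plain stack trace points at the failing line rather than at the process-spawning wrapper.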

@libetter0913

Hello, I had the same problem. When I ran 'python train.py args_train_nyu.txt', the program stopped here. Can you help me?
[image]

@9796l

9796l commented Dec 22, 2022

> Hello, I had the same problem. When I ran 'python train.py args_train_nyu.txt', the program stops here. Can you help me? [image]

Excuse me, did you solve this problem?

@zhangbaijin

> Excuse me, did you solve this problem?

Did you solve the problem?

@9796l

9796l commented Mar 24, 2023 via email

@zhangbaijin

> I think I switched to a different server; on a server with 4 GPUs the error no longer occurred.

I am using my own dataset, with raw images and depth images. Which file is the input.txt that the author mentions, and what does the 857.47 at the end of each line mean?
`
Traceback (most recent call last):
  File "train.py", line 403, in <module>
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/root/autodl-tmp/UDepth-master/train.py", line 109, in main_worker
    experiment_name=args.name, optimizer_state_dict=None)
  File "/root/autodl-tmp/UDepth-master/train.py", line 178, in train
    args) else enumerate(train_loader):
  File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/autodl-tmp/UDepth-master/dataloader.py", line 87, in __getitem__
    focal = float(sample_path.split()[2])
IndexError: list index out of range
`

@jasdkfj

jasdkfj commented Aug 3, 2023

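It's a focal-length problem. The lines in your split file evidently don't end with the focal length in pixels, so the loader can't read it and raises the error.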
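The failing line in the traceback, `focal = float(sample_path.split()[2])`, suggests that the loader expects each line of the file list to hold at least three whitespace-separated fields, with a numeric focal length in the third position. A minimal sketch for checking a list before training follows; the helper name `find_bad_lines` and the sample paths are made up for illustration, and the three-field layout is inferred from the traceback rather than from the repository's documentation.

```python
def find_bad_lines(lines, focal_index=2):
    """Return (line_number, reason) for every entry the loader could not parse."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) <= focal_index:
            # This is the case that raises IndexError in the dataloader.
            problems.append(
                (lineno, f"expected at least {focal_index + 1} fields, got {len(fields)}")
            )
            continue
        try:
            float(fields[focal_index])
        except ValueError:
            problems.append(
                (lineno, f"focal value {fields[focal_index]!r} is not a number")
            )
    return problems


# Hypothetical entries: the second one lacks the focal length.
sample = [
    "rgb/0001.jpg depth/0001.png 857.47",
    "rgb/0002.jpg depth/0002.png",
]
print(find_bad_lines(sample))
```

To check a real list, pass the lines of the file instead of `sample`, for example `find_bad_lines(open(path).read().splitlines())`. Blank lines are reported too, since they would trigger the same IndexError.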
