Train: 0% #53

760677482 · 2021-12-27T13:54:22Z

At first I noticed that I didn't have "pytorch3d" installed, so I ran "pip install pytorch3d", but it showed an error. I then ran "pip uninstall pytorch3d" and downloaded the package from https://anaconda.org/pytorch3d/pytorch3d/files. Now, when training, the output always stays at "Epoch: 1/25. Loop: Train: 0% 0/11579 [03:37<?, ?it/s]". I found that the program stops at this line: loss.backward().

What could be the problem? I am using CUDA 9.2 because my driver version is outdated. Looking forward to your help, thanks!
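
Before digging into the hang itself, it may help to rule out a mismatch between the installed PyTorch build, the CUDA version it was built for, and pytorch3d. The following is only a diagnostic sketch, not a fix; the helper name `environment_report` is made up for illustration.

```python
import importlib.util


def environment_report():
    """Collect basic facts about the PyTorch / pytorch3d installation."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "pytorch3d_installed": importlib.util.find_spec("pytorch3d") is not None,
        "torch_version": None,
        "torch_cuda_version": None,
        "cuda_available": None,
        "gpu_count": None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        # CUDA version the PyTorch build targets (None for CPU-only builds).
        report["torch_cuda_version"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
        report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the reported CUDA version differs from the one the pytorch3d package was built against, reinstalling a matching build would be a reasonable first step.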

eugenelyj · 2022-01-06T06:03:06Z

You can try unsetting the args.distributed flag.

shariqfarooq123 · 2022-01-17T22:40:51Z

Please refer to instructions provided here to install pytorch3d.

If you can't install pytorch3d for your driver version, you may also try pytorch3d-nightly.

As @eugenelyj pointed out, try unsetting the distributed flag. You may get a better traceback.
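
One way to make the flag switchable from the command line is sketched below. The parser here is a stand-in, since the exact definition of the flag in `train.py` is not shown in this thread, and `--no-distributed` is a hypothetical option added purely for illustration.

```python
import argparse


def build_parser():
    # Stand-in for the parser in train.py; only the flag discussed here is modeled.
    parser = argparse.ArgumentParser()
    parser.add_argument("--distributed", dest="distributed", action="store_true")
    # Hypothetical counterpart that turns distributed training off.
    parser.add_argument("--no-distributed", dest="distributed", action="store_false")
    parser.set_defaults(distributed=True)
    return parser


args = build_parser().parse_args(["--no-distributed"])
print(args.distributed)  # False
```

With distributed training off, an exception in the data loader or model is more likely to surface directly rather than wrapped in the error raised by `mp.spawn`.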

libetter0913 · 2022-05-13T07:44:43Z

Hello, I had the same problem. When I ran 'python train.py args_train_nyu.txt', the program stopped here. Can you help me?

9796l · 2022-12-22T12:44:18Z

Hello, I had the same problem. When I ran 'python train.py args_train_nyu.txt', the program stopped here. Can you help me?

9796l · 2022-12-22T12:44:51Z

Excuse me, did you solve this problem?

zhangbaijin · 2023-03-24T00:40:40Z

Excuse me, did you solve this problem?

Did you solve the problem?

9796l · 2023-03-24T02:10:05Z

I think I switched to a different server; on a server with 4 GPUs the error no longer appeared.

zhangbaijin · 2023-03-24T03:15:50Z

I think I switched to a different server; on a server with 4 GPUs the error no longer appeared.

I am using my own dataset, with raw_image and depth image. Which file does the input.txt mentioned by the author refer to, and what does the 857.47 at the end mean?
```
Traceback (most recent call last):
  File "train.py", line 403, in <module>
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/root/autodl-tmp/UDepth-master/train.py", line 109, in main_worker
    experiment_name=args.name, optimizer_state_dict=None)
  File "/root/autodl-tmp/UDepth-master/train.py", line 178, in train
    args) else enumerate(train_loader):
  File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/autodl-tmp/UDepth-master/dataloader.py", line 87, in __getitem__
    focal = float(sample_path.split()[2])
IndexError: list index out of range
```

jasdkfj · 2023-08-03T13:21:18Z

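It's a focal length problem. The lines in your split file must be missing the focal length in pixels at the end, so it cannot be read and the error is raised.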
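
The failing line in the traceback, `float(sample_path.split()[2])`, needs each entry in the split file to have at least three whitespace-separated fields, with a number in the third. A minimal sketch for checking a split file before training follows; the function name `find_bad_split_lines` and the sample paths are made up for illustration, and only the third-field requirement is taken from the traceback.

```python
def find_bad_split_lines(lines):
    """Return (line_number, reason) pairs for entries the loader would fail on."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 3:
            # split()[2] would raise IndexError on this entry.
            problems.append((number, f"expected 3 fields, found {len(fields)}"))
            continue
        try:
            float(fields[2])
        except ValueError:
            problems.append((number, f"focal length is not a number: {fields[2]!r}"))
    return problems


# Illustrative entries: the second one lacks the focal length.
sample = [
    "rgb/0001.png depth/0001.png 857.47",
    "rgb/0002.png depth/0002.png",
]
print(find_bad_split_lines(sample))  # [(2, 'expected 3 fields, found 2')]
```

To check a real file, the same function can be called on `open(path).read().splitlines()`.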

libetter0913 mentioned this issue May 13, 2022

Tran: 0% #64

Open