'Checkpoint path should be absolute. Got ...' while running train #160

Open
sumanttyagi opened this issue Jul 16, 2024 · 1 comment
Comments

@sumanttyagi

I got the error below while training on custom data:

 Skipping global process sync, barrier name: create_tmp_directory:post.checkpoint_1
I0716 20:30:05.510361 140313332638656 utils.py:219] Skipping global process sync, barrier name: Checkpointer:save.checkpoint_1
Traceback (most recent call last):
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/sdb1/sumant/multi_nerf/multinerf/train.py", line 292, in <module>
    app.run(main)
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/mnt/sdb1/sumant/multi_nerf/multinerf/train.py", line 222, in main
    checkpoints.save_checkpoint(
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/site-packages/flax/training/checkpoints.py", line 697, in save_checkpoint
    orbax_checkpointer.save(
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/site-packages/orbax/checkpoint/checkpointer.py", line 186, in save
    self._handler.finalize(tmpdir)
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/site-packages/orbax/checkpoint/pytree_checkpoint_handler.py", line 759, in finalize
    self._handler_impl.finalize(directory)
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/site-packages/orbax/checkpoint/base_pytree_checkpoint_handler.py", line 1064, in finalize
    type_handlers.merge_ocdbt_per_process_files(directory)
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/site-packages/orbax/checkpoint/type_handlers.py", line 635, in merge_ocdbt_per_process_files
    parent_tspec = _get_tensorstore_spec(os.fspath(directory), use_ocdbt=True)
  File "/mnt/sdb1/sumant/multi_nerf/lib/python3.9/site-packages/orbax/checkpoint/type_handlers.py", line 745, in _get_tensorstore_spec
    raise ValueError(f'Checkpoint path should be absolute. Got {directory}')
ValueError: Checkpoint path should be absolute. Got data_colmap/checkpoints/checkpoint_1.orbax-checkpoint-tmp-0


@whm5815664


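The value bound to 'Config.checkpoint_dir' must be an absolute path. In the traceback above it was the relative path data_colmap/checkpoints, which orbax rejects.
The training command should look like this: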
python -m train --gin_configs=configs/llff_256.gin --gin_bindings="Config.data_dir = 'dataset_dir'" --gin_bindings="Config.checkpoint_dir = 'X:/XXX/multinerf-main/dataset_dir/checkpoints'" --logtostderr
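
If you would rather not type the absolute path by hand, it can be derived from the relative one. The following is a minimal sketch, not part of multinerf: the helper names `absolute_checkpoint_dir` and `checkpoint_binding` are hypothetical, and it assumes the relative path should be resolved against the current working directory. Forward slashes are used in the result so that the string stays safe inside a gin binding on Windows as well.

```python
import os
from pathlib import Path


def absolute_checkpoint_dir(path):
    """Return `path` as an absolute path with forward slashes.

    Relative paths are resolved against the current working directory;
    a leading "~" is expanded to the user's home directory.
    """
    absolute = os.path.abspath(os.path.expanduser(path))
    return Path(absolute).as_posix()


def checkpoint_binding(path):
    """Build the gin binding string for Config.checkpoint_dir.

    Hypothetical helper: it only formats the string that would be passed
    to --gin_bindings on the command line.
    """
    return f"Config.checkpoint_dir = '{absolute_checkpoint_dir(path)}'"


if __name__ == "__main__":
    # Relative path taken from the error message in this issue.
    print(checkpoint_binding("data_colmap/checkpoints"))
```

The printed string can be pasted as the value of the second --gin_bindings flag. The same `os.path.abspath` call could instead be applied to the checkpoint directory inside train.py before it is passed to `checkpoints.save_checkpoint`, but that is an untested suggestion for this repository.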
