compile_image_encoder: true causes cudagraph overwriting crash #460

Closed
tonydavis629 opened this issue Nov 21, 2024 · 3 comments

@tonydavis629

tonydavis629 commented Nov 21, 2024

  File "/home/tony/sam_label.py", line 54, in generate_tracks_sam2
    _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/sam2_video_predictor.py", line 283, in add_new_points_or_box
    current_out, _ = self._run_single_frame_inference(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/sam2_video_predictor.py", line 933, in _run_single_frame_inference
    ) = self._get_image_feature(inference_state, frame_idx, batch_size)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/sam2_video_predictor.py", line 889, in _get_image_feature
    backbone_out = self.forward_image(image)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/modeling/sam2_base.py", line 469, in forward_image
    backbone_out = self.image_encoder(img_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/modeling/backbones/image_encoder.py", line 29, in forward
    def forward(self, sample: torch.Tensor):
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1100, in forward
    return compiled_fn(full_args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 321, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/utils.py", line 124, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
                            ^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 667, in inner_fn
    outs = compiled_fn(args)
           ^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 488, in wrapper
    return compiled_fn(runtime_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 1478, in __call__
    return self.current_callable(inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1008, in run
    return compiled_fn(new_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 398, in deferred_cudagraphify
    fn, out = cudagraphify(model, inputs, new_static_input_idxs, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 428, in cudagraphify
    return manager.add_function(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 2213, in add_function
    return fn, fn(inputs)
               ^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 1919, in run
    out = self._run(new_inputs, function_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 2023, in _run
    return self.run_eager(new_inputs, function_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 2179, in run_eager
    return node.run(new_inputs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 633, in run
    non_cudagraph_inps_storages = get_non_cudagraph_inps()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 628, in get_non_cudagraph_inps
    and t.untyped_storage().data_ptr() not in existing_path_data_ptrs
        ^^^^^^^^^^^^^^^^^^^
RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace: File "/home/tony/models/sam2/sam2/modeling/backbones/image_encoder.py", line 31, in forward
    features, pos = self.neck(self.trunk(sample))
  File "/home/tony/models/sam2/sam2/modeling/backbones/image_encoder.py", line 132, in forward
    pos[i] = self.position_encoding(x_out).to(x_out.dtype)
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/tony/models/sam2/sam2/modeling/position_encoding.py", line 111, in forward
    self.cache[cache_key] = pos[0]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.

I tried adding torch.compiler.cudagraph_mark_step_begin() before each predictor call, but the error is unchanged; see pytorch/pytorch#141171.
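
For readers unfamiliar with the error: the inner stack trace points at `self.cache[cache_key] = pos[0]`, meaning a tensor produced inside the compiled region is kept in a cache and read again after a later CUDA graph run has reused its memory. The following is a toy, pure-Python analogy of that aliasing pattern, not PyTorch code and not the fix that was merged. `StaticBufferRunner` and the cache keys are hypothetical names made up for illustration. One difference to keep in mind: PyTorch raises the RuntimeError above instead of silently returning overwritten values.

```python
class StaticBufferRunner:
    """Toy stand-in for a replayed graph: every run writes into the same output buffer."""

    def __init__(self, size):
        # Reused across runs, loosely like a captured graph's static output memory.
        self._out = [0.0] * size

    def run(self, scale):
        for i in range(len(self._out)):
            self._out[i] = scale * (i + 1)
        # The caller receives a reference to the shared buffer, not a copy.
        return self._out


runner = StaticBufferRunner(3)

cache = {}
cache["aliased"] = runner.run(1.0)       # stores a reference to the shared buffer
cache["cloned"] = list(runner.run(1.0))  # stores an independent copy

runner.run(10.0)  # a later run overwrites the shared buffer in place

print(cache["aliased"])  # no longer holds the values from the first run
print(cache["cloned"])   # still holds the values from the first run
```

The copied entry survives the second run, which is the idea behind the error message's suggestion to clone the tensor outside of torch.compile().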

I'm following the video prediction notebook and using box prompts. Installed using [notebooks] setup.

out_frame_idx, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
    frame_idx=frame_idx,
    points=None,
    box=boxes,
    inference_state=inference_state,
    obj_id=track_id
)
            
for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
    video_segments[out_frame_idx] = {
        out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
        for i, out_obj_id in enumerate(out_obj_ids)
    }
@chayryali
Contributor

Hi @tonydavis629, this happens because of changes to torch.compile in more recent torch versions. This PR should have fixed it. Can you confirm?

@tonydavis629
Author

No, I get the same error. See this issue: #501 (comment)

But it seems it might be solved; I will confirm on that issue.

@chayryali
Contributor

Resolved by a recent PR and confirmed to be resolved on issue 501.
