compile_image_encoder: true causes cudagraph overwriting crash #460

tonydavis629 · 2024-11-21T16:42:37Z

  File "/home/tony/sam_label.py", line 54, in generate_tracks_sam2
    _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/sam2_video_predictor.py", line 283, in add_new_points_or_box
    current_out, _ = self._run_single_frame_inference(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/sam2_video_predictor.py", line 933, in _run_single_frame_inference
    ) = self._get_image_feature(inference_state, frame_idx, batch_size)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/sam2_video_predictor.py", line 889, in _get_image_feature
    backbone_out = self.forward_image(image)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/modeling/sam2_base.py", line 469, in forward_image
    backbone_out = self.image_encoder(img_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/tony/models/sam2/sam2/modeling/backbones/image_encoder.py", line 29, in forward
    def forward(self, sample: torch.Tensor):
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1100, in forward
    return compiled_fn(full_args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 321, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/utils.py", line 124, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
                            ^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 667, in inner_fn
    outs = compiled_fn(args)
           ^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 488, in wrapper
    return compiled_fn(runtime_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 1478, in __call__
    return self.current_callable(inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1008, in run
    return compiled_fn(new_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 398, in deferred_cudagraphify
    fn, out = cudagraphify(model, inputs, new_static_input_idxs, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 428, in cudagraphify
    return manager.add_function(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 2213, in add_function
    return fn, fn(inputs)
               ^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 1919, in run
    out = self._run(new_inputs, function_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 2023, in _run
    return self.run_eager(new_inputs, function_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 2179, in run_eager
    return node.run(new_inputs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 633, in run
    non_cudagraph_inps_storages = get_non_cudagraph_inps()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/_inductor/cudagraph_trees.py", line 628, in get_non_cudagraph_inps
    and t.untyped_storage().data_ptr() not in existing_path_data_ptrs
        ^^^^^^^^^^^^^^^^^^^
RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace: File "/home/tony/models/sam2/sam2/modeling/backbones/image_encoder.py", line 31, in forward
    features, pos = self.neck(self.trunk(sample))
  File "/home/tony/models/sam2/sam2/modeling/backbones/image_encoder.py", line 132, in forward
    pos[i] = self.position_encoding(x_out).to(x_out.dtype)
  File "/home/tony/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/tony/models/sam2/sam2/modeling/position_encoding.py", line 111, in forward
    self.cache[cache_key] = pos[0]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.
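The last frame of the error points at `self.cache[cache_key] = pos[0]` in `position_encoding.py`. That line keeps a reference to a tensor produced inside a CUDA graph, and a later replay reuses that graph's output memory. The toy model below is plain Python, not PyTorch. It is a sketch of the aliasing hazard, assuming a graph that writes every replay into one pre-allocated buffer. `ReplayedGraph`, `encode_with_cache`, `demo` and the `"64x64"` key are hypothetical names for illustration. In real PyTorch the cudagraph trees runtime detects the stale access and raises the RuntimeError above instead of silently returning overwritten data. Copying the value before caching it corresponds to the "clone the tensor" suggestion in the error message.

```python
class ReplayedGraph:
    """Toy stand-in for a captured CUDA graph (hypothetical, not a PyTorch API).

    Every replay writes its result into the same pre-allocated output buffer,
    so a value returned by an earlier replay is overwritten by a later one.
    """

    def __init__(self, size):
        self.static_output = [0.0] * size  # memory reused by every replay

    def replay(self, value):
        for i in range(len(self.static_output)):
            self.static_output[i] = value * (i + 1)
        return self.static_output  # returns an alias to the buffer, not a copy


def encode_with_cache(graph, cache, key, value, clone):
    """Hypothetical cache modeled on `self.cache[cache_key] = pos[0]`."""
    if key not in cache:
        out = graph.replay(value)
        # Copying breaks the alias to the graph's reusable output buffer.
        cache[key] = list(out) if clone else out
    return cache[key]


def demo(clone):
    graph = ReplayedGraph(3)
    cache = {}
    cached = encode_with_cache(graph, cache, "64x64", 1.0, clone)
    graph.replay(10.0)  # a later, unrelated run reuses the same buffer
    return cached
```

Here `demo(clone=False)` returns `[10.0, 20.0, 30.0]`: the cached entry was overwritten by the second replay. `demo(clone=True)` returns `[1.0, 2.0, 3.0]`, the value that was actually computed for that key.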

I have tried calling torch.compiler.cudagraph_mark_step_begin() before each predictor call, but it made no difference; see pytorch/pytorch#141171.

I'm following the video prediction notebook with box prompts. SAM 2 was installed with the [notebooks] extras.

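# Prompt object `track_id` with a box on frame `frame_idx`
# (boxes, frame_idx, track_id and video_segments are defined elsewhere in the script, not shown)
out_frame_idx, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
    frame_idx=frame_idx,
    points=None,
    box=boxes,
    inference_state=inference_state,
    obj_id=track_id
)

# Propagate the prompts through the video and keep a binary mask per object (logits > 0)
for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
    video_segments[out_frame_idx] = {
        out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
        for i, out_obj_id in enumerate(out_obj_ids)
    }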
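The propagation loop turns each object's mask logits into a binary mask by thresholding at 0. Assuming the logits are meant to be read through a sigmoid, as mask logits usually are, `logit > 0` is mathematically the same test as `sigmoid(logit) > 0.5`, so no sigmoid is needed. Below is a pure-Python sketch of that step; `logits_to_masks` and `sigmoid` are hypothetical helpers, and nested lists stand in for the tensors.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_masks(obj_ids, mask_logits):
    """Map each object id to a boolean mask, thresholding its logits at 0.

    Mirrors `(out_mask_logits[i] > 0.0)` in the loop above, with
    zip() in place of enumerate() indexing.
    """
    return {
        obj_id: [[logit > 0.0 for logit in row] for row in logits]
        for obj_id, logits in zip(obj_ids, mask_logits)
    }
```

For example, `logits_to_masks([1, 2], [[[-2.0, 0.5]], [[0.0, 3.0]]])` returns `{1: [[False, True]], 2: [[False, True]]}`; a logit of exactly 0 (probability 0.5) is treated as background.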


chayryali · 2024-12-19T09:14:26Z

Hi @tonydavis629, this happens because of changes to torch.compile in more recent PyTorch versions. This PR should have fixed it. Can you confirm?

tonydavis629 · 2024-12-19T11:21:28Z

No, I still get the same error; see this issue: #501 (comment)

It looks like it may have been solved there, though, so I will confirm on that issue.

chayryali · 2024-12-23T21:52:29Z

Resolved by a recent PR; the fix was confirmed on issue #501.

chayryali closed this as completed Dec 23, 2024
