Polygraphy GPU memory leak when processing a large enough number of images #3791

Closed
donrax opened this issue Apr 11, 2024 · 9 comments
Labels
triaged Issue has been triaged by maintainers


@donrax

donrax commented Apr 11, 2024

Description

Processing a large enough number of images with Polygraphy results in a GPU memory leak in the method _get_array_on_cpu.

Environment

TensorRT Version: 10.0.0.6

NVIDIA GPU: RTX A4500

NVIDIA Driver Version: 551.61

CUDA Version: 11.8

CUDNN Version: 8.9

Operating System: 22.04 LTS

Python Version (if applicable): 3.10

Relevant Files

File:

host_buffers[name] = util.array.resize_or_reallocate(host_buffers[name], shape)

Steps To Reproduce

Simply process a large enough number of images to fill up the GPU memory.

Have you tried the latest release?: The issue persists in the latest release.

Possible solution

Deallocate the src data pointer after copying the array to the CPU at line 129 with:
cuda.wrapper().free(util.array.data_ptr(arr))
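The failure mode being proposed for a fix here is a classic reallocation leak: each resize overwrites the pointer to the old device buffer without freeing it. A minimal sketch of the pattern, using a hypothetical mock allocator (not Polygraphy's real `cuda.wrapper()` API), shows why the explicit free matters:

```python
# Hypothetical mock illustrating the leak pattern; not Polygraphy's actual API.
class MockDeviceAllocator:
    """Tracks 'device' allocations so leaks are observable."""
    def __init__(self):
        self.live = {}      # ptr -> size of each still-allocated block
        self.next_ptr = 1

    def malloc(self, size):
        ptr = self.next_ptr
        self.next_ptr += 1
        self.live[ptr] = size
        return ptr

    def free(self, ptr):
        del self.live[ptr]


def resize_or_reallocate(alloc, ptr, new_size, free_old):
    """Reallocate a buffer. Without free_old, every growth leaks the old block."""
    if ptr is not None and free_old:
        alloc.free(ptr)     # analogous to the proposed fix: release before overwriting
    return alloc.malloc(new_size)


alloc = MockDeviceAllocator()
leaky = None
for size in range(1, 101):
    leaky = resize_or_reallocate(alloc, leaky, size, free_old=False)
print(len(alloc.live))   # 100 live blocks: every old buffer leaked

alloc2 = MockDeviceAllocator()
fixed = None
for size in range(1, 101):
    fixed = resize_or_reallocate(alloc2, fixed, size, free_old=True)
print(len(alloc2.live))  # 1 live block: old buffers were freed
```

With a real CUDA allocator the effect is the same, except the leaked blocks accumulate in GPU memory until allocation requests start failing with CUDA error 2 (out of memory).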

@donrax donrax changed the title Polygraphy GPU memory leak when processing multiple images Polygraphy GPU memory leak when processing a large enough number of images Apr 11, 2024
@pranavm-nvidia
Collaborator

The device buffers are owned by the output allocator and should be freed when the runner that owns it goes out of scope. Do you have a reproducer for this?
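The ownership model described here can be sketched as follows. This is a hypothetical mock, not Polygraphy's real classes: a runner's output allocator owns the device buffers, and deactivating the runner (or exiting its context manager) releases them:

```python
# Hypothetical sketch of the ownership model; not Polygraphy's actual classes.
class MockOutputAllocator:
    """Owns the device buffers backing a runner's outputs."""
    def __init__(self):
        self.buffers = {}   # tensor name -> fake device allocation

    def reallocate_output(self, name, size):
        self.buffers[name] = bytearray(size)  # stands in for a device allocation
        return self.buffers[name]

    def release_all(self):
        self.buffers.clear()


class MockRunner:
    """Frees its allocator's buffers on deactivate() / context exit."""
    def __init__(self):
        self.allocator = MockOutputAllocator()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.deactivate()

    def infer(self, name, size):
        return self.allocator.reallocate_output(name, size)

    def deactivate(self):
        self.allocator.release_all()


with MockRunner() as r:
    r.infer("output0", 1024)
    assert len(r.allocator.buffers) == 1  # buffer alive while runner is active
print(len(r.allocator.buffers))           # 0 after the runner leaves scope
```

Under this model, keeping runners activated for the whole loop (as in the reproducer below) should reuse one buffer per output rather than accumulating allocations.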

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 12, 2024
@donrax
Author

donrax commented Apr 14, 2024

Code snippets that can reproduce the issue

from polygraphy.backend.trt import TrtRunner
from polygraphy.comparator.struct import RunResults, IterationResult

runners = [TrtRunner(engine0), TrtRunner(engine1)]

This code runs fast and raises an exception:

for r in runners:
    r.activate()
for n, inputs in enumerate(data_loader):
    results = RunResults()
    for r, i in zip(runners, inputs):
        results.append((r.name, IterationResult(outputs=r.infer(i))))
for r in runners:
    r.deactivate()

Exception:

[I] Iteration 1999
trt-runner-N2-04/15/24-06:54:32 | Inference Time: 4.163 ms
trt-runner-N3-04/15/24-06:54:32 | Inference Time: 4.454 ms

[I] Iteration 2000
[!] CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
[ERROR] Exception caught in reallocateOutput(): PolygraphyException: CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038

[E] 2: [executionContext.cpp::invokeReallocateOutput::206] Error Code 2: Internal Error (IOutputAllocator returned nullptr for allocation request of 1231360 bytes.)
[!] `execute_async_v3()` failed. Please see the logging output above for details.

[W] trt-runner-N3-04/15/24-06:54:32     | Was activated but never deactivated. This could cause a memory leak!
[W] trt-runner-N2-04/15/24-06:54:32     | Was activated but never deactivated. This could cause a memory leak!

This code runs more slowly and also raises an exception:

for n, inputs in enumerate(data_loader):
    results = RunResults()
    for tmp_r, i in zip(runners, inputs):
        with tmp_r as r:
            results.append((r.name, IterationResult(outputs=r.infer(i))))
[I] Iteration 9999
trt-runner-N0-04/15/24-07:07:26 | Inference Time: 51.933 ms
trt-runner-N1-04/15/24-07:07:26 | Inference Time: 49.425 ms

[I] Iteration 10000
trt-runner-N0-04/15/24-07:07:26 | Inference Time: 56.102 ms
trt-runner-N1-04/15/24-07:07:26 | Inference Time: 51.524 ms

...

[I] Iteration 13000
trt-runner-N2-04/15/24-07:37:40 | Inference Time: 90.265 ms
trt-runner-N3-04/15/24-07:37:40 | Inference Time: 96.287 ms

[I] Iteration 13001
[!] CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
[ERROR] Exception caught in reallocateOutput(): PolygraphyException: CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038

[E] 2: [executionContext.cpp::invokeReallocateOutput::206] Error Code 2: Internal Error (IOutputAllocator returned nullptr for allocation request of 1231360 bytes.)
[!] `execute_async_v3()` failed. Please see the logging output above for details.


Edit: Exception occurred later on.

@pranavm-nvidia
Collaborator

Does it matter what kind of model it is? I tried your code with a simple model with large input shapes, but the GPU memory usage was completely stable.

@donrax
Author

donrax commented Apr 23, 2024

The model was not at fault; the memory leak occurred specifically while using Polygraphy.

However, you are correct that the memory leak in Polygraphy has been fixed. The version originally tested, Polygraphy 0.49.0, had the memory leak, while version 0.49.9 is stable without memory leaks.

@donrax donrax closed this as completed Apr 23, 2024
@donrax donrax reopened this Apr 23, 2024
@donrax
Author

donrax commented Apr 23, 2024

After a detailed recheck, we have noticed the following for Polygraphy version 0.49.9:

  • If Polygraphy is installed from the cloned TensorRT repo, there is no memory leak.
  • If Polygraphy is installed from the Python Package Index (PyPI), a memory leak occurs.
  • If Polygraphy is installed using --index-url https://pypi.nvidia.com, a memory leak occurs.
  • If Polygraphy is installed using --index-url https://pypi.ngc.nvidia.com, there is no memory leak.

Q1) Which of the listed package indexes should be used to install the correct versions of TensorRT (PyPI has no version 10.0.0) and the corresponding tools (polygraphy, onnx-graphsurgeon, ...)?

The requirements.txt file in the TensorRT repo specifies onnxruntime==1.12.1; python_version=="3.10" and --extra-index-url https://pypi.ngc.nvidia.com.

Q2) Is the latest TensorRT release only compatible with onnxruntime==1.12.1 and IR 8 (https://onnxruntime.ai/docs/reference/compatibility.html) or is this version of onnxruntime only required for the constant folding operation in onnx-graphsurgeon?

Edit: Clarified questions

@pranavm-nvidia
Collaborator

To answer your first question, the regular PyPI is the right place. 10.0 is still in early access so pip won't pick up that version automatically, but you can install it by specifying the version explicitly. See https://pypi.org/project/tensorrt/10.0.0b6/

@ttyio
Collaborator

ttyio commented Jul 2, 2024

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks all!

@ttyio ttyio closed this as completed Jul 2, 2024
@michaeldeyzel

For those who are still experiencing memory leaks even with the newest version of Polygraphy, the only version that worked for me was 0.47.1 from NVIDIA's PyPI. It might solve your memory issues:
python -m pip install "polygraphy==0.47.1" --index-url https://pypi.ngc.nvidia.com

@ludekcizinsky

I followed @michaeldeyzel's suggestion, and it solved my memory leak issue:)
