Polygraphy GPU memory leak when processing a large enough number of images #3791

Closed
donrax opened this issue Apr 11, 2024 · 9 comments
Labels
triaged Issue has been triaged by maintainers


@donrax

donrax commented Apr 11, 2024

Description

Processing a large enough number of images with Polygraphy results in a GPU memory leak in the method _get_array_on_cpu.

Environment

TensorRT Version: 10.0.0.6

NVIDIA GPU: RTX A4500

NVIDIA Driver Version: 551.61

CUDA Version: 11.8

CUDNN Version: 8.9

Operating System: 22.04 LTS

Python Version (if applicable): 3.10

Relevant Files

File:

host_buffers[name] = util.array.resize_or_reallocate(host_buffers[name], shape)

Steps To Reproduce

Simply process a large enough number of images to fill up the GPU memory.

Have you tried the latest release?: The issue persists in the latest release.

Possible solution

Deallocate the src data pointer after copying the array to the CPU at line 129 with:
cuda.wrapper().free(util.array.data_ptr(arr))
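The failure mode being proposed for a fix here is a classic reallocation leak: each resize overwrites the pointer to the old device buffer without freeing it. A minimal sketch of the pattern, using a hypothetical mock allocator (not Polygraphy's real `cuda.wrapper()` API), shows why the explicit free matters:

```python
# Hypothetical mock illustrating the leak pattern; not Polygraphy's actual API.
class MockDeviceAllocator:
    """Tracks 'device' allocations so leaks are observable."""
    def __init__(self):
        self.live = {}      # ptr -> size of each still-allocated block
        self.next_ptr = 1

    def malloc(self, size):
        ptr = self.next_ptr
        self.next_ptr += 1
        self.live[ptr] = size
        return ptr

    def free(self, ptr):
        del self.live[ptr]


def resize_or_reallocate(alloc, ptr, new_size, free_old):
    """Reallocate a buffer. Without free_old, every growth leaks the old block."""
    if ptr is not None and free_old:
        alloc.free(ptr)     # analogous to the proposed fix: release before overwriting
    return alloc.malloc(new_size)


alloc = MockDeviceAllocator()
leaky = None
for size in range(1, 101):
    leaky = resize_or_reallocate(alloc, leaky, size, free_old=False)
print(len(alloc.live))   # 100 live blocks: every old buffer leaked

alloc2 = MockDeviceAllocator()
fixed = None
for size in range(1, 101):
    fixed = resize_or_reallocate(alloc2, fixed, size, free_old=True)
print(len(alloc2.live))  # 1 live block: old buffers were freed
```

With a real CUDA allocator the effect is the same, except the leaked blocks accumulate in GPU memory until allocation requests start failing with CUDA error 2 (out of memory).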

@donrax donrax changed the title Polygraphy GPU memory leak when processing multiple images Polygraphy GPU memory leak when processing a large enough number of images Apr 11, 2024
@pranavm-nvidia
Collaborator

The device buffers are owned by the output allocator and should be freed when the runner that owns it goes out of scope. Do you have a reproducer for this?
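The ownership model described here can be sketched as follows. This is a hypothetical mock, not Polygraphy's real classes: a runner's output allocator owns the device buffers, and deactivating the runner (or exiting its context manager) releases them:

```python
# Hypothetical sketch of the ownership model; not Polygraphy's actual classes.
class MockOutputAllocator:
    """Owns the device buffers backing a runner's outputs."""
    def __init__(self):
        self.buffers = {}   # tensor name -> fake device allocation

    def reallocate_output(self, name, size):
        self.buffers[name] = bytearray(size)  # stands in for a device allocation
        return self.buffers[name]

    def release_all(self):
        self.buffers.clear()


class MockRunner:
    """Frees its allocator's buffers on deactivate() / context exit."""
    def __init__(self):
        self.allocator = MockOutputAllocator()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.deactivate()

    def infer(self, name, size):
        return self.allocator.reallocate_output(name, size)

    def deactivate(self):
        self.allocator.release_all()


with MockRunner() as r:
    r.infer("output0", 1024)
    assert len(r.allocator.buffers) == 1  # buffer alive while runner is active
print(len(r.allocator.buffers))           # 0 after the runner leaves scope
```

Under this model, keeping runners activated for the whole loop (as in the reproducer below) should reuse one buffer per output rather than accumulating allocations.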

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 12, 2024
@donrax
Author

donrax commented Apr 14, 2024

Code snippets that can reproduce the issue

from polygraphy.backend.trt import TrtRunner
from polygraphy.comparator.struct import RunResults, IterationResult

runners = [TrtRunner(engine0), TrtRunner(engine1)]

This code runs fast and raises an exception:

for r in runners:
    r.activate()
for n, inputs in enumerate(data_loader):
    results = RunResults()
    for r, i in zip(runners, inputs):
        results.append((r.name, IterationResult(outputs=r.infer(i))))
for r in runners:
    r.deactivate()

Exception:

[I] Iteration 1999
trt-runner-N2-04/15/24-06:54:32 | Inference Time: 4.163 ms
trt-runner-N3-04/15/24-06:54:32 | Inference Time: 4.454 ms

[I] Iteration 2000
[!] CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
[ERROR] Exception caught in reallocateOutput(): PolygraphyException: CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038

[E] 2: [executionContext.cpp::invokeReallocateOutput::206] Error Code 2: Internal Error (IOutputAllocator returned nullptr for allocation request of 1231360 bytes.)
[!] `execute_async_v3()` failed. Please see the logging output above for details.

[W] trt-runner-N3-04/15/24-06:54:32     | Was activated but never deactivated. This could cause a memory leak!
[W] trt-runner-N2-04/15/24-06:54:32     | Was activated but never deactivated. This could cause a memory leak!

This code runs more slowly and also raises an exception:

for n, inputs in enumerate(data_loader):
    results = RunResults()
    for tmp_r, i in zip(runners, inputs):
        with tmp_r as r:
            results.append((r.name, IterationResult(outputs=r.infer(i))))
[I] Iteration 9999
trt-runner-N0-04/15/24-07:07:26 | Inference Time: 51.933 ms
trt-runner-N1-04/15/24-07:07:26 | Inference Time: 49.425 ms

[I] Iteration 10000
trt-runner-N0-04/15/24-07:07:26 | Inference Time: 56.102 ms
trt-runner-N1-04/15/24-07:07:26 | Inference Time: 51.524 ms

...

[I] Iteration 13000
trt-runner-N2-04/15/24-07:37:40 | Inference Time: 90.265 ms
trt-runner-N3-04/15/24-07:37:40 | Inference Time: 96.287 ms

[I] Iteration 13001
[!] CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
[ERROR] Exception caught in reallocateOutput(): PolygraphyException: CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038

[E] 2: [executionContext.cpp::invokeReallocateOutput::206] Error Code 2: Internal Error (IOutputAllocator returned nullptr for allocation request of 1231360 bytes.)
[!] `execute_async_v3()` failed. Please see the logging output above for details.


Edit: Exception occurred later on.

@pranavm-nvidia
Collaborator

Does it matter what kind of model it is? I tried your code with a simple model with large input shapes, but the GPU memory usage was completely stable.

@donrax
Author

donrax commented Apr 23, 2024

The model was not at fault; the memory leak occurred specifically while using Polygraphy.

However, you are correct that the memory leak in Polygraphy has been fixed. The version originally tested, Polygraphy 0.49.0, had the memory leak, while version 0.49.9 is stable without memory leaks.

@donrax donrax closed this as completed Apr 23, 2024
@donrax donrax reopened this Apr 23, 2024
@donrax
Author

donrax commented Apr 23, 2024

After a detailed recheck, we have noticed the following for Polygraphy version 0.49.9:

  • If Polygraphy is installed from the cloned TensorRT repo, there is no memory leak.
  • If Polygraphy is installed from the Python Package Index (PyPI), a memory leak occurs.
  • If Polygraphy is installed using --index-url https://pypi.nvidia.com, a memory leak occurs.
  • If Polygraphy is installed using --index-url https://pypi.ngc.nvidia.com, there is no memory leak.

Q1) Which of the listed package indexes should be used to install the correct versions of TensorRT (PyPI has no version 10.0.0) and the corresponding tools (polygraphy, onnx-graphsurgeon, ...)?

The requirements.txt file in the TensorRT repo specifies onnxruntime==1.12.1; python_version=="3.10" and --extra-index-url https://pypi.ngc.nvidia.com.

Q2) Is the latest TensorRT release only compatible with onnxruntime==1.12.1 and IR 8 (https://onnxruntime.ai/docs/reference/compatibility.html) or is this version of onnxruntime only required for the constant folding operation in onnx-graphsurgeon?

Edit: Clarified questions

@pranavm-nvidia
Collaborator

To answer your first question, the regular PyPI is the right place. 10.0 is still in early access so pip won't pick up that version automatically, but you can install it by specifying the version explicitly. See https://pypi.org/project/tensorrt/10.0.0b6/

@ttyio
Collaborator

ttyio commented Jul 2, 2024

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks all!

@ttyio ttyio closed this as completed Jul 2, 2024
@michaeldeyzel

For those who are still experiencing memory leaks even with the newest version of Polygraphy, the only version that worked for me was 0.47.1 from NVIDIA's PyPI. It might solve your memory issues:
python -m pip install "polygraphy==0.47.1" --index-url https://pypi.ngc.nvidia.com

@ludekcizinsky

I followed @michaeldeyzel's suggestion, and it solved my memory leak issue:)
