Polygraphy GPU memory leak when processing a large enough number of images #3791
Comments
The device buffers are owned by the output allocator and should be freed when the runner that owns them goes out of scope. Do you have a reproducer for this?
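The ownership model described above can be sketched with a toy example (all class and method names here are hypothetical, not Polygraphy's real API): an output allocator owns the device buffers, and the runner releases them when it is deactivated, so a leak only appears if a runner is activated but never deactivated.

```python
# Toy sketch of the buffer-ownership pattern described above.
# All names are hypothetical; this is not Polygraphy's actual implementation.

class ToyOutputAllocator:
    """Owns 'device buffers' (simulated here as bytearrays keyed by tensor name)."""

    def __init__(self):
        self.buffers = {}

    def reallocate(self, name, size):
        # Grow-only reallocation: reuse the existing buffer if it is big enough.
        buf = self.buffers.get(name)
        if buf is None or len(buf) < size:
            self.buffers[name] = bytearray(size)
        return self.buffers[name]

    def free_all(self):
        self.buffers.clear()


class ToyRunner:
    """Frees its allocator's buffers on deactivate; a runner that is
    activated but never deactivated keeps its buffers alive."""

    def __init__(self):
        self.allocator = ToyOutputAllocator()
        self.active = False

    def activate(self):
        self.active = True

    def infer(self, name, size):
        assert self.active, "runner must be activated before inference"
        return self.allocator.reallocate(name, size)

    def deactivate(self):
        self.allocator.free_all()
        self.active = False


runner = ToyRunner()
runner.activate()
runner.infer("output0", 1024)
runner.deactivate()
print(len(runner.allocator.buffers))  # 0: buffers were freed on deactivate
```

This mirrors why the warnings in the logs below ("Was activated but never deactivated. This could cause a memory leak!") matter: if the exception skips `deactivate()`, the allocator's buffers are never released.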
Code snippets that can reproduce the issue:

```python
from polygraphy.backend.trt import TrtRunner
from polygraphy.comparator.struct import RunResults, IterationResult

runners = [TrtRunner(engine0), TrtRunner(engine1)]
```

This code runs fast and produces an exception:

```python
for r in runners:
    r.activate()
for n, inputs in enumerate(data_loader):
    results = RunResults()
    for r, i in zip(runners, inputs):
        results.append((r.name, IterationResult(outputs=r.infer(i))))
for r in runners:
    r.deactivate()
```

Exception:

```
[I] Iteration 1999
trt-runner-N2-04/15/24-06:54:32 | Inference Time: 4.163 ms
trt-runner-N3-04/15/24-06:54:32 | Inference Time: 4.454 ms
[I] Iteration 2000
[!] CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
[ERROR] Exception caught in reallocateOutput(): PolygraphyException: CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
[E] 2: [executionContext.cpp::invokeReallocateOutput::206] Error Code 2: Internal Error (IOutputAllocator returned nullptr for allocation request of 1231360 bytes.)
[!] `execute_async_v3()` failed. Please see the logging output above for details.
[W] trt-runner-N3-04/15/24-06:54:32 | Was activated but never deactivated. This could cause a memory leak!
[W] trt-runner-N2-04/15/24-06:54:32 | Was activated but never deactivated. This could cause a memory leak!
```

This code runs slower and also produces an exception:

```python
for n, inputs in enumerate(data_loader):
    results = RunResults()
    for tmp_r, i in zip(runners, inputs):
        with tmp_r as r:
            results.append((r.name, IterationResult(outputs=r.infer(i))))
```

Exception:

```
[I] Iteration 9999
trt-runner-N0-04/15/24-07:07:26 | Inference Time: 51.933 ms
trt-runner-N1-04/15/24-07:07:26 | Inference Time: 49.425 ms
[I] Iteration 10000
trt-runner-N0-04/15/24-07:07:26 | Inference Time: 56.102 ms
trt-runner-N1-04/15/24-07:07:26 | Inference Time: 51.524 ms
...
[I] Iteration 13000
trt-runner-N2-04/15/24-07:37:40 | Inference Time: 90.265 ms
trt-runner-N3-04/15/24-07:37:40 | Inference Time: 96.287 ms
[I] Iteration 13001
[!] CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
[ERROR] Exception caught in reallocateOutput(): PolygraphyException: CUDA Error: 2. To figure out what this means, refer to https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
[E] 2: [executionContext.cpp::invokeReallocateOutput::206] Error Code 2: Internal Error (IOutputAllocator returned nullptr for allocation request of 1231360 bytes.)
[!] `execute_async_v3()` failed. Please see the logging output above for details.
```

Edit: Exception occurred later on.
Does it matter what kind of model it is? I tried your code with a simple model with large input shapes but the GPU memory usage is completely stable.
The model was not at fault; the memory leak occurred specifically while using Polygraphy. However, you are correct: the memory leak in Polygraphy has been fixed. The originally tested version with the memory leak was Polygraphy 0.49.0, while version 0.49.9 is stable without memory leaks.
After a detailed recheck, we have noticed the following for Polygraphy version 0.49.9:

Q1) Which of the listed package indexes should be used for installing the correct versions of TensorRT (PyPI has no version 10.0.0) and the corresponding tools (polygraphy, onnx-graphsurgeon, ...)? The

Q2) Is the latest TensorRT release only compatible with

Edit: Clarified questions
To answer your first question, the regular PyPI is the right place. 10.0 is still in early access so
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks all!
For those who are still experiencing memory leaks even with the newest version of Polygraphy: the only version that worked for me was 0.47.1 from NVIDIA's PyPI. It might solve your memory issues:
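As a sketch, pinning that version from NVIDIA's package index might look like the following. The index URL is an assumption on my part; verify it against NVIDIA's current install documentation before using it.

```shell
# Hypothetical install command; the --extra-index-url value is an assumption,
# check NVIDIA's docs for the correct index before running this.
pip install "polygraphy==0.47.1" --extra-index-url https://pypi.ngc.nvidia.com
```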
I followed @michaeldeyzel's suggestion, and it solved my memory leak issue :)
Description
Processing a large enough number of images with Polygraphy results in a GPU memory leak in the method `_get_array_on_cpu`.
Environment
TensorRT Version: 10.0.0.6
NVIDIA GPU: RTX A4500
NVIDIA Driver Version: 551.61
CUDA Version: 11.8
CUDNN Version: 8.9
Operating System: Ubuntu 22.04 LTS
Python Version (if applicable): 3.10
Relevant Files
File: `TensorRT/tools/Polygraphy/polygraphy/backend/trt/runner.py`, line 121 in 5eeb6c7
Steps To Reproduce
Simply process a large enough number of images to fill up the GPU memory.
Have you tried the latest release?: The issue persists in the latest release.
Possible solution
Deallocate the `src` data pointer after copying the array to CPU at line 129 with: `cuda.wrapper().free(util.array.data_ptr(arr))`
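The shape of the proposed fix can be illustrated with a self-contained sketch: free the device allocation as soon as its contents have been copied to host memory. The "CUDA wrapper" below is simulated so the leak is observable without a GPU; the names are hypothetical and this is not the actual Polygraphy patch.

```python
# Abstract sketch of the proposed fix: free the device pointer immediately
# after the device-to-host copy. FakeCudaWrapper simulates a CUDA wrapper
# and tracks live 'device' allocations so a leak would be visible.

class FakeCudaWrapper:
    def __init__(self):
        self.live = set()   # set of outstanding allocation handles
        self._next = 1

    def malloc(self, size):
        ptr = self._next
        self._next += 1
        self.live.add(ptr)
        return ptr

    def free(self, ptr):
        self.live.discard(ptr)


def get_array_on_cpu(wrapper, device_ptr, device_data):
    """Copy 'device' data to host, then release the device allocation."""
    host_copy = bytes(device_data)  # stand-in for the device-to-host memcpy
    # Without this free, every call would leak one device allocation:
    wrapper.free(device_ptr)
    return host_copy


cuda = FakeCudaWrapper()
ptr = cuda.malloc(1231360)
out = get_array_on_cpu(cuda, ptr, b"\x00" * 16)
print(len(cuda.live))  # 0: no leaked device allocations after the copy
```

In the toy model, repeating the malloc/copy loop keeps `cuda.live` empty, whereas dropping the `free` call would grow it by one entry per iteration, which is the behavior the issue describes.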