
About device size and streaming budget size in TensorRT 10.7 #4285

Open
Jsy0220 opened this issue Dec 16, 2024 · 2 comments

Comments


Jsy0220 commented Dec 16, 2024

Hi, I upgraded TensorRT from 8.4 to 10.7 and found several new interfaces for getting/setting memory sizes. I have some questions about them.

The interfaces involved are:

  1. ICudaEngine::getDeviceMemorySizeV2
  2. ICudaEngine::getStreamableWeightsSize
  3. ICudaEngine::setWeightStreamingBudgetV2
  4. ICudaEngine::getWeightStreamingScratchMemorySize
  5. IExecutionContext::updateDeviceMemorySizeForShapes
  6. IExecutionContext::setDeviceMemoryV2

I tried to understand them from the API comments, but I am not sure I got them right. Suppose the context is created with kUSER_MANAGED and weight streaming is enabled.

From (2) I can get the total streamable-weight size, compute a budget from it, and set that budget through (3). But what is the result of (4) used for? Is it just informational?

And for (1):

  1. Should it be called after (3)?
  2. I think its return value is the upper limit of the total size, so (6) only needs to be called once with this value and (5) never needs to be called again. Is that right? Otherwise, (6) would have to be called with a new buffer, sized according to the return value of (5), every time new shapes are set.
@lix19937

First, you need to understand scratch memory: it is mainly used for the temporary storage required by layer implementations.

Device memory is mainly used for intermediate activation tensors, and also for the temporary storage required by layer implementations, so it includes scratch memory.

Call a weight-streaming operation a WSB op. So k streamable weights means k WSB ops.

getDeviceMemorySizeV2() = sum { ScratchMemory }  
 
getStreamableWeightsSize() = sum { streamable weight }  

getWeightStreamingScratchMemorySize() = size of the scratch memory required by the current WSB op.  

If getDeviceMemorySizeV2() is called before enabling weight streaming by setWeightStreamingBudgetV2(), the return value will not include the extra scratch memory size required by weight streaming, which can be obtained using getWeightStreamingScratchMemorySize(). Otherwise, it will include this extra memory.

If the budget set by setWeightStreamingBudgetV2() is larger than the total size of streamable weights obtained by getStreamableWeightsSize(), the budget will be clipped to the total size, effectively disabling weight streaming.

You can query the budget set by getWeightStreamingBudgetV2().

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_execution_context.html#ad79a22f270d281f8550e48f2f43077a6


Jsy0220 commented Dec 18, 2024

@lix19937 Okay, thank you!
So setWeightStreamingBudgetV2() sets the persistent memory managed by TensorRT, while setDeviceMemoryV2() sets the temporary memory that can be managed by the user. Is that right?
And what about updateDeviceMemorySizeForShapes()? Does it adjust the persistent memory and return a new temporary-memory size according to the input shapes?
