
About device size and streaming budget size in TensorRT 10.7 #4285

Open
Jsy0220 opened this issue Dec 16, 2024 · 2 comments

Comments


Jsy0220 commented Dec 16, 2024

Hi, I upgraded TensorRT from 8.4 to 10.7 and found several new interfaces for getting/setting memory sizes. I have some questions about them.

The interfaces involved are:

  1. ICudaEngine::getDeviceMemorySizeV2
  2. ICudaEngine::getStreamableWeightsSize
  3. ICudaEngine::setWeightStreamingBudgetV2
  4. ICudaEngine::getWeightStreamingScratchMemorySize
  5. IExecutionContext::updateDeviceMemorySizeForShapes
  6. IExecutionContext::setDeviceMemoryV2

I tried to understand them from the API comments, but I am not sure I got them right. Suppose the context is created with kUSER_MANAGED and weight streaming is enabled.

From (2) I can get the total streamable-weight size, compute a budget from it, and set that budget through (3). But what is the result of (4) used for? Is it just informational?

And for (1):

  1. Should it be called after (3)?
  2. I think its return value is the upper limit of the total size, so (6) only needs to be called once with this value and (5) never needs to be called again. Is that right? Otherwise, (6) would have to be called with a new buffer, sized according to the return value of (5), every time new shapes are set.
@lix19937

First, you need to understand scratch memory: it is mainly used for the temporary storage required by layer implementations.

Device memory is mainly used for intermediate activation tensors, and also for the temporary storage required by layer implementations, so it includes scratch memory.

Call a weight-streaming operation a WSB op. So k streamable weights means k WSB ops.

getDeviceMemorySizeV2() = sum { ScratchMemory }  
 
getStreamableWeightsSize() = sum { streamable weight }  

getWeightStreamingScratchMemorySize() = size of the scratch memory required by the current WSB op.  

If getDeviceMemorySizeV2() is called before enabling weight streaming by setWeightStreamingBudgetV2(), the return value will not include the extra scratch memory size required by weight streaming, which can be obtained using getWeightStreamingScratchMemorySize(). Otherwise, it will include this extra memory.

If the budget set by setWeightStreamingBudgetV2() is larger than the total size of streamable weights obtained by getStreamableWeightsSize(), the budget will be clipped to the total size, effectively disabling weight streaming.

You can query the budget set by getWeightStreamingBudgetV2().

https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_execution_context.html#ad79a22f270d281f8550e48f2f43077a6


Jsy0220 commented Dec 18, 2024

@lix19937 Okay, thank you!
So setWeightStreamingBudgetV2() sets the persistent memory managed by TensorRT, while setDeviceMemoryV2() sets the temporary memory that can be managed by the user. Is that right?
And what about updateDeviceMemorySizeForShapes()? Does it adjust the persistent memory and return a new temporary-memory size according to the input shapes?
