[Feature request] allow uint8 output without an ICastLayer before #4282
Labels: Enhancement (New feature or request), quantization (Issues related to Quantization), triaged (Issue has been triaged by maintainers)
Context
I work in the broadcast sector, where frames processed by our TensorRT (TRT) engines can have various pixel formats (encoded or decoded, bit depths, color spaces, etc.).
We developed a custom codec plugin that converts all those pixel formats to and from fp16/fp32, enabling TRT to process these frames. This codec layer accepts a format input that specifies the pixel format of the input, allowing the plugin to determine the appropriate codec for conversion. This codec layer is inserted at the beginning and the end of our TRT engines.
Implementing this plugin with uint8 inputs and outputs simplifies the design and results in a cleaner implementation.
Problem
During the network-building stage, the following error is encountered:
Request
Provide a mechanism to allow my custom plugin to produce uint8 network outputs. There is no need for a casting layer here.
Dirty workaround
Our current workaround involves bypassing TRT's restrictions by misrepresenting the uint8 byte array as an fp16 array with half the number of elements. While this approach allows the engine to build, it is not ideal:
We serve the TRT engine via Triton using the TensorRT backend. The TRT datatype determines the datatype specified in the config.pbtxt. This datatype propagates to the Triton client, leading to potential discrepancies or confusion. A proper solution would remove the need for this workaround and ensure clean, consistent datatype handling.
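To illustrate why the declared datatype is misleading, here is a minimal Python sketch of the reinterpretation described above. It is not our plugin code: the helper names declared_fp16_len and as_seen_by_client are hypothetical, and little-endian byte order is assumed. It only shows that the same bytes read as fp16 give half as many elements, with values unrelated to the pixel data.

```python
import struct

FP16_SIZE = 2  # bytes per fp16 element


def declared_fp16_len(num_uint8: int) -> int:
    """Element count of the fp16 tensor that carries num_uint8 bytes (hypothetical helper)."""
    if num_uint8 % FP16_SIZE:
        raise ValueError("uint8 payload must have an even number of bytes")
    return num_uint8 // FP16_SIZE


def as_seen_by_client(raw: bytes) -> tuple:
    """Decode the buffer the way a client trusting the declared fp16 type would.

    Assumes little-endian layout; 'e' is the struct format for IEEE 754 half precision.
    """
    n = declared_fp16_len(len(raw))
    return struct.unpack(f"<{n}e", raw)


# Four uint8 pixel values produced by the codec plugin.
pixels = bytes([0, 60, 0, 64])

print(declared_fp16_len(len(pixels)))  # 2 fp16 elements declared for 4 bytes
print(as_seen_by_client(pixels))       # (1.0, 2.0): bit patterns 0x3C00 and 0x4000
print(list(pixels))                    # [0, 60, 0, 64]: the actual payload
```

A client therefore has to know out of band that the tensor must be reinterpreted as bytes, which is exactly the inconsistency a native uint8 output would avoid.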