Description
I am performing QAT on a complex model. I insert Q/DQ nodes into the ResNet portion that I want to quantize, following the documented placement rules, and after building, TensorRT runs that part in INT8. How can I ensure that the parts without Q/DQ nodes run at optimal performance in non-INT8 precision (FP16 + FP32)? I have noticed that after inserting Q/DQ nodes into one part of the network, the performance of the unquantized parts degrades compared to a pure FP16 build.
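For context, this is roughly how I add the Q/DQ nodes. It is a minimal sketch using NVIDIA's pytorch-quantization toolkit; the torchvision model and the `model.layer1` sub-module are stand-ins for my real network:

```python
import torch
import torchvision
from pytorch_quantization import nn as quant_nn

# Make QuantConv2d export real QuantizeLinear/DequantizeLinear nodes in ONNX.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

def quantize_convs(module):
    # Swap nn.Conv2d for QuantConv2d only inside the sub-module passed in,
    # so Q/DQ nodes appear in the chosen portion and nowhere else.
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Conv2d):
            qconv = quant_nn.QuantConv2d(
                child.in_channels, child.out_channels, child.kernel_size,
                stride=child.stride, padding=child.padding,
                bias=child.bias is not None)
            qconv.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                qconv.bias.data.copy_(child.bias.data)
            setattr(module, name, qconv)
        else:
            quantize_convs(child)

model = torchvision.models.resnet18()  # stand-in for my real network
quantize_convs(model.layer1)           # quantize only one portion of it
```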
As an experiment, I inserted a Q/DQ pair before only a single convolution layer and captured the resulting build.
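Concretely, the experiment inserts one Q/DQ pair directly into the ONNX graph, roughly like this (a sketch with onnx-graphsurgeon; the node name "Conv_42" and the scale value are placeholders for my model):

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))

# Find the one convolution to quantize; "Conv_42" is a placeholder name.
conv = next(n for n in graph.nodes if n.name == "Conv_42")
inp = conv.inputs[0]

scale = gs.Constant("qdq_scale", np.array(0.05, dtype=np.float32))
zero = gs.Constant("qdq_zero", np.array(0, dtype=np.int8))

q_out = gs.Variable("q_out", dtype=np.int8)
dq_out = gs.Variable("dq_out", dtype=np.float32)
q = gs.Node("QuantizeLinear", inputs=[inp, scale, zero], outputs=[q_out])
dq = gs.Node("DequantizeLinear", inputs=[q_out, scale, zero], outputs=[dq_out])

conv.inputs[0] = dq_out  # reroute the conv input through Q/DQ
graph.nodes.extend([q, dq])

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_qdq.onnx")
```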
For comparison, here is the result of building the same network in pure FP16 mode.
Why does the part within the green box perform differently between the two builds?
Another question: even though the inputs and outputs of the Myelin region are exactly the same in the two exported engines, its execution time differs significantly.
fp16 mode: (timing screenshot)
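For reference, I compared the per-layer timings of the two engines with trtexec profiling (engine file names here are placeholders):

```
trtexec --loadEngine=model_qdq.engine --dumpProfile --separateProfileRun
trtexec --loadEngine=model_fp16.engine --dumpProfile --separateProfileRun
```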
In short, I am unsure how to guarantee that the unquantized parts of my model run optimally in FP16 or FP32.
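For the build itself I enable both INT8 and FP16 so that, as I understand the docs, layers without Q/DQ nodes can fall back to FP16 instead of FP32. A minimal sketch of my build script (file names are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model_qdq.onnx", "rb") as f:
    assert parser.parse(f.read())

config = builder.create_builder_config()
# INT8 applies only where Q/DQ nodes dictate it (explicit quantization);
# FP16 is enabled so everything else can pick FP16 kernels instead of FP32.
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)

engine = builder.build_serialized_network(network, config)
with open("model_qdq.engine", "wb") as f:
    f.write(engine)
```

The trtexec equivalent would be `trtexec --onnx=model_qdq.onnx --int8 --fp16`.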
Environment
TensorRT Version: 8.5.2
NVIDIA GPU: Orin / RTX 3090
NVIDIA Driver Version:
CUDA Version: 11.4
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`):