
Forcing layernorm layers to run in FP32 precision #2781

Closed
de1star opened this issue Mar 17, 2023 · 12 comments
Labels
Accuracy (Output mismatch between TensorRT and other frameworks), triaged (Issue has been triaged by maintainers)

Comments

@de1star

de1star commented Mar 17, 2023

Hi, when I build the TensorRT engine, there is a warning:
[W] Running layernorm after self-attention in FP16 may cause overflow. Forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
But I could not find a way to force layernorm to run in FP32 precision. Could you help me with that? Thanks a lot!

@rajeevsrao
Collaborator

@de1star is this an ONNX model you are trying to run? If so, can you try exporting to opset 17 (which added the LayerNormalization operator) and running with TRT 8.6? Precision requirements for the LayerNormalization operator are handled automatically by the TensorRT optimizer in 8.6.

rajeevsrao added the triaged and Accuracy labels on Mar 18, 2023
@de1star
Author

de1star commented Mar 20, 2023

Thank you, I will give it a try!

@de1star
Author

de1star commented Mar 20, 2023

Hi, I tried setting opset_version to 17, but an error was raised:
ValueError: Unsupported ONNX opset version: 17.
It seems that torch does not support opset_version=17. Any suggestions?

@rajeevsrao
Collaborator

@de1star you will need to use torch v1.13.0 or newer.
https://github.com/pytorch/pytorch/blob/v1.13.0-rc1/torch/onnx/symbolic_opset17.py
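
For reference, a minimal export sketch assuming torch >= 1.13 (the toy module, shapes, and file name below are just placeholders):

import torch
import torch.nn as nn

# stand-in model containing a LayerNorm; replace with your own module
model = nn.Sequential(nn.Linear(64, 64), nn.LayerNorm(64)).eval()
dummy_input = torch.randn(1, 64)

# opset 17 adds the LayerNormalization operator (requires torch >= 1.13)
torch.onnx.export(
    model,
    dummy_input,
    "model_opset17.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["output"],
)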

@ttyio
Collaborator

ttyio commented Apr 24, 2023

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!

@ttyio ttyio closed this as completed Apr 24, 2023
@monsterlyg

monsterlyg commented Nov 6, 2023

I've exported the model to opset 17 ONNX. The warning still exists. @rajeevsrao

@jinluyang

jinluyang commented Nov 8, 2023

Sorry, my bad. I found TensorRT 8.6 to be working fine. I got the error shown below because I previously had TensorRT 8.4 installed and missed libnvonnxparser.so while removing TRT 8.4. Now, with TRT 8.6 and ONNX opset 17, everything works fine. Thank you.

[screenshot of the error]

Also, the way to manually set the layernorm layers to FP32 through the TensorRT Python API can be found in #1196 (comment).
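
Roughly, that approach looks like the sketch below (a minimal illustration rather than the exact code from the linked comment; the "norm"-in-name check is a heuristic you would adapt to your own network):

import tensorrt as trt

def force_layernorm_fp32(network, config):
    # build in FP16 overall...
    config.set_flag(trt.BuilderFlag.FP16)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # ...but pin layers belonging to the layernorm subgraph to FP32
        if "norm" in layer.name.lower():
            layer.precision = trt.DataType.FLOAT
            layer.set_output_type(0, trt.DataType.FLOAT)
    # make TensorRT honor the per-layer precision settings above
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)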

@1193700079

How can this be done with the TensorRT C++ API?

@w1005444804

@rajeevsrao How do I force layernorm layers to run in FP32 precision with C++? I have already set "config->setFlag(BuilderFlag::kFP16);"

@focusunsink

still no solution

@lantudou

lantudou commented Nov 23, 2024

trtexec --onnx=sim_cnn.onnx --saveEngine=model.trt --fp16 --verbose
You will find something like this in the output:

[11/23/2024-18:16:34] [W] [TRT] Detected layernorm nodes in FP16.
[11/23/2024-18:16:34] [V] [TRT] /downsample_layers.0/downsample_layers.0.1/ReduceMean_1,/downsample_layers.0/downsample_layers.0.1/ReduceMean,/downsample_layers.0/downsample_layers.0.1/Pow,/downsample_layers.1/downsample_layers.1.0/ReduceMean,/downsample_layers.1/downsample_layers.1.0/Pow,/downsample_layers.1/downsample_layers.1.0/ReduceMean_1,/downsample_layers.2/downsample_layers.2.0/ReduceMean,/downsample_layers.2/downsample_layers.2.0/Pow,/downsample_layers.2/downsample_layers.2.0/ReduceMean_1,/downsample_layers.3/downsample_layers.3.0/ReduceMean,/downsample_layers.3/downsample_layers.3.0/Pow,/downsample_layers.3/downsample_layers.3.0/ReduceMean_1
[11/23/2024-18:16:34] [W] [TRT] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.

Copy these layer names and set them to FP32 precision with a command like this:

trtexec --onnx=sim_cnn.onnx --saveEngine=model.trt --fp16 --precisionConstraints=obey --layerPrecisions=/downsample_layers.0/downsample_layers.0.1/ReduceMean_1:fp32,/downsample_layers.0/downsample_layers.0.1/ReduceMean:fp32,/downsample_layers.0/downsample_layers.0.1/Pow:fp32,/downsample_layers.1/downsample_layers.1.0/ReduceMean:fp32,/downsample_layers.1/downsample_layers.1.0/Pow:fp32,/downsample_layers.1/downsample_layers.1.0/ReduceMean_1:fp32,/downsample_layers.2/downsample_layers.2.0/ReduceMean:fp32,/downsample_layers.2/downsample_layers.2.0/Pow:fp32,/downsample_layers.2/downsample_layers.2.0/ReduceMean_1:fp32,/downsample_layers.3/downsample_layers.3.0/ReduceMean:fp32,/downsample_layers.3/downsample_layers.3.0/Pow:fp32,/downsample_layers.3/downsample_layers.3.0/ReduceMean_1:fp32

@focusunsink

This works for me; if not, let me know.

if self.trt_model_dtype == "fp16":
    config.set_flag(trt.BuilderFlag.FP16)
    print("network.num_layers", network.num_layers)
    for i in range(network.num_layers):
        tmp_layer = network.get_layer(i)
        # heuristic: pin layernorm-related layers to FP32, but leave the
        # Cast_1 / Mul_1 nodes of the layernorm subgraph in FP16
        if "norm" in tmp_layer.name and "Cast_1" not in tmp_layer.name and "Mul_1" not in tmp_layer.name:
            print("setting", tmp_layer.name, "precision")
            tmp_layer.precision = trt.DataType.FLOAT
            tmp_layer.set_output_type(0, trt.DataType.FLOAT)
        print(tmp_layer.name, tmp_layer.type, tmp_layer.num_inputs, tmp_layer.precision, tmp_layer.precision_is_set)
    # make TensorRT honor the per-layer precision settings above
    # (PREFER_PRECISION_CONSTRAINTS would treat them as hints instead of hard constraints)
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
