Forcing layernorm layers to run in FP32 precision #2781
@de1star is this an ONNX model you are trying to run? If so, can you try exporting to opset 17 (which added a native LayerNormalization op)?
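As a minimal sketch, exporting with opset 17 from PyTorch might look like the following; `model` and `dummy_input` are hypothetical placeholders for your own module and a representative input tensor:

```python
import torch

# Sketch only: export to ONNX opset 17, which includes the native
# LayerNormalization op (requires torch >= 1.13).
# `model` and `dummy_input` are placeholders, not names from this thread.
torch.onnx.export(
    model,
    dummy_input,
    "model_opset17.onnx",
    opset_version=17,
)
```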
Thank you, I will have a try!
Hi, I tried to set opset_version to 17, but an error was raised:
@de1star you will need to use torch v1.13.0 or a newer version.
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks!
I've exported the model to opset 17 ONNX. The warning still exists. @rajeevsrao
Sorry, my bad. I found TensorRT 8.6 to be working fine. I got the earlier error because I previously had TensorRT 8.4 installed and missed libnvonnxparser.so while removing TRT 8.4. Now, with TRT 8.6 and ONNX opset 17, everything works fine, thank you. Also, the way to manually set the layernorm layers to FP32 precision through the TensorRT Python API can be found in this link: #1196 (comment)
How can this be done with the TensorRT C++ API?
@rajeevsrao How do I force layernorm layers to run in FP32 precision with C++? I have set `config->setFlag(BuilderFlag::kFP16);`
Still no solution.
Copy the layer names and set them to FP32 precision like this:
```python
if self.trt_model_dtype == "fp16":
    config.set_flag(trt.BuilderFlag.FP16)
    print("network.num_layers", network.num_layers)
    for i in range(network.num_layers):
        tmp_layer = network.get_layer(i)
        # Match layernorm-related layers by name (e.g. "/ar_decoder/norm/Mul");
        # the Cast_1 and Mul_1 sub-layers are excluded and left in FP16.
        if "norm" in tmp_layer.name and "Cast_1" not in tmp_layer.name and "Mul_1" not in tmp_layer.name:
            print("setting", tmp_layer.name, "precision")
            tmp_layer.precision = trt.DataType.FLOAT
            tmp_layer.set_output_type(0, trt.DataType.FLOAT)
            print(tmp_layer.name, tmp_layer.type, tmp_layer.num_inputs, tmp_layer.precision, tmp_layer.precision_is_set)
    # Required so the builder honors the per-layer precision settings
    # rather than treating them as hints.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    # config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
```
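For the earlier C++ question: a minimal sketch of the same per-layer approach using the TensorRT C++ API might look like the following. The helper name `forceLayerNormFp32` and the assumption that `network` and `config` already exist are illustrative, not from this thread; the `"norm"` name filter is copied from the Python snippet above and may need adjusting for your model.

```cpp
#include <string>
#include <NvInfer.h>

using namespace nvinfer1;

// Sketch: force layernorm-related layers to FP32 in an otherwise FP16
// engine. `network` and `config` are assumed to be created elsewhere
// by the usual builder flow; the name filter mirrors the Python example.
void forceLayerNormFp32(INetworkDefinition* network, IBuilderConfig* config)
{
    config->setFlag(BuilderFlag::kFP16);
    for (int32_t i = 0; i < network->getNbLayers(); ++i)
    {
        ILayer* layer = network->getLayer(i);
        std::string const name = layer->getName();
        if (name.find("norm") != std::string::npos
            && name.find("Cast_1") == std::string::npos
            && name.find("Mul_1") == std::string::npos)
        {
            layer->setPrecision(DataType::kFLOAT);
            layer->setOutputType(0, DataType::kFLOAT);
        }
    }
    // As in the Python version, make the builder honor the per-layer
    // precision requests rather than treat them as hints.
    config->setFlag(BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);
}
```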
Hi, when I built a TensorRT engine, there was a warning:
[W] Running layernorm after self-attention in FP16 may cause overflow. Forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
But I could not find a way to force layernorm to run in FP32 precision. Could you help me with that? Thanks a lot!