Different numbers of outputs lead to inconsistent results #4284

Open
Rudin6 opened this issue Dec 15, 2024 · 4 comments

Comments

Rudin6 commented Dec 15, 2024

Description

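(Excerpt from the linear attention code)
hidden_states = query_KV / query_Z
return hidden_states, query_KV, query_Z

hidden_states = query_KV / query_Z
return hidden_states

When the ONNX model is converted to TensorRT, these two variants give different results: the first is correct, while the second produces NaN.
In other words, whether the intermediate tensors are exposed as outputs affects the correctness of the result. How can this be resolved?

Later testing: the problem occurs with TensorRT 10.7; TensorRT 10.6 gives correct results.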
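
To put numbers on the mismatch, one option is to feed both engines the same input and compare the flattened outputs. The following is only a minimal sketch in plain Python, assuming the outputs have already been copied to the host as flat lists of floats; `compare_outputs` and the `atol` tolerance are hypothetical names for illustration, not part of TensorRT.

```python
import math


def compare_outputs(reference, candidate, atol=1e-3):
    """Compare two flat sequences of floats element by element.

    Returns how many candidate values are NaN or infinite, the largest
    absolute difference over positions where both values are finite,
    and whether the candidate is fully finite and within `atol`.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    nonfinite_count = 0
    max_abs_diff = 0.0
    for ref, cand in zip(reference, candidate):
        if not math.isfinite(cand):
            nonfinite_count += 1
        elif math.isfinite(ref):
            max_abs_diff = max(max_abs_diff, abs(ref - cand))
    return {
        "nonfinite_count": nonfinite_count,
        "max_abs_diff": max_abs_diff,
        "ok": nonfinite_count == 0 and max_abs_diff <= atol,
    }
```

Running this once with the three-output engine as `candidate` and once with the single-output engine, against the same reference (for example the PyTorch result), shows whether the second variant is entirely NaN or only diverges at some positions.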

@lix19937

Maybe the fusion strategies (tactic choices) differ between the two builds.

Rudin6 commented Dec 16, 2024

How can I solve it?

@lix19937

You can compare the build logs of the two ONNX models with trtexec --verbose --layerProfile.
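
Once both verbose logs are saved to files, the comparison can be partly automated. This is a rough sketch using only the standard library; it assumes each log line starts with a bracketed timestamp such as `[12/15/2024-10:00:00]`, and the function names and default file names are hypothetical.

```python
import difflib
import re

# Assumed prefix shape, e.g. "[12/15/2024-10:00:00] "; adjust if your logs differ.
TIMESTAMP = re.compile(r"^\[\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}\]\s*")


def strip_timestamps(lines):
    """Drop the leading timestamp so identical messages compare equal."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_build_logs(lines_a, lines_b,
                    name_a="three_outputs.log", name_b="one_output.log"):
    """Return unified-diff lines between two build logs, ignoring timestamps."""
    return list(
        difflib.unified_diff(
            strip_timestamps(lines_a),
            strip_timestamps(lines_b),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
            n=0,  # no context lines: show only what differs
        )
    )
```

Passing the result of `open(path).readlines()` for each log and printing the returned lines leaves only the messages that differ between the two builds, which is where a different fusion or tactic would show up.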

Rudin6 commented Dec 18, 2024

Thanks! There is one other problem in TensorRT 10.7.0. When building the TRT engine, we set the following on the logger and builder config:
logger = trt.Logger(trt.Logger.VERBOSE)
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
config.set_flag(trt.BuilderFlag.BF16)
config.set_flag(trt.BuilderFlag.TF32)
During the engine build, only f32f32 or tf32tf32 GEMM kernels were selected and no bf16 GEMM, so the BF16 engine ends up slower than the FP16 one.
How can this be solved?
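
For reference, a quick way to check which GEMM variants appear in a saved verbose build log is to count lines by precision tag. This is just a sketch: it assumes the log is available as a list of text lines and that kernel names contain the substrings quoted above; `tally_precision_tags` is a hypothetical helper.

```python
from collections import Counter

# Substrings to look for: the first two are the ones seen in the build,
# "bf16" is the one that was expected but missing.
PRECISION_TAGS = ("tf32tf32", "f32f32", "bf16")


def tally_precision_tags(lines, tags=PRECISION_TAGS):
    """Count log lines mentioning each tag (case-insensitive substring match)."""
    counts = Counter({tag: 0 for tag in tags})
    for line in lines:
        lowered = line.lower()
        for tag in tags:
            if tag in lowered:
                counts[tag] += 1
    return dict(counts)
```

A zero count for `bf16` on a log built with the BF16 flag set would confirm the observation above and makes it easy to compare against a TensorRT 10.6 build of the same model.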
