Different numbers of outputs lead to inconsistent results #4284

Rudin6 · 2024-12-15T07:16:04Z

Description

(Part of the linear attention code)
hidden_states = query_KV / query_Z
return hidden_states, query_KV, query_Z

and

hidden_states = query_KV / query_Z
return hidden_states

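When the two variants above are exported to ONNX and converted to TensorRT, they give different results: the first is correct, while the second produces NaN.
In other words, whether the intermediate states are exposed as outputs affects the correctness of the result. How can this be fixed?

Follow-up testing: the problem occurs with TensorRT 10.7, while 10.6 gives correct results.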
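To narrow down a mismatch like this, it can help to quantify it: how many NaNs each engine emits and how far apart the finite values are. The following is a minimal sketch, not part of the original report. The helper name `compare_outputs`, the tolerance `atol`, and the assumption that each engine's `hidden_states` has already been copied to the host and flattened into a list of Python floats are all hypothetical.

```python
import math


def compare_outputs(reference, candidate, atol=1e-3):
    """Compare two flat sequences of floats from two engine runs.

    `reference` and `candidate` are assumed to be the same output tensor
    (e.g. hidden_states) flattened to Python floats. `atol` is an
    arbitrary tolerance chosen for illustration.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")

    # Count NaNs on each side separately so it is clear which run is broken.
    nan_ref = sum(1 for v in reference if math.isnan(v))
    nan_cand = sum(1 for v in candidate if math.isnan(v))

    # Largest absolute difference over positions where both values are not NaN.
    max_abs_diff = 0.0
    for r, c in zip(reference, candidate):
        if math.isnan(r) or math.isnan(c):
            continue
        max_abs_diff = max(max_abs_diff, abs(r - c))

    return {
        "nan_in_reference": nan_ref,
        "nan_in_candidate": nan_cand,
        "max_abs_diff": max_abs_diff,
        "match": nan_ref == 0 and nan_cand == 0 and max_abs_diff <= atol,
    }
```

Running it once per TensorRT version (10.6 and 10.7) against the same reference, such as the ONNX Runtime or PyTorch output, would show whether the NaNs appear only in the single-output engine.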

lix19937 · 2024-12-16T05:42:56Z

Maybe the layer fusions (tactic choices) are different.

Rudin6 · 2024-12-16T07:17:44Z

How can I solve it?

lix19937 · 2024-12-16T09:44:12Z

You can compare the build logs of the two ONNX models, generated with trtexec --verbose --layerProfile.
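Verbose build logs are long, so a small script can make the comparison manageable. This is a rough sketch using only the Python standard library. It assumes each log line starts with a timestamp of the form `[MM/DD/YYYY-HH:MM:SS]`, which is stripped so that two builds run at different times can be compared. The function name `diff_build_logs` is hypothetical.

```python
import difflib
import itertools
import re

# Assumed timestamp prefix at the start of each log line, e.g. "[12/16/2024-05:42:56] ".
_TIMESTAMP = re.compile(r"^\[\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}\]\s*")


def diff_build_logs(log_a, log_b):
    """Return only the lines that differ between two build logs.

    `log_a` and `log_b` are iterables of log lines (for example, open
    file objects). Lines removed from `log_a` are prefixed with "-",
    lines added in `log_b` with "+".
    """
    a = [_TIMESTAMP.sub("", line.rstrip()) for line in log_a]
    b = [_TIMESTAMP.sub("", line.rstrip()) for line in log_b]

    # n=0 drops context lines; lineterm="" avoids trailing newlines.
    diff = difflib.unified_diff(a, b, lineterm="", n=0)

    # Skip the two file-header lines, then drop the "@@" hunk headers.
    body = itertools.islice(diff, 2, None)
    return [line for line in body if not line.startswith("@@")]
```

Typical usage would be to open the two saved logs and print the result line by line, then look for differences in the fused layers or selected tactics around the division.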

Rudin6 · 2024-12-18T01:55:08Z

Thanks! There is another problem in TensorRT 10.7.0. When building the engine, we configure the logger and builder config as follows:
import tensorrt as trt
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
config = builder.create_builder_config()
# Enable detailed layer information plus the BF16 and TF32 precisions.
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
config.set_flag(trt.BuilderFlag.BF16)
config.set_flag(trt.BuilderFlag.TF32)
During the build, the log shows only f32f32 or tf32tf32 GEMM kernels and no bf16 GEMM, so the BF16 engine runs slower than the FP16 one.
How can this be solved?
