[GPU][MTL] Resolve long token performance regression in MTL 125H plat… #28155

riverlijunjie · 2024-12-20T02:00:39Z

Details:

PR27831 enables MLP fusion in cldnn, which can improve performance, but the fusion is not enabled on MTL 125H because that platform has only 112 EUs (fusion is disabled when the EU count is below 128). So PR27831 alone should bring no performance change on MTL 125H. However, PR26940, which integrates dynamic quantization, causes a first-token performance drop of about 8% on MTL 125H for a 6K-token input. Enabling MLP fusion on MTL 125H makes the regression disappear.
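The EU-count gating described above can be modeled as a simple threshold check. This is a hypothetical sketch, not the cldnn implementation: the function name `mlp_fusion_enabled` and the constants are illustrative, the 128-EU threshold is taken from the test-result table, and lowering the threshold is only one assumed way a patch could admit a 112-EU part. The PR text does not show how the actual patch does it.

```python
# Hypothetical model of the MLP fusion EU-count gate. The real cldnn check,
# its name, and how the patch changes it are NOT shown in this PR.
MLP_FUSION_MIN_EUS = 128  # threshold stated in the test-result table (disabled when EU < 128)
MTL_125H_EUS = 112        # EU count of MTL 125H, per the PR description


def mlp_fusion_enabled(eu_count: int, min_eus: int = MLP_FUSION_MIN_EUS) -> bool:
    """Return True when a device with `eu_count` EUs passes the (hypothetical) gate."""
    return eu_count >= min_eus


# With the default threshold, MTL 125H is excluded from MLP fusion.
print(mlp_fusion_enabled(MTL_125H_EUS))  # False
# Assumed patch behavior: lower the threshold so a 112-EU device qualifies.
print(mlp_fusion_enabled(MTL_125H_EUS, min_eus=MTL_125H_EUS))  # True
```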
Test results:

| Test case | First-token latency (ms) | Commit ID |
| --- | --- | --- |
| PR27900, before MLP fusion | 22297.2 | `536bd69` |
| PR27831, MLP fusion PR (fusion disabled because EU count < 128) | 22783.2 | `bf62609` |
| PR26940, [GPU] Integrate dynamic quantization for onednn | 24395.8 | `b840082` |
| PR26940 + patch enabling MLP fusion on MTL 125H (112 EUs) | 22875.3 | `b840082` |
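The relative slowdowns can be recomputed from the first-token latencies in the table; the percentage depends on which row is taken as the baseline. This is a small arithmetic sketch using only the measured values; the constant names are illustrative labels for the table rows.

```python
# First-token latencies (ms) copied from the test-result table.
BEFORE_MLP_FUSION = 22297.2       # PR27900, commit 536bd69
MLP_FUSION_DISABLED = 22783.2     # PR27831, commit bf62609
DYNAMIC_QUANT = 24395.8           # PR26940, commit b840082
DYNAMIC_QUANT_PLUS_MLP = 22875.3  # PR26940 + MLP fusion patch, commit b840082


def slowdown_pct(latency_ms: float, baseline_ms: float) -> float:
    """Relative latency increase over the baseline, in percent (negative means faster)."""
    return (latency_ms - baseline_ms) / baseline_ms * 100.0


if __name__ == "__main__":
    print(f"PR26940 vs PR27831: {slowdown_pct(DYNAMIC_QUANT, MLP_FUSION_DISABLED):.1f}%")  # 7.1%
    print(f"PR26940 vs PR27900: {slowdown_pct(DYNAMIC_QUANT, BEFORE_MLP_FUSION):.1f}%")  # 9.4%
    print(f"Patched vs PR27831: {slowdown_pct(DYNAMIC_QUANT_PLUS_MLP, MLP_FUSION_DISABLED):.1f}%")  # 0.4%
```

Measured against the commit right before PR26940, the dynamic-quantization change costs roughly 7% in first-token latency, while the patched build lands within about 0.4% of that baseline.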

Tickets:

CVS-159322

…form. PR27831 enables MLP fusion in cldnn, which can improve performance, but the fusion is not enabled on MTL 125H because that platform has only 112 EUs, so no performance change is expected there. However, PR26940, which integrates dynamic quantization, causes a performance drop of about 10% on MTL 125H for a 6K-token input. Enabling MLP fusion on MTL 125H makes the regression disappear.

yeonbok · 2024-12-20T11:25:46Z

Hi @riverlijunjie, according to @isanghao, PR26940 should not affect MTL because that PR was for the oneDNN case. Could you clarify how PR26940 affected performance?

(That question aside, I think we can apply this change.)

riverlijunjie · 2024-12-23T06:48:19Z


PR26940 updates dynamic_quantize for cldnn, so it should take effect on MTL. Am I right?

riverlijunjie requested review from a team as code owners December 20, 2024 02:00

github-actions bot added the category: GPU OpenVINO GPU plugin label Dec 20, 2024
