[CPU] [ARM] SVE FP16 functions for MHASingleToken kernel #28182
base: master
@dmitry-gorokhov, you might remember a conversation we had about the inference output not being correct when the new … I am not 100% sure which change fixed this, but I will try to find out more soon.
build_jenkins
I'm not sure whether the fix I just pushed for https://github.com/openvinotoolkit/openvino/actions/runs/12465345215/job/34792490178?pr=28182#step:15:3723 is correct. Could you please check whether it works?
build_jenkins
So we have Android ARM build failures. They happen because the Android host uses a compiler that doesn't support SVE, yet we still try to cross-compile MHA for the SVE ISA.
Branch updated from e7574bd to 41a4a26.
I've added compiler version checks for clang (>13, source) and gcc (>10, source) in … For the other failing pipelines, I remember those being CI issues that were also encountered in #27273; I hope that's still the case.
@NishantPrabhuFujitsu As I mentioned, instead of checking compiler versions it is better to implement a condition based on …
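The suggested condition is cut off in this thread. The following is therefore only one possible source-level illustration of the idea, not necessarily what was meant (a build-system check is equally plausible). Instead of comparing compiler versions, it guards the SVE code on what the toolchain actually provides: the ACLE macro `__ARM_FEATURE_SVE` and the presence of `<arm_sve.h>`. The names `HAVE_SVE_TOOLCHAIN`, `sve_path_compiled` and `scale` are hypothetical, and the kernel uses FP32 instead of FP16 to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Capability probe (hypothetical macro name): 1 only if this translation unit
// is compiled with SVE enabled AND the toolchain ships the ACLE header.
#if defined(__ARM_FEATURE_SVE) && defined(__has_include)
#  if __has_include(<arm_sve.h>)
#    include <arm_sve.h>
#    define HAVE_SVE_TOOLCHAIN 1
#  endif
#endif
#ifndef HAVE_SVE_TOOLCHAIN
#  define HAVE_SVE_TOOLCHAIN 0
#endif

constexpr bool sve_path_compiled() { return HAVE_SVE_TOOLCHAIN == 1; }

// Multiplies x[0..n) by s in place (FP32 stand-in for an FP16 kernel).
inline void scale(float* x, std::size_t n, float s) {
    assert(x != nullptr || n == 0);
#if HAVE_SVE_TOOLCHAIN
    // Vector-length-agnostic loop: svcntw() is the number of 32-bit lanes,
    // and the whilelt predicate masks off lanes past the end of the array.
    for (std::uint64_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64(i, static_cast<std::uint64_t>(n));
        svfloat32_t v = svld1_f32(pg, x + i);
        svst1_f32(pg, x + i, svmul_n_f32_x(pg, v, s));
    }
#else
    // Portable fallback when the toolchain cannot build SVE code.
    for (std::size_t i = 0; i < n; ++i) x[i] *= s;
#endif
}
```

On a toolchain without SVE support, such as the Android host compiler described earlier in the thread, the guard evaluates to 0 and only the portable loop is compiled, so no SVE flags or headers are required.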
@dmitry-gorokhov In that case, I'm not sure I understand what I need to change. For Android platforms, … If my understanding is wrong, please help me understand how exactly this works and what adjustment to the check would fix it. Thanks!
Continuing #27273, this PR adds SVE FP16 implementations of the functions called during execution of the MHASingleToken kernel on SVE-128, SVE-256 and SVE-512 platforms. The SVE implementations are used only if SVE support is detected on the hardware at runtime; otherwise execution falls back to Neon.
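A minimal sketch of the runtime selection described above, assuming Linux on AArch64, where the kernel reports SVE through `getauxval(AT_HWCAP)` and the `HWCAP_SVE` bit; on any other platform the sketch reports no SVE. The kernel names and `select_reduce_sum` are hypothetical, `float` stands in for FP16, and both bodies are scalar placeholders where real code would use SVE and Neon intrinsics. It also spells out the lane arithmetic: a 128-, 256- or 512-bit SVE register holds 8, 16 or 32 FP16 values.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP (HWCAP_SVE on glibc/AArch64)
#endif

// Runtime check: asks the Linux kernel whether the CPU supports SVE.
// Conservatively returns false wherever that query is unavailable.
inline bool cpu_has_sve() {
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

// FP16 lanes in one SVE register of vl_bits bits (16 bits per FP16 value).
constexpr std::size_t fp16_lanes(std::size_t vl_bits) { return vl_bits / 16; }
static_assert(fp16_lanes(128) == 8 && fp16_lanes(512) == 32, "FP16 lane count");

// Hypothetical kernel signature; float stands in for FP16 to stay portable.
using ReduceFn = float (*)(const float*, std::size_t);

// Shared scalar reference used by both placeholder kernels below.
inline float reduce_sum_ref(const float* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Placeholders: a real build would use <arm_sve.h> / <arm_neon.h> intrinsics.
inline float reduce_sum_sve(const float* x, std::size_t n) { return reduce_sum_ref(x, n); }
inline float reduce_sum_neon(const float* x, std::size_t n) { return reduce_sum_ref(x, n); }

// Chooses the implementation from what the hardware reports at runtime.
inline ReduceFn select_reduce_sum() {
    return cpu_has_sve() ? reduce_sum_sve : reduce_sum_neon;
}
```

In this sketch the function pointer would be resolved once, for example when the kernel is created, so each later call pays only for one indirect call.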
Benchmarking results
Below are the benchmarking results for the execution time of each ported function. Each function was run individually on dummy inputs (128 FP16 elements) for 1,000,000 iterations, and the average time per call is reported in microseconds.
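A minimal sketch of the measurement procedure described above, using `std::chrono::steady_clock`. The kernel `reduce_sum` and the helper `average_us` are hypothetical stand-ins, not the harness that produced these results, and `float` replaces FP16 for portability.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel under test: scalar stand-in for one of the ported functions.
static float reduce_sum(const float* x, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Calls fn `iterations` times on n dummy elements and returns the
// mean wall-clock time per call, in microseconds.
template <typename Fn>
double average_us(Fn fn, std::size_t n, std::size_t iterations) {
    assert(iterations > 0);
    std::vector<float> input(n, 1.0f);  // dummy input
    volatile float sink = 0.0f;         // each result must be stored, so it is not discarded
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) sink = fn(input.data(), n);
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> total = end - start;
    return total.count() / static_cast<double>(iterations);
}
```

`average_us(reduce_sum, 128, 1000000)` mirrors the setup above. Because the input never changes, an optimizing compiler may still hoist a pure kernel out of the loop; a real harness should guard against that, for example by varying the input between iterations.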