[CPU] Support dynamic activation sparsity #27974

Open: wants to merge 32 commits into base branch master

@usstq (Contributor) commented Dec 9, 2024

Details:

Activation sparsity exploits the fact that the activations in the MLP blocks of LLMs are sparse: input channels whose activations have a small magnitude can be set to zero with an acceptable accuracy drop.
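A minimal C++ sketch of the thresholding step described above, assuming a fixed magnitude threshold. The PR does not say how the threshold is chosen, and `sparsify_activation` is a hypothetical helper for illustration, not part of the OpenVINO code base.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): zero every input channel whose
// magnitude is below `threshold` and return the indices of the channels kept.
std::vector<std::size_t> sparsify_activation(std::vector<float>& x, float threshold) {
    assert(threshold >= 0.0f);
    std::vector<std::size_t> kept;
    kept.reserve(x.size());
    for (std::size_t ic = 0; ic < x.size(); ++ic) {
        if (std::fabs(x[ic]) < threshold) {
            x[ic] = 0.0f;        // small-magnitude channel is treated as zero
        } else {
            kept.push_back(ic);  // non-sparse channel: its weights must still be read
        }
    }
    return kept;
}
```

The returned index list is what a downstream kernel would iterate over; the zeroed entries in `x` are kept only so the vector remains a valid dense activation.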

The distribution of sparse activation channels is dynamic (known only at runtime) and varies considerably from token to token. With several tokens processed together, an input channel could only be skipped if it were zero for all of them, which rarely happens. The optimization opportunity therefore exists only in 2nd-token generation with batch size fixed to 1 (exactly the typical use case for client-side LLM inference), where the cost of reading the weights that correspond to the skipped input channels can be saved.
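The saving for the batch-size-1 case can be sketched as a GEMV over plain [IC, OC] weights that never touches the rows of zeroed channels. This is an illustrative scalar loop with a hypothetical `sparse_gemv` function, not the PR's actual kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch (not the PR's kernel): y[oc] = sum_ic x[ic] * W[ic][oc] for a single
// token, with W stored plain row-major as [IC, OC]. Rows whose activation is
// exactly zero are never loaded. Returns the number of weight rows read.
std::size_t sparse_gemv(const std::vector<float>& x,  // [IC] activation of one token
                        const std::vector<float>& w,  // [IC * OC], row-major
                        std::size_t oc_count,
                        std::vector<float>& y) {      // [OC] output
    assert(w.size() == x.size() * oc_count);
    y.assign(oc_count, 0.0f);
    std::size_t rows_read = 0;
    for (std::size_t ic = 0; ic < x.size(); ++ic) {
        const float a = x[ic];
        if (a == 0.0f) continue;  // sparse channel: skip its whole weight row
        const float* row = w.data() + ic * oc_count;  // contiguous OC weights
        for (std::size_t oc = 0; oc < oc_count; ++oc) {
            y[oc] += a * row[oc];
        }
        ++rows_read;
    }
    return rows_read;
}
```

Because the skip test is per row, the weight traffic saved scales with the fraction of zeroed channels, and every row that is read is a single contiguous stream of `OC` weights.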

The best weight memory layout for this optimization is plain [IC, OC]: the weights of each input channel are then stored densely, so reads for the non-sparse input channels form continuous streams that benefit from the CPU's hardware prefetcher. With the current blocked weight layout set by the oneDNN fork, weights from sparse and non-sparse channels would be mixed together within each cache line, which hurts performance both because the access pattern is unfriendly to the hardware prefetcher and because of DDR's physical page granularity.
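To make the layout argument concrete, the sketch below counts the distinct cache lines that must be read for a given set of non-sparse channels, under the plain [IC, OC] layout and under a simplified interleaved layout `[IC/k][OC][k]` in which `k` consecutive input channels share each cache line. The interleaved layout is a hypothetical stand-in for a blocked layout, not the actual oneDNN-fork format; the model assumes IC is divisible by `k` and ignores prefetcher and DDR page effects.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Offset of weight (ic, oc) in the plain [IC, OC] layout.
std::size_t plain_offset(std::size_t ic, std::size_t oc, std::size_t oc_count) {
    return ic * oc_count + oc;
}

// Hypothetical interleaved layout [IC/k][OC][k]: for every output channel,
// k consecutive input channels are stored next to each other.
std::size_t interleaved_offset(std::size_t ic, std::size_t oc,
                               std::size_t oc_count, std::size_t k) {
    return ((ic / k) * oc_count + oc) * k + ic % k;
}

// Number of distinct cache lines holding the weights of the kept channels.
std::size_t lines_touched(const std::vector<std::size_t>& kept, std::size_t oc_count,
                          std::size_t k, std::size_t elems_per_line, bool interleaved) {
    assert(k > 0 && elems_per_line > 0);
    std::unordered_set<std::size_t> lines;
    for (std::size_t ic : kept) {
        for (std::size_t oc = 0; oc < oc_count; ++oc) {
            const std::size_t off = interleaved ? interleaved_offset(ic, oc, oc_count, k)
                                                : plain_offset(ic, oc, oc_count);
            lines.insert(off / elems_per_line);
        }
    }
    return lines.size();
}
```

For example, with IC = 4, OC = 16, fp32 weights (16 per 64-byte cache line), k = 2 and channels 1 and 3 zeroed, the plain layout reads 2 of the 4 cache lines, while the interleaved layout still reads all 4, so skipping the sparse channels saves no weight traffic.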

However, choosing the plain [IC, OC] layout is a challenge for 1st-token latency, because the blocked layout is best for the compute-bound 1st-token case. This PR therefore also has to minimize the degradation of 1st-token latency.

Tickets:

github-actions bot added the category: CPU (OpenVINO CPU plugin) and category: build (OpenVINO cmake script / infra) labels on Dec 9, 2024
github-actions bot removed the category: build (OpenVINO cmake script / infra) label on Dec 18, 2024
@usstq marked this pull request as ready for review on December 19, 2024 13:38
@usstq requested review from a team as code owners on December 19, 2024 13:38
@usstq requested a review from luo-cheng2021 on December 27, 2024 01:36