
In which part of the code can I find the "page" concept? #18

Open
YEOMJINSEOP opened this issue Dec 27, 2024 · 1 comment

Comments


YEOMJINSEOP commented Dec 27, 2024

Hi, @Sakits, @SiriusNEO, @happierpig.
Thanks for sharing this wonderful work.

  1. I have a question regarding the "paging" concept in the code.

While analyzing the code in quest_attention.py, I couldn't find any "page" concept; what I did find is top-k selection over chunks of kv_seq_len in def local_heavy_hitter_mask. Is the chunking concept in the code equivalent to paging in the paper? If not, where can I find the "page" concept in the code?

  2. Which part of the code uses the QuestAttention class?
    I see that the LlamaAttention module's forward is substituted with def forward from quest_attention.py. However, I don't see any code that imports and uses the QuestAttention class in quest_attention.py. Is the QuestAttention class unused in the entire codebase?

Thanks!

Collaborator

Sakits commented Dec 27, 2024

Hi @YEOMJINSEOP,

Thank you for your interest in our work!

  1. Yes, the "chunk" concept in the code is equivalent to the "page" concept in the paper: the chunking mechanism in local_heavy_hitter_mask is the same idea as paging. A minimal sketch of the selection step is included after this list.
  2. The QuestAttention class is implemented to work with the CUDA kernel and enable real acceleration. However, for simplicity during accuracy evaluations, we simulate sparse attention using the implementation in evaluation/quest_attention.py instead of invoking the CUDA kernel directly, so QuestAttention does not appear on that code path. A sketch of what such a simulation looks like also follows below.
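For concreteness, here is a rough PyTorch sketch of the page-level top-k selection idea from the paper. This is written just for this reply: the function name select_topk_pages and the page_size/topk defaults are illustrative, not the actual code in local_heavy_hitter_mask.

```python
import torch

def select_topk_pages(q, k, page_size=16, topk=4):
    """Quest-style page selection (illustrative sketch, not the repo's code).

    q: (head_dim,) query vector for the current decoding step
    k: (seq_len, head_dim) cached keys
    Returns the indices of the selected pages.
    """
    seq_len, head_dim = k.shape
    n_pages = seq_len // page_size  # assume seq_len divisible by page_size for brevity
    pages = k[: n_pages * page_size].view(n_pages, page_size, head_dim)

    # Per-page, channel-wise min/max of the keys (the page "metadata").
    k_min = pages.min(dim=1).values  # (n_pages, head_dim)
    k_max = pages.max(dim=1).values  # (n_pages, head_dim)

    # Upper bound on any token's attention logit within a page:
    # per channel take max(q * k_min, q * k_max), then sum over channels.
    bound = torch.maximum(q * k_min, q * k_max).sum(dim=-1)  # (n_pages,)

    # Keep the pages with the highest estimated criticality.
    return bound.topk(min(topk, n_pages)).indices
```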

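And here is a similarly hypothetical sketch of what "simulating" sparse attention means during accuracy evaluation: compute the full attention logits, then mask out tokens outside the selected pages before the softmax, rather than calling a sparse kernel. Again, this is illustrative and not the actual evaluation/quest_attention.py implementation.

```python
import torch

def simulated_sparse_attention(q, k, v, keep_pages, page_size=16):
    """Simulate page-sparse attention by masking, instead of calling a kernel.

    q: (head_dim,), k/v: (seq_len, head_dim)
    keep_pages: tensor of page indices to keep (e.g. from select_topk_pages)
    """
    seq_len, head_dim = k.shape
    scores = (k @ q) / head_dim**0.5  # (seq_len,) full attention logits

    # Build a token-level mask from the selected pages.
    keep = torch.zeros(seq_len, dtype=torch.bool)
    for p in keep_pages.tolist():
        keep[p * page_size : (p + 1) * page_size] = True

    scores = scores.masked_fill(~keep, float("-inf"))  # drop pruned tokens
    return torch.softmax(scores, dim=-1) @ v           # (head_dim,)
```

Conceptually, select_topk_pages would run first on the per-page metadata and its indices would be passed here; the kernel-backed QuestAttention path would produce the same result without ever materializing logits for pruned pages.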
I hope this answers your questions! Please let us know if anything is still unclear. :)
