I have a question regarding the "paging" concept in the code.
While I was analyzing the code in quest_attention.py, I couldn't find any "page" concept, though I did find top-k selection over a chunked kv_seq_len in local_heavy_hitter_mask.
Is the chunking concept in the code equivalent to paging in the paper? If not, where can I find the "page" concept in the code?
Also, which part of the code uses the QuestAttention class?
I see that the LlamaAttention module's forward is substituted with the forward defined in quest_attention.py.
However, I don't see any part that loads and uses the QuestAttention class in quest_attention.py.
Is the QuestAttention class not used anywhere in the codebase?
Thanks!
Yes, the “chunk” concept in the code is equivalent to the “page” concept in the paper. The chunking mechanism in local_heavy_hitter_mask is the same idea as paging.
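For intuition, here is a minimal pure-Python sketch of that page-level top-k idea, as described in the paper: per-page element-wise min/max key metadata gives an upper bound on the query-key dot product for every key in the page, and the top-k pages by that bound are kept. The function names and shapes below are illustrative, not the repo's actual PyTorch implementation:

```python
def page_minmax(keys, page_size):
    """Per-page element-wise min/max metadata over the key channels."""
    pages = [keys[i:i + page_size] for i in range(0, len(keys), page_size)]
    meta = []
    for page in pages:
        k_min = [min(col) for col in zip(*page)]
        k_max = [max(col) for col in zip(*page)]
        meta.append((k_min, k_max))
    return meta

def page_upper_bound(query, k_min, k_max):
    # Channel-wise max(q_i * min_i, q_i * max_i), summed: an upper bound
    # on q . k for every key inside the page (the criticality estimate).
    return sum(max(q * lo, q * hi) for q, lo, hi in zip(query, k_min, k_max))

def top_k_pages(query, keys, page_size, k):
    """Return the indices of the k pages with the highest score bound."""
    scores = [page_upper_bound(query, lo, hi)
              for lo, hi in page_minmax(keys, page_size)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

Attention is then computed only over the keys in the selected pages; the `max(q*min, q*max)` trick makes the bound correct even for negative query channels.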
The QuestAttention class is implemented to work with the kernel to enable real acceleration. However, for simplicity during accuracy evaluations, we simulate sparse attention using the implementation in evaluation/quest_attention.py instead of invoking the CUDA kernel directly.
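Regarding the forward substitution: the evaluation path replaces the attention module's forward method at the class level (monkey-patching), so the QuestAttention class itself never needs to be instantiated there. A self-contained sketch of that pattern, with a dummy class standing in for transformers' LlamaAttention:

```python
class DummyAttention:
    """Stand-in for LlamaAttention; the real patch targets that class."""
    def forward(self, hidden_states):
        return hidden_states  # original dense attention (placeholder)

def quest_forward(self, hidden_states):
    # Placeholder for the simulated sparse forward in
    # evaluation/quest_attention.py; here it just transforms the input
    # so the substitution is observable.
    return [h * 2 for h in hidden_states]

# Class-level substitution: every existing and future instance of the
# class now dispatches to the patched forward.
DummyAttention.forward = quest_forward

attn = DummyAttention()
```

After the assignment, calling `attn.forward(...)` on any instance runs the patched function, which is why no QuestAttention object appears in the evaluation code.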
I hope this answers your questions! Please let us know if you have any further questions. :)
Hi, @Sakits, @SiriusNEO, @happierpig.
Thanks for sharing this wonderful work.