Commit baa3b22

[docs] add a comment that offloading requires CUDA GPU (huggingface#35055)

* add comment to offloading

* Update docs/source/en/kv_cache.md

Co-authored-by: Steven Liu <[email protected]>

faaany and stevhliu authored Dec 4, 2024
1 parent 1da1e0d commit baa3b22
1 changed file: docs/source/en/kv_cache.md (2 additions, 1 deletion)
@@ -180,7 +180,7 @@ Fun fact: The shortest war in history was between Britain and Zanzibar on August

 <Tip warning={true}>

-Cache offloading requires a GPU and can be slower than dynamic KV cache. Use it if you are getting CUDA out of memory errors.
+Cache offloading requires a CUDA GPU and can be slower than dynamic KV cache. Use it if you are getting CUDA out of memory errors.

 </Tip>

@@ -261,6 +261,7 @@ This will use the [`~OffloadedStaticCache`] implementation instead.
 >>> tokenizer.batch_decode(out, skip_special_tokens=True)[0]
 "Hello, my name is [Your Name], and I am a [Your Profession] with [Number of Years] of"
 ```
+Cache offloading requires a CUDA GPU.


### Sliding Window Cache
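The behavior the added note documents, KV blocks parked in CPU memory and brought back to the GPU one layer at a time, can be sketched with a toy cache. This is only a minimal illustration of the offload idea, not the transformers `OffloadedStaticCache` implementation; all names below are invented, and plain Python objects stand in for device and host tensors.

```python
# Toy illustration of KV-cache offloading (NOT the transformers
# implementation): only one layer's KV block is "resident" at a time,
# the rest live in host (CPU) memory, trading copy time for GPU memory.
class ToyOffloadedCache:
    def __init__(self, num_layers):
        self.host = [None] * num_layers  # stand-in for CPU-side storage
        self.on_device = None            # index of the resident layer, if any

    def fetch(self, layer):
        """Make one layer's KV block resident, evicting the previous one."""
        if self.on_device is not None and self.on_device != layer:
            pass  # a real cache would copy the evicted block device -> CPU here
        self.on_device = layer
        return self.host[layer]

    def update(self, layer, kv_block):
        """Store a layer's new KV block; it becomes the resident layer."""
        self.host[layer] = kv_block      # a real cache would copy CPU <- device
        self.on_device = layer

cache = ToyOffloadedCache(num_layers=2)
cache.update(0, "kv-layer-0")
cache.update(1, "kv-layer-1")
print(cache.fetch(0))    # prints "kv-layer-0"; layer 0 becomes resident
print(cache.on_device)   # prints 0
```

Because only one block is resident at a time, peak device memory stays roughly one layer's worth of KV regardless of model depth, which is why the real cache needs a CUDA GPU (for async host/device copies) yet can still be slower than a fully on-device dynamic cache.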
