From baa3b22137d9d47097bd5a17736c0639ecf38e5b Mon Sep 17 00:00:00 2001
From: Fanli Lin
Date: Wed, 4 Dec 2024 23:48:34 +0800
Subject: [PATCH] [docs] add a comment that offloading requires CUDA GPU
 (#35055)

* add comment to offloading

* Update docs/source/en/kv_cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/kv_cache.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/en/kv_cache.md b/docs/source/en/kv_cache.md
index 05ab9eafa72349..b1d1e0998f06ed 100644
--- a/docs/source/en/kv_cache.md
+++ b/docs/source/en/kv_cache.md
@@ -180,7 +180,7 @@ Fun fact: The shortest war in history was between Britain and Zanzibar on August
 
 
 
-Cache offloading requires a GPU and can be slower than dynamic KV cache. Use it if you are getting CUDA out of memory errors.
+Cache offloading requires a CUDA GPU and can be slower than dynamic KV cache. Use it if you are getting CUDA out of memory errors.
 
 
 
@@ -261,6 +261,7 @@ This will use the [`~OffloadedStaticCache`] implementation instead.
 >>> tokenizer.batch_decode(out, skip_special_tokens=True)[0]
 "Hello, my name is [Your Name], and I am a [Your Profession] with [Number of Years] of"
 ```
+Cache offloading requires a CUDA GPU.
 
 
 ### Sliding Window Cache