From ea9d5e0cd85d75008f06c94e95fe26f94b0ce6d2 Mon Sep 17 00:00:00 2001
From: copasseron <135103372+copasseron@users.noreply.github.com>
Date: Thu, 4 Jul 2024 16:00:08 +0200
Subject: [PATCH] Update README.md for vLLM 0.4.2 args

Triton's vLLM backend is based on vLLM 0.4.2, which offers more arguments
than those referenced in the tutorial's documentation.
---
 Quick_Deploy/vLLM/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Quick_Deploy/vLLM/README.md b/Quick_Deploy/vLLM/README.md
index 244c1a92..b54d59e6 100644
--- a/Quick_Deploy/vLLM/README.md
+++ b/Quick_Deploy/vLLM/README.md
@@ -79,9 +79,9 @@ The content of `model.json` is:
 ```
 
 This file can be modified to provide further settings to the vLLM engine. See vLLM
-[AsyncEngineArgs](https://github.com/vllm-project/vllm/blob/32b6816e556f69f1672085a6267e8516bcb8e622/vllm/engine/arg_utils.py#L165)
+[AsyncEngineArgs](https://github.com/vllm-project/vllm/blob/c7f2cf2b7f67bce5842fedfdba508440fe257375/vllm/engine/arg_utils.py#L615)
 and
-[EngineArgs](https://github.com/vllm-project/vllm/blob/32b6816e556f69f1672085a6267e8516bcb8e622/vllm/engine/arg_utils.py#L11)
+[EngineArgs](https://github.com/vllm-project/vllm/blob/c7f2cf2b7f67bce5842fedfdba508440fe257375/vllm/engine/arg_utils.py#L21)
 for supported key-value pairs. Inflight batching and paged attention is handled
 by the vLLM engine.
 