
Extremely long context causes the service to hang #2584

Open
1 of 3 tasks
luckfu opened this issue Nov 25, 2024 · 6 comments

Comments


luckfu commented Nov 25, 2024

System Info / 系統信息

registry.cn-hangzhou.aliyuncs.com/xprobe_xinference/xinference:v0.16.3

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

  • docker / docker
  • pip install / 通过 pip install 安装
  • installation from source / 从源码安装

Version info / 版本信息

0.16.3

The command used to start Xinference / 用以启动 xinference 的命令

When I use the vLLM engine with an overly long input context, the error below is raised and the model then hangs: it cannot be deleted, and subsequent requests get no response.

2024-11-25 01:00:13,800 transformers.tokenization_utils_base 308 WARNING  Token indices sequence length is longer than the specified maximum sequence length for this model (208019 > 128000). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (208019 > 128000). Running this sequence through the model will result in indexing errors
INFO 11-25 01:00:13 metrics.py:351] Avg prompt throughput: 42.3 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 34.1%, CPU KV cache usage: 0.0%.
ERROR 11-25 01:00:13 async_llm_engine.py:63] Engine background task failed
ERROR 11-25 01:00:13 async_llm_engine.py:63] Traceback (most recent call last):
ERROR 11-25 01:00:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 53, in _log_task_completion
ERROR 11-25 01:00:13 async_llm_engine.py:63]     return_value = task.result()
ERROR 11-25 01:00:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 939, in run_engine_loop
ERROR 11-25 01:00:13 async_llm_engine.py:63]     result = task.result()
......
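The traceback shows the engine background loop dying instead of the request being rejected up front. Until the service layer performs this check itself, a caller can guard against the failure mode by counting tokens before submitting the prompt. A minimal sketch, assuming any Hugging Face-style tokenizer object that exposes an `encode` method; `check_prompt_length` and `ContextLengthError` are illustrative names, not part of xinference or vLLM:

```python
class ContextLengthError(ValueError):
    """Raised when a prompt exceeds the model's context window."""


def check_prompt_length(tokenizer, prompt, max_model_len):
    """Fail fast with a clear error instead of letting the engine crash.

    `tokenizer` is any object with an `encode(text) -> list` method
    (e.g. a Hugging Face tokenizer); `max_model_len` mirrors the
    128000-token limit shown in the warning above.
    """
    n_tokens = len(tokenizer.encode(prompt))
    if n_tokens > max_model_len:
        raise ContextLengthError(
            f"prompt has {n_tokens} tokens, exceeding the model "
            f"limit of {max_model_len}"
        )
    return n_tokens
```

With this in place, an oversized request surfaces as a catchable exception on the client side rather than an unresponsive model.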

Reproduction / 复现过程

xinference launch --model-name glm4-chat-1m \
  --model-type LLM \
  --model-uid glm4-chat \
  --model_path /models/glm-4-9b-chat \
  --model-engine 'vllm' \
  --model-format 'pytorch' \
  --quantization None \
  --n-gpu 2 \
  --gpu-idx "0,1" \
  --max_num_seqs 256 \
  --tensor_parallel_size 2 \
  --gpu_memory_utilization 0.95

Expected behavior / 期待表现

I know this is not a bug in xinference, but I would like to eliminate the problem at the lowest layer. Ideally, when the context is too long, the service layer should either return a "context too long" error directly or truncate the input in order.
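The "truncate in order" behavior requested above can be sketched client-side: keep the most recent tokens so that prompt plus generation fits within the model window (128000 for the model served here). This is a hypothetical helper for illustration, not existing xinference functionality; the `reserve_for_output` budget is an assumption:

```python
def truncate_token_ids(token_ids, max_model_len, reserve_for_output=512):
    """Truncate a prompt's token ids so prompt + generated tokens fit
    in the model's context window.

    Keeps the tail of the sequence, since chat use cases usually care
    most about the latest turns.
    """
    budget = max_model_len - reserve_for_output
    if len(token_ids) <= budget:
        return token_ids
    return token_ids[-budget:]
```

Applied before submission, the 208019-token prompt from the log would be cut down to the model limit instead of crashing the engine loop.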

@XprobeBot XprobeBot added the gpu label Nov 25, 2024
@XprobeBot XprobeBot added this to the v1.x milestone Nov 25, 2024

qinxuye commented Nov 26, 2024

Got it, we will look into how to handle this issue.


github-actions bot commented Dec 3, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Dec 3, 2024
@qinxuye qinxuye removed the stale label Dec 4, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Dec 11, 2024
@qinxuye qinxuye removed the stale label Dec 12, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Dec 19, 2024

This issue was closed because it has been inactive for 5 days since being marked as stale.

@github-actions github-actions bot closed this as not planned (stale) Dec 25, 2024
@RuiNov1st

Same problem here. Has this been resolved?

@qinxuye qinxuye self-assigned this Dec 26, 2024
@qinxuye qinxuye removed the stale label Dec 26, 2024
@qinxuye qinxuye reopened this Dec 26, 2024