
[ERROR] CUDA error #53

Open
MSMsssss opened this issue Dec 18, 2024 · 2 comments

@MSMsssss

I would like to ask: I have two virtual machines, each with a single NVIDIA A10 GPU. I modified the source code, started a Ray cluster across the two machines, and launched the API server with the command below, but I get the following error. What is the problem? Thank you very much.
python -m distserve.api_server.distserve_api_server \
    --host 0.0.0.0 \
    --port 8000 \
    --model openai-community/gpt2 \
    --tokenizer openai-community/gpt2 \
    --context-tensor-parallel-size 1 \
    --context-pipeline-parallel-size 1 \
    --decoding-tensor-parallel-size 1 \
    --decoding-pipeline-parallel-size 1 \
    --block-size 16 \
    --max-num-blocks-per-req 128 \
    --gpu-memory-utilization 0.95 \
    --swap-space 16 \
    --context-sched-policy fcfs \
    --context-max-batch-size 128 \
    --context-max-tokens-per-batch 8192 \
    --decoding-sched-policy fcfs \
    --decoding-max-batch-size 1024 \
    --decoding-max-tokens-per-batch 65536

[two screenshots of the error attached]
@interestingLSY
Member

Currently DistServe relies on cudaIpcMemHandle for KV-cache transfer. For any given layer, DistServe requires the corresponding prefill and decoding instances to reside on two GPUs on the same node, and those GPUs must be connected via NVLink (not entirely sure about the NVLink part).
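For context: CUDA IPC memory handles only work between processes on the same machine. A handle exported with cudaIpcGetMemHandle refers to a device allocation on a local GPU, and it can only be opened with cudaIpcOpenMemHandle by another process on that same node; there is no cross-node equivalent. So with one A10 per VM, the handle exported by the prefill instance cannot be opened by the decoding instance on the other machine, which would explain the CUDA error reported above. Below is a minimal sketch of the mechanism (not DistServe's actual code; the buffer size and the way the handle would be exchanged are placeholders for illustration):

// cuda_ipc_sketch.cu -- illustrates same-node-only CUDA IPC sharing.
// Build: nvcc cuda_ipc_sketch.cu -o cuda_ipc_sketch
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    // Producer side: allocate device memory (stand-in for a KV-cache block)
    // and export an IPC handle for it.
    void *d_buf = NULL;
    cudaMalloc(&d_buf, 1 << 20);

    cudaIpcMemHandle_t handle;
    cudaIpcGetMemHandle(&handle, d_buf);
    // The handle is a small opaque struct that would be sent to another
    // process on the SAME node (e.g. over a Unix socket or a shared store).

    // Consumer side, in a different process on the same node:
    //   void *d_peer = NULL;
    //   cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);
    //   ... read the KV cache through d_peer ...
    //   cudaIpcCloseMemHandle(d_peer);
    //
    // A process on a *different* machine cannot open this handle at all:
    // it refers to an allocation on a GPU that node cannot see, and the
    // attempt surfaces as a CUDA error.

    cudaFree(d_buf);
    printf("exported an IPC handle for a local allocation\n");
    return 0;
}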

@MSMsssss
Author

Got it, thank you very much.
