LLMServe / DistServe Public

Fork 50
Star 403

Issues: LLMServe/DistServe

23 Open 22 Closed

Issues list

[ERROR] CUDA error

#53 opened Dec 18, 2024 by MSMsssss

How to independently measure the performance of the Prefill phase and the Decode phase?

#50 opened Nov 19, 2024 by J1nLo

offline/online serving is stuck at fetching files in paraworkers

#48 opened Oct 16, 2024 by hyuenmin-choi

Can distserve change block_size ?

#47 opened Oct 10, 2024 by WANGHanshuo1220

Error in building SwiftTransformers (error: more than one conversion function from "half" to a built-in type applies)

#46 opened Sep 19, 2024 by lylcyl

How to use tensorrt-llm as the inference backend for DistServe

#45 opened Sep 18, 2024 by GGBond8488

Error in building SwiftTransformers with torch 2.2.4

#44 opened Aug 28, 2024 by gursimar

Adding a new model?

#43 opened Aug 22, 2024 by gursimar

What does pp_cross mean in the simulator output?

#41 opened Aug 14, 2024 by xshqhua

Model inference results are garbled; how can this be fixed?

#40 opened Aug 13, 2024 by liweiqing1997

Failed to compile SwiftTransformer

#37 opened Aug 9, 2024 by FredHuang99

fail to run examples/offline.py , unable to download the model to reproduce

#35 opened Aug 6, 2024 by William12github

Cmake build fail

#31 opened Jul 31, 2024 by hyuenmin-choi

Generating max_num_tokens.csv for Different Hardware Environments

#30 opened Jul 28, 2024 by village-way

Why appear 0 unaccepted, 0 waiting, 0 processing?

#26 opened Jul 18, 2024 by LeSoleilGo

Model not loaded error

#24 opened Jul 15, 2024 by melissadu-db

Great work!

#20 opened Jul 4, 2024 by irasin

codellama34b ttft latency issue question

Further information is requested

#19 opened Jul 2, 2024 by sitabulaixizawaluduo

Decode Wrong Token help wanted

Extra attention is needed

#16 opened Jun 18, 2024 by sitabulaixizawaluduo

Offline.py LLMEngine.__init__() missing 1 required positional argument: 'simulator_config' help wanted

Extra attention is needed

#15 opened Jun 14, 2024 by fivebamboo694

How to profile help wanted

Extra attention is needed

#13 opened Jun 14, 2024 by YLSnowy

How difficult will adding Llama 3 support be? enhancement

New feature or request

#12 opened Jun 13, 2024 by kalradivyanshu

decoder.embed_tokens.weight.pt not found help wanted

Extra attention is needed

#10 opened Jun 11, 2024 by llx-08
