GGUF #3
Hi, thanks for your interest in our work. Unfortunately, llama.cpp does not support the GPTQ quantization format at the moment (see ggerganov/llama.cpp#4165 for details). Therefore, converting our 2-bit model into GGUF is not straightforward.
T-MAC supports the GPTQ format through llama.cpp's GGUF, integrated with its own highly optimized kernels, and it has already been tested with Llama-3-8b-instruct-w4-g128 and Llama-3-8b-instruct-w2-g128 from EfficientQAT. You can try it.
Thanks for the reminder, I will give it a try.
@kaleid-liner Does T-MAC support w2g64? I have uploaded a w2g64 Mistral-Large-Instruct model to Hugging Face, which is popular on Reddit. I think it would be interesting if T-MAC also supported w2g64.
Sure. T-MAC supports any group size by setting …
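For readers unfamiliar with the notation: w2g64 means weights quantized to 2 bits with one scale and zero point per group of 64 values. A minimal NumPy sketch of asymmetric group quantization, purely illustrative and not EfficientQAT's exact scheme:

```python
import numpy as np

def quantize_w2(w, group_size=64):
    """Asymmetric 2-bit group quantization: one (scale, zero) pair per group."""
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0            # 2 bits -> 4 levels: 0..3
    scale[scale == 0] = 1.0            # avoid divide-by-zero on constant groups
    q = np.clip(np.round((w - lo) / scale), 0, 3).astype(np.uint8)
    return q, scale, lo

def dequantize_w2(q, scale, lo):
    """Reconstruct approximate weights from codes, scales, and zero points."""
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale, lo = quantize_w2(w)
w_hat = dequantize_w2(q, scale, lo).reshape(-1)
# Per-element rounding error is bounded by half a quantization step
print(np.abs(w - w_hat).max())
```

Any group size works as long as it divides the row length; smaller groups trade extra scale/zero storage for lower reconstruction error.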
Hi, how is the test going? Does it support Mistral?
@ChenMnZ @brownplayer Sure. It supports Mistral. |
OK, thank you for your reply. May I ask what command is used to run the model for the first time after it is downloaded? I'm using a GPTQ-format model.
Any chance 2-bit models can be used with llama.cpp? It would be great to get Llama 3.1 (8B and 70B) converted to GGUF to try them out locally.
Thanks for the great research work!