How to use inference mode of ASMv2 (llama.cpp)? #24

sailfish009 · 2024-08-28T10:46:11Z

The model's GPU memory consumption was too high, so I converted it to llama.cpp (GGUF) format. GPU memory usage is now fine.
However, the converted model cannot be called the same way in the inference step, so the input parameters need to be converted to the format llama.cpp expects. If there are any llama.cpp experts here, I would appreciate it if you could tell me how to do this conversion.

        # all-seeing/all-seeing-v2/llava/eval/model_vqa_loader_vocab_rank.py, line 156
        # `model` would now be the converted ASMv2.gguf
        # This is the call that needs to be converted:
        with torch.inference_mode():
            logits = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                images=image_tensor.to(dtype=torch.float16, device=args.device, non_blocking=True),
            ).logits
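
To make the format difference concrete, here is a minimal sketch of one possible input conversion. It is not the project's method. It assumes that ASMv2 follows the LLaVA convention of marking the image position in `input_ids` with a placeholder id of `-200`, and that the llama.cpp side takes plain, unpadded token lists with no attention mask, with the image embedding evaluated separately between the text segments. The function name `to_llama_cpp_segments` is hypothetical.

```python
from typing import List, Sequence

# Assumption: LLaVA-style placeholder id for the image position in input_ids.
IMAGE_TOKEN_INDEX = -200


def to_llama_cpp_segments(
    input_ids: Sequence[int],
    attention_mask: Sequence[int],
    image_token: int = IMAGE_TOKEN_INDEX,
) -> List[List[int]]:
    """Turn one padded (input_ids, attention_mask) row into text segments.

    Padded positions (mask == 0) are dropped, and the remaining ids are
    split at each image placeholder. The result is a list of plain token
    lists; an image embedding would be evaluated between consecutive
    segments by whatever llama.cpp binding is used.
    """
    # Keep only the positions the attention mask marks as real tokens.
    kept = [tok for tok, mask in zip(input_ids, attention_mask) if mask]

    # Split the unpadded sequence at every image placeholder.
    segments: List[List[int]] = [[]]
    for tok in kept:
        if tok == image_token:
            segments.append([])
        else:
            segments[-1].append(tok)
    return segments


if __name__ == "__main__":
    # One left-padded row: two pad tokens, then text, image, text.
    ids = [0, 0, 1, 5, -200, 7, 8]
    mask = [0, 0, 1, 1, 1, 1, 1]
    print(to_llama_cpp_segments(ids, mask))
```

For a PyTorch batch, each row would first be converted with `input_ids[i].tolist()` and `attention_mask[i].tolist()`. How the logits are then read back depends on the binding, which is not covered here.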

sailfish009 · 2024-08-30T21:08:52Z

Using the example provided by llama.cpp, I have verified that the model behaves the way I want.
However, I will leave this issue open in case there is a more official answer.
