The model's GPU memory consumption was too high, so I converted it with llama.cpp (to a GGUF file). GPU memory usage is now fine.
However, at the inference step the converted model no longer accepts the original input format, so the input parameters have to be converted as well. If there are any llama.cpp experts here, I would appreciate advice on how to do this conversion.
# all-seeing/all-seeing-v2/llava/eval/model_vqa_loader_vocab_rank.py, line 156:
# model ( == ASMv2.gguf )
# Below is the call that needs to be converted:
with torch.inference_mode():
    logits = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        images=image_tensor.to(dtype=torch.float16, device=args.device, non_blocking=True),
    ).logits
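For reference, a rough sketch of what the converted call might look like with the llama-cpp-python bindings. This is only an illustration under assumptions not stated in the issue: that the conversion produced a LLaVA-style GGUF plus an mmproj (CLIP projector) file, and that the file names (ASMv2.gguf, mmproj-ASMv2.gguf), image path, and prompt below are placeholders. With llama.cpp the prompt string replaces input_ids/attention_mask, and the image is passed through the projector instead of as an images= tensor.

# Hedged sketch using llama-cpp-python; file names and prompt are placeholders.
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler


def image_to_data_uri(path: str) -> str:
    # llama.cpp receives the image as a base64 data URI instead of a torch tensor.
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode("utf-8")


chat_handler = Llava15ChatHandler(clip_model_path="mmproj-ASMv2.gguf")  # placeholder name
llm = Llama(
    model_path="ASMv2.gguf",       # placeholder name
    chat_handler=chat_handler,
    n_ctx=4096,        # large enough to hold the image embedding plus the prompt
    logits_all=True,   # keep per-token logits available
    n_gpu_layers=-1,   # offload all layers to the GPU
)

# The prompt replaces input_ids/attention_mask; the image replaces the images= tensor.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful vision-language assistant."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_uri("example.png")}},
                {"type": "text", "text": "Describe this image."},
            ],
        },
    ],
)
print(response["choices"][0]["message"]["content"])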
Using the example provided by llama.cpp, I have verified that the model performs the behavior I want.
However, I will not mark the issue as solved yet, in case there is a more official answer.
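Since the original call reads .logits to rank vocabulary candidates, it may also help to note how raw logits can be read back through llama-cpp-python. The sketch below is text-only and relies on assumptions: logits_all=True is set, the model path is a placeholder, and the attribute names (scores, n_tokens) come from the library's low-level API; feeding the image at this level would still have to go through the mmproj/CLIP projector and is not shown here.

# Text-only sketch of reading next-token logits with llama-cpp-python.
import numpy as np

from llama_cpp import Llama

llm = Llama(model_path="ASMv2.gguf", n_ctx=2048, logits_all=True, n_gpu_layers=-1)

tokens = llm.tokenize("The object in the image is a".encode("utf-8"), add_bos=True)
llm.eval(tokens)

# scores is an (n_ctx, n_vocab) float32 array; the row for the last evaluated
# token holds the logits for the next-token prediction, which is what the
# original model(...).logits call was used for when ranking the vocabulary.
next_token_logits = llm.scores[llm.n_tokens - 1]
top_ids = np.argsort(next_token_logits)[::-1][:5]
print([llm.detokenize([int(t)]).decode("utf-8", errors="ignore") for t in top_ids])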