🐛 Describe the bug
https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md
I followed this demo to quantize the model. The export starts out smoothly, but a FlatBuffers assertion failure occurs at the end. My model is llama2-7b, quantized with 16a4w, and the final .pte for QNN is never generated. I've also tried --num_sharding 4 (sketched after the main command below), but that run hits other problems.
# PTQ: 16a4w quantization (16-bit activations, 4-bit weights)
python -m examples.models.llama.export_llama \
  --checkpoint "${MODEL_DIR}/consolidated.00.pth" \
  -p "${MODEL_DIR}/params.json" \
  -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w -d fp32 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name="test.pte"
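For reference, the sharded run mentioned above looks like this; a sketch assuming the same checkpoint and flags, with only --num_sharding 4 added:

# PTQ: same 16a4w export, sharding the program into 4 partitions
python -m examples.models.llama.export_llama \
  --checkpoint "${MODEL_DIR}/consolidated.00.pth" \
  -p "${MODEL_DIR}/params.json" \
  -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w -d fp32 \
  --num_sharding 4 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name="test.pte"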
The unsharded export starts out fine, but ends with the following error:
————————————————————————————————————————————————————
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_convolution_default_224, aten.convolution.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_permute_copy_default_3493, aten.permute_copy.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_view_copy_default_577, aten.view_copy.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: quantized_decomposed_dequantize_per_tensor_tensor, quantized_decomposed.dequantize_per_tensor.tensor
python: /002data/andrea/executorch/qnn/executorch/third-party/flatbuffers/include/flatbuffers/vector_downward.h:146: size_t flatbuffers::vector_downward::ensure_space(size_t) [with SizeT = unsigned int; size_t = long unsigned int]: Assertion `size() < max_size_' failed.
Aborted (core dumped)
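(My guess, not verified: the serialized program for the unsharded 7B model exceeds the roughly 2 GiB ceiling that FlatBuffers enforces for buffers built with 32-bit offsets, which is what trips the size() < max_size_ assertion; --num_sharding presumably exists to keep each partition under that limit.)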
Versions
Built from source at v0.5, following the Android Llama demo for Qualcomm.