🐛 Describe the bug
https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md
I followed this demo to quantize the model. The export starts out smoothly, but a FlatBuffers assertion failure occurs at the end. My model is llama2-7b, quantized with 16a4w, and the final .pte for QNN is never generated. I've also tried --num_sharding 4 (sketched after the main command below), but that run hits other problems.
# PTQ: 16a4w quantization (16-bit activations, 4-bit weights)
python -m examples.models.llama.export_llama \
  --checkpoint "${MODEL_DIR}/consolidated.00.pth" \
  -p "${MODEL_DIR}/params.json" \
  -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w -d fp32 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name="test.pte"
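For reference, the sharded run mentioned above looks like this; a sketch assuming the same checkpoint and flags, with only --num_sharding 4 added:

# PTQ: same 16a4w export, sharding the program into 4 partitions
python -m examples.models.llama.export_llama \
  --checkpoint "${MODEL_DIR}/consolidated.00.pth" \
  -p "${MODEL_DIR}/params.json" \
  -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w -d fp32 \
  --num_sharding 4 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name="test.pte"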
The unsharded export starts out fine, but ends with the following error:
————————————————————————————————————————————————————
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_convolution_default_224, aten.convolution.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_permute_copy_default_3493, aten.permute_copy.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_view_copy_default_577, aten.view_copy.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: quantized_decomposed_dequantize_per_tensor_tensor, quantized_decomposed.dequantize_per_tensor.tensor
python: /002data/andrea/executorch/qnn/executorch/third-party/flatbuffers/include/flatbuffers/vector_downward.h:146: size_t flatbuffers::vector_downward::ensure_space(size_t) [with SizeT = unsigned int; size_t = long unsigned int]: Assertion `size() < max_size_' failed.
Aborted (core dumped)
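(My guess, not verified: the serialized program for the unsharded 7B model exceeds the roughly 2 GiB ceiling that FlatBuffers enforces for buffers built with 32-bit offsets, which is what trips the size() < max_size_ assertion; --num_sharding presumably exists to keep each partition under that limit.)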
Versions
Built from source at v0.5, following the Android Llama demo for Qualcomm.