Core dump during QNN model quantization when following the QNN Android demo guide #7437

Open
AndreaChiChengdu opened this issue Dec 25, 2024 · 0 comments

🐛 Describe the bug

https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md
I followed this demo to quantize the model. The export starts out smoothly, but a flatbuffers assertion fails at the end. My model is llama2-7b quantized with a16w4, and the final QNN .pte cannot be generated. I have also tried --num_sharding 4, but other problems pop up (the sharded command is shown after the log below).

# PTQ: 16-bit activation / 4-bit weight quantization (qnn_16a4w)
python -m examples.models.llama.export_llama --checkpoint "${MODEL_DIR}/consolidated.00.pth" -p "${MODEL_DIR}/params.json" -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w -d fp32 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="test.pte"
It starts out fine, but fails at the end with the following error:
————————————————————————————————————————————————————

INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_convolution_default_224, aten.convolution.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_permute_copy_default_3493, aten.permute_copy.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: aten_view_copy_default_577, aten.view_copy.default
INFO:executorch.backends.qualcomm.qnn_preprocess:Visiting: quantized_decomposed_dequantize_per_tensor_tensor, quantized_decomposed.dequantize_per_tensor.tensor
python: /002data/andrea/executorch/qnn/executorch/third-party/flatbuffers/include/flatbuffers/vector_downward.h:146: size_t flatbuffers::vector_downward::ensure_space(size_t) [with SizeT = unsigned int; size_t = long unsigned int]: Assertion `size() < max_size_' failed.
Aborted (core dumped)
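
For reference, the sharded variant I also tried is a sketch of the same export command with --num_sharding appended (assuming the flag is simply added to the invocation above; paths and metadata are unchanged):

# Same export, split into 4 shards
python -m examples.models.llama.export_llama --checkpoint "${MODEL_DIR}/consolidated.00.pth" -p "${MODEL_DIR}/params.json" -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w -d fp32 --num_sharding 4 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="test.pte"

This run also fails, but with different problems than the flatbuffers assertion above.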

Versions

Source build of v0.5, following the Android Llama demo for Qualcomm.
