Issue with Running benchmark.py #1423
@tim102187S you need to update your openvino-genai package and install it from nightly:
After that, these metrics become available. Alternatively, you can switch to the 2024/6 branch if you would like to use an llm_bench that is compatible with the latest stable release.
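Before rerunning llm_bench, it can help to confirm which openvino-genai build the active environment actually imports. The snippet below is a minimal sketch using only the standard library; it assumes the package was installed under its PyPI name `openvino-genai`. It reports the installed version only and does not by itself prove that the build exposes the newer metrics.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_genai_version(dist_name: str = "openvino-genai") -> Optional[str]:
    """Return the installed version string of `dist_name`, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_genai_version()
    print(f"openvino-genai: {found or 'not installed'}")
```

Run it with the same interpreter that runs `benchmark.py` (for example, inside the same conda environment), because a nightly build installed into a different environment will not be picked up.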
I had the same issue, and the suggestion from @eaidova resolved it. Thank you.
I have recently updated my OpenVINO-GenAI package and switched to version 2024/6. While the issue with the CPU has been resolved, I am now encountering a problem when using the NPU.
When running the following command:
python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)?" -ic 50
Message (run on NPU, after approximately 5 minutes of execution):
(openvino_one) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)" -ic 50
I am encountering an issue while using the following setup:
Tool: openvino.genai/tools/llm_bench/benchmark.py
Device: Intel Meteor Lake
Environment: Ubuntu 24.04 / OpenVINO 2024.5.0 / GPU Driver / NPU Driver
Model used:
When running the command, I received the following error message:
Message:
Run on CPU:
(llama3) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /opt/Advantech/EdgeAISuite/Intel_Standard/GenAI/LLM/model/llama-3.1-8b-instruct/ -n 2 -d CPU -p "What is large language model (LLM)? please reply under 100 words" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama-3.1-8b-instruct
[ INFO ] OV Config={'CACHE_DIR': ''}
[ WARNING ] It is recommended to set the environment variable OMP_WAIT_POLICY to PASSIVE, so that OpenVINO inference can use all CPU resources without waiting.
[ INFO ] The num_beams is 1, update Torch thread num from 16 to 8, avoid to use the CPU cores for OpenVINO inference.
[ INFO ] Model path=/opt/Advantech/EdgeAISuite/Intel_Standard/GenAI/LLM/model/llama-3.1-8b-instruct, openvino runtime version: 2024.5.0-17288-7975fa5da0c-refs/pull/3856/head
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 1.47s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 2, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up][P0] Input text: What is large language model (LLM)? please reply under 100 words
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
File "/home/adv/Downloads/openvino.genai/tools/llm_bench/benchmark.py", line 229, in main
iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 515, in run_text_generation_benchmark
text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list,
File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 305, in run_text_generation_genai
inference_durations = (np.array(perf_metrics.raw_metrics.token_infer_durations) / 1000 / 1000).tolist()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'openvino_genai.py_openvino_genai.RawPerfMetrics' object has no attribute 'token_infer_durations'. Did you mean: 'tokenization_durations'?
Run on NPU:
(llama3) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)? please reply under 100 words" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] Model path=/home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights, openvino runtime version: 2024.5.0-17288-7975fa5da0c-refs/pull/3856/head
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 21.64s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 2, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up][P0] Input text: What is large language model (LLM)? please reply under 100 words
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
File "/home/adv/Downloads/openvino.genai/tools/llm_bench/benchmark.py", line 229, in main
iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 515, in run_text_generation_benchmark
text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list,
File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 305, in run_text_generation_genai
inference_durations = (np.array(perf_metrics.raw_metrics.token_infer_durations) / 1000 / 1000).tolist()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'openvino_genai.py_openvino_genai.RawPerfMetrics' object has no attribute 'token_infer_durations'. Did you mean: 'tokenization_durations'?
Could you please help me resolve this issue?
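Both tracebacks fail at the same place: `text_generation.py` line 305 reads `perf_metrics.raw_metrics.token_infer_durations`, and the installed `RawPerfMetrics` class has no such attribute. The supported fix is to align llm_bench with the installed openvino-genai release. As a purely illustrative stopgap, the hypothetical helper below reads the attribute defensively and returns `None` when it is missing. It keeps the same `/ 1000 / 1000` scaling used on line 305 but makes no claim about the underlying time unit. The `SimpleNamespace` objects are stand-ins for real `perf_metrics` objects, not the openvino_genai API.

```python
from types import SimpleNamespace
from typing import List, Optional


def token_infer_durations_or_none(perf_metrics) -> Optional[List[float]]:
    """Hypothetical helper: read raw_metrics.token_infer_durations if present.

    Applies the same "/ 1000 / 1000" scaling as text_generation.py line 305,
    but returns None instead of raising AttributeError on builds without it.
    """
    raw = perf_metrics.raw_metrics
    durations = getattr(raw, "token_infer_durations", None)
    if durations is None:
        return None
    return [float(d) / 1000 / 1000 for d in durations]


if __name__ == "__main__":
    # Stand-ins for a newer build (attribute present) and an older one (absent).
    newer = SimpleNamespace(raw_metrics=SimpleNamespace(token_infer_durations=[2_000_000, 500_000]))
    older = SimpleNamespace(raw_metrics=SimpleNamespace(tokenization_durations=[1.0]))
    print(token_infer_durations_or_none(newer))
    print(token_infer_durations_or_none(older))
```

A guard like this only skips the missing metric. It does not make the older package report per-token inference durations, so updating openvino-genai or using the matching llm_bench branch remains the real resolution.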