Issue with Running benchmark.py #1423

Open
tim102187S opened this issue Dec 23, 2024 · 3 comments
Comments

@tim102187S

I am encountering an issue while using the following setup:

  • Tool: openvino.genai/tools/llm_bench/benchmark.py
  • Device: Intel Meteor Lake
  • Environment: Ubuntu 24.04 / OpenVINO 2024.5.0 / GPU Driver / NPU Driver
  • Models used:
    • CPU / iGPU: llama-3.1-8b-instruct (INT4)
    • NPU: llama-3-8b-instruct (INT4-NPU)

When running the command, I received the following error message:

python3 benchmark.py -m <path>/llama-3.1-8b-instruct/ -n 2 -d CPU -p "What is large language model (LLM)?" -ic 50

Message:

Run on CPU:

(llama3) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /opt/Advantech/EdgeAISuite/Intel_Standard/GenAI/LLM/model/llama-3.1-8b-instruct/ -n 2 -d CPU -p "What is large language model (LLM)? please reply under 100 words" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama-3.1-8b-instruct
[ INFO ] OV Config={'CACHE_DIR': ''}
[ WARNING ] It is recommended to set the environment variable OMP_WAIT_POLICY to PASSIVE, so that OpenVINO inference can use all CPU resources without waiting.
[ INFO ] The num_beams is 1, update Torch thread num from 16 to 8, avoid to use the CPU cores for OpenVINO inference.
[ INFO ] Model path=/opt/Advantech/EdgeAISuite/Intel_Standard/GenAI/LLM/model/llama-3.1-8b-instruct, openvino runtime version: 2024.5.0-17288-7975fa5da0c-refs/pull/3856/head
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 1.47s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 2, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up][P0] Input text: What is large language model (LLM)? please reply under 100 words
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/benchmark.py", line 229, in main
    iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 515, in run_text_generation_benchmark
    text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list,
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 305, in run_text_generation_genai
    inference_durations = (np.array(perf_metrics.raw_metrics.token_infer_durations) / 1000 / 1000).tolist()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'openvino_genai.py_openvino_genai.RawPerfMetrics' object has no attribute 'token_infer_durations'. Did you mean: 'tokenization_durations'?

Run on NPU:

(llama3) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)? please reply under 100 words" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] Model path=/home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights, openvino runtime version: 2024.5.0-17288-7975fa5da0c-refs/pull/3856/head
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 21.64s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 2, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up][P0] Input text: What is large language model (LLM)? please reply under 100 words
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/benchmark.py", line 229, in main
    iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 515, in run_text_generation_benchmark
    text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list,
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 305, in run_text_generation_genai
    inference_durations = (np.array(perf_metrics.raw_metrics.token_infer_durations) / 1000 / 1000).tolist()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'openvino_genai.py_openvino_genai.RawPerfMetrics' object has no attribute 'token_infer_durations'. Did you mean: 'tokenization_durations'?

Could you please help me resolve this issue?

@eaidova
Collaborator

eaidova commented Dec 23, 2024

@tim102187S you need to update your openvino-genai package and install it from nightly:

pip install -U --pre --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly openvino_tokenizers openvino openvino-genai

After that, these metrics become available. Alternatively, you can switch to the 2024/6 branch if you would like to use llm_bench compatible with the latest stable release.
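Until the updated wheel is in place, the crash can also be worked around with a defensive attribute lookup. A minimal sketch (the helper name is mine; only the two attribute names and the unit conversion come from the traceback above):

```python
def token_infer_durations_s(perf_metrics):
    """Return per-token inference durations in seconds, or None when the
    installed openvino_genai wheel predates `token_infer_durations`."""
    raw = perf_metrics.raw_metrics
    durations = getattr(raw, "token_infer_durations", None)
    if durations is None:
        return None  # older stable wheel: the attribute does not exist yet
    # llm_bench divides the raw values by 1000 twice to convert to seconds
    return [d / 1000 / 1000 for d in durations]
```

A caller in llm_bench would then skip the per-token statistics when the helper returns None instead of raising an AttributeError.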

@Kpeacef

Kpeacef commented Dec 24, 2024

I had the same issue, and the suggestion from @eaidova resolved it. Thank you.

@tim102187S
Author

I have recently updated my OpenVINO-GenAI package and switched to version 2024/6. While the issue with the CPU has been resolved, I am now encountering a problem when using the NPU.

  • Tool: openvino.genai/tools/llm_bench/benchmark.py
  • Device: Intel Meteor Lake
  • Environment: Ubuntu 24.04 / OpenVINO 2024.5.0 / NPU Driver 1.10.0
  • Model used:
    • NPU: llama-3-8b-instruct (INT4-NPU)

When running the following command:

python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)?" -ic 50

Message:

Run on NPU (after approximately 5 minutes of execution):

(openvino_one) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] Model path=/home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights, openvino runtime version: 2025.0.0-17709-688f0428cfc
Segmentation fault (core dumped)
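One thing worth confirming before digging into the segfault itself: the log above reports runtime 2025.0.0-17709 (a nightly build) while the environment listing says OpenVINO 2024.5.0, so a mixed install is possible. A small sketch to list what is actually active (package names are the standard wheel names; nothing here is specific to this repo):

```python
import importlib.metadata as md

def installed_versions(pkgs=("openvino", "openvino-genai", "openvino-tokenizers")):
    """Map each wheel name to its installed version string, or None if absent."""
    versions = {}
    for pkg in pkgs:
        try:
            versions[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            versions[pkg] = None  # wheel not installed in this environment
    return versions

print(installed_versions())
```

If the three wheels report mismatched release lines, reinstalling them together (as in the nightly command above, or from the stable index) would rule out a version skew before blaming the NPU driver.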
