Issue with Running benchmark.py #1423

Open
tim102187S opened this issue Dec 23, 2024 · 3 comments
Comments

@tim102187S

I am encountering an issue while using the following setup:

  • Tool: openvino.genai/tools/llm_bench/benchmark.py
  • Device: Intel Meteor Lake
  • Environment: Ubuntu 24.04 / OpenVINO 2024.5.0 / GPU Driver / NPU Driver
  • Models used:
    • CPU / iGPU: llama-3.1-8b-instruct (INT4)
    • NPU: llama-3-8b-instruct (INT4-NPU)

When running the command, I received the following error message:

python3 benchmark.py -m <path>/llama-3.1-8b-instruct/ -n 2 -d CPU -p "What is large language model (LLM)?" -ic 50

Message:

Run on CPU:

(llama3) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /opt/Advantech/EdgeAISuite/Intel_Standard/GenAI/LLM/model/llama-3.1-8b-instruct/ -n 2 -d CPU -p "What is large language model (LLM)? please reply under 100 words" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama-3.1-8b-instruct
[ INFO ] OV Config={'CACHE_DIR': ''}
[ WARNING ] It is recommended to set the environment variable OMP_WAIT_POLICY to PASSIVE, so that OpenVINO inference can use all CPU resources without waiting.
[ INFO ] The num_beams is 1, update Torch thread num from 16 to 8, avoid to use the CPU cores for OpenVINO inference.
[ INFO ] Model path=/opt/Advantech/EdgeAISuite/Intel_Standard/GenAI/LLM/model/llama-3.1-8b-instruct, openvino runtime version: 2024.5.0-17288-7975fa5da0c-refs/pull/3856/head
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 1.47s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 2, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up][P0] Input text: What is large language model (LLM)? please reply under 100 words
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/benchmark.py", line 229, in main
    iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 515, in run_text_generation_benchmark
    text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list,
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 305, in run_text_generation_genai
    inference_durations = (np.array(perf_metrics.raw_metrics.token_infer_durations) / 1000 / 1000).tolist()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'openvino_genai.py_openvino_genai.RawPerfMetrics' object has no attribute 'token_infer_durations'. Did you mean: 'tokenization_durations'?

Run on NPU:

(llama3) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)? please reply under 100 words" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] Model path=/home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights, openvino runtime version: 2024.5.0-17288-7975fa5da0c-refs/pull/3856/head
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 21.64s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 2, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up][P0] Input text: What is large language model (LLM)? please reply under 100 words
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/benchmark.py", line 229, in main
    iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 515, in run_text_generation_benchmark
    text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list,
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 305, in run_text_generation_genai
    inference_durations = (np.array(perf_metrics.raw_metrics.token_infer_durations) / 1000 / 1000).tolist()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'openvino_genai.py_openvino_genai.RawPerfMetrics' object has no attribute 'token_infer_durations'. Did you mean: 'tokenization_durations'?

Could you please help me resolve this issue?

@eaidova
Collaborator

eaidova commented Dec 23, 2024

@tim102187S you need to update your openvino-genai package and install it from nightly:

pip install -U --pre --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly openvino_tokenizers openvino openvino-genai

After that, these metrics become available. Alternatively, you can switch to the 2024/6 branch if you would like to use llm_bench compatible with the latest stable release.
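Until the updated wheel is in place, the crash can also be worked around with a defensive attribute lookup. A minimal sketch (the helper name is mine; only the two attribute names and the unit conversion come from the traceback above):

```python
def token_infer_durations_s(perf_metrics):
    """Return per-token inference durations in seconds, or None when the
    installed openvino_genai wheel predates `token_infer_durations`."""
    raw = perf_metrics.raw_metrics
    durations = getattr(raw, "token_infer_durations", None)
    if durations is None:
        return None  # older stable wheel: the attribute does not exist yet
    # llm_bench divides the raw values by 1000 twice to convert to seconds
    return [d / 1000 / 1000 for d in durations]
```

A caller in llm_bench would then skip the per-token statistics when the helper returns None instead of raising an AttributeError.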

@Kpeacef

Kpeacef commented Dec 24, 2024

I had the same issue, and the suggestion from @eaidova resolved it. Thank you.

@tim102187S
Author

I have recently updated my OpenVINO-GenAI package and switched to version 2024/6. While the issue with the CPU has been resolved, I am now encountering a problem when using the NPU.

  • Tool: openvino.genai/tools/llm_bench/benchmark.py
  • Device: Intel Meteor Lake
  • Environment: Ubuntu 24.04 / OpenVINO 2024.5.0 / NPU Driver 1.10.0
  • Model used:
    • NPU: llama-3-8b-instruct (INT4-NPU)

When running the following command:

python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)?" -ic 50

Message:

Run on NPU (after approximately 5 minutes of execution):

(openvino_one) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] Model path=/home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights, openvino runtime version: 2025.0.0-17709-688f0428cfc
Segmentation fault (core dumped)
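One thing worth confirming before digging into the segfault itself: the log above reports runtime 2025.0.0-17709 (a nightly build) while the environment listing says OpenVINO 2024.5.0, so a mixed install is possible. A small sketch to list what is actually active (package names are the standard wheel names; nothing here is specific to this repo):

```python
import importlib.metadata as md

def installed_versions(pkgs=("openvino", "openvino-genai", "openvino-tokenizers")):
    """Map each wheel name to its installed version string, or None if absent."""
    versions = {}
    for pkg in pkgs:
        try:
            versions[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            versions[pkg] = None  # wheel not installed in this environment
    return versions

print(installed_versions())
```

If the three wheels report mismatched release lines, reinstalling them together (as in the nightly command above, or from the stable index) would rule out a version skew before blaming the NPU driver.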
