Dear all,

I am testing the performance/throughput of FP32 and quantized models on my platform. My configuration is as follows:
```
tflite-runtime==2.5.0.post1
tensorflow==1.14.0
```
**FP32 on CPU**
```
-INFO- Running prediction...
-INFO- Acquired 1 file(s) for model 'MobileNet v1.0'
-INFO- Task runtime: 0:00:28.796083
-INFO- Throughput: 35.8 fps
-INFO- Latency: 29.5 ms
-INFO- Target           Workload   H/W  Prec  Batch  Conc.  Metric      Score  Units
-INFO- -----------------------------------------------------------------------------------
-INFO- tensorflow_lite  mobilenet  cpu  fp32  1      1      throughput  35.8   fps
-INFO- tensorflow_lite  mobilenet  cpu  fp32  1      1      latency     29.5   ms
-INFO- Total runtime: 0:00:28.830364
-INFO- Done
```
**INT8 on CPU**
```
google@localhost:~/mlmark$ harness/mlmark.py -c config/tflite-cpu-mobilenet-int8-throughput.json
-INFO- Running prediction...
-INFO- Acquired 1 file(s) for model 'MobileNet v1.0'
-INFO- Task runtime: 0:01:00.933346
-INFO- Throughput: 16.9 fps
-INFO- Latency: 65. ms
-INFO- Target           Workload   H/W  Prec  Batch  Conc.  Metric      Score  Units
-INFO- -----------------------------------------------------------------------------------
-INFO- tensorflow_lite  mobilenet  cpu  int8  1      1      throughput  16.9   fps
-INFO- tensorflow_lite  mobilenet  cpu  int8  1      1      latency     65.    ms
-INFO- Total runtime: 0:01:00.960828
-INFO- Done
```
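To help isolate whether the gap comes from the TFLite interpreter itself or from the MLMark harness, a minimal standalone timing sketch like the one below could be run against both `.tflite` files. This is only a sketch: the model filename is a placeholder (one of the files from the hosted-models page), and `num_threads=1` is an assumption chosen to match the single-concurrency runs above.

```python
# Minimal standalone latency sketch using tflite_runtime directly.
import time
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(
    model_path="mobilenet_v1_1.0_224_quant.tflite",  # placeholder filename
    num_threads=1,  # assumption: match Conc.=1 in the harness runs
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]

# Random input matching the model's expected shape and dtype
# (uint8 for the quantized model, float32 for the fp32 one).
if inp["dtype"] == np.uint8:
    data = np.random.randint(0, 256, size=inp["shape"], dtype=np.uint8)
else:
    data = np.random.random_sample(inp["shape"]).astype(np.float32)

# Warm-up runs so one-time allocation costs are not measured.
for _ in range(10):
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()

# Timed runs.
n = 100
start = time.perf_counter()
for _ in range(n):
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
elapsed = time.perf_counter() - start
print(f"mean latency: {1000 * elapsed / n:.1f} ms ({n / elapsed:.1f} fps)")
```

If this sketch also shows INT8 running slower than FP32, the difference would point at the TFLite kernels on this platform rather than at the harness.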
Observations: the FP32 model is almost twice as fast as the INT8 model on CPU, but Google's TensorFlow Lite benchmarks report the opposite: https://www.tensorflow.org/lite/guide/hosted_models#quantized_models

I also tried replacing the models with the ones from the hosted location above, but the harness gives similar results.

Could you let me know where it's going wrong?
Thanks

Kind Regards
Arun