Hello,
I applied post-training INT8 quantization to the dinov2-base model with PyTorch-Quantization and then converted it to a TensorRT engine. However, the INT8 engine runs slightly slower than the FP16 engine, and I observed the same result on A100, V100, and A10 GPUs. Is this behavior normal?
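For reference, here is a minimal sketch of the calibrate-then-export flow I mean, assuming NVIDIA's `pytorch-quantization` toolkit and the Hugging Face `facebook/dinov2-base` checkpoint. The model-loading path and `calibration_batches` are placeholders, not my exact script:

```python
import torch
from transformers import AutoModel  # assumed loading path for dinov2-base
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Patch torch.nn layers (Linear, Conv2d, ...) with quantized equivalents
# BEFORE the model is instantiated, so DINOv2's layers get wrapped.
quant_modules.initialize()

model = AutoModel.from_pretrained("facebook/dinov2-base").cuda().eval()

# --- Calibration: run a few batches to collect activation statistics ---
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is not None:
            module.disable_quant()
            module.enable_calib()
        else:
            module.disable()

with torch.no_grad():
    for batch in calibration_batches:  # placeholder: small set of (N, 3, 224, 224) tensors
        model(batch.cuda())

# Load the collected amax values and switch quantization back on.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        if module._calibrator is not None:
            module.load_calib_amax()
            module.enable_quant()
            module.disable_calib()
        else:
            module.enable()

# --- Export with explicit Q/DQ nodes so TensorRT can build an INT8 engine ---
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "dinov2_base_int8.onnx", opset_version=13)
```

The INT8 engine was then built from the exported ONNX with something like `trtexec --onnx=dinov2_base_int8.onnx --int8 --fp16 --saveEngine=dinov2_base_int8.plan` and timed against an FP16 build of the unquantized model.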
Thank you.