I am wondering if the PyTorch baseline is actually optimized enough? Specifically, could you:

- Remove autocast, since the model is already in FP16? Autocast would actually run some non-GEMM FP16 kernels in FP32 (or TF32 on Ampere GPUs).
- Run some warm-up iterations before measuring the inference latency (averaged across a few), like you did with TensorRT?
- Set `torch.backends.cudnn.benchmark = True` before running the GPU kernels?
On my local machine, just these optimizations (for lack of a better word, as they are not really optimizations) would make the PyTorch baseline at least 2X faster.
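
For reference, here is a minimal sketch of the kind of measurement loop I mean. The model and input below are just placeholders, not the actual ones from this repo; substitute the real FP16 model and batch:

```python
import time

import torch
import torch.nn as nn

torch.backends.cudnn.benchmark = True  # let cuDNN autotune conv kernels

# Placeholder FP16 model and input batch (substitute the real ones from the benchmark).
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
).half().cuda().eval()
inputs = torch.randn(1, 3, 224, 224, dtype=torch.float16, device="cuda")

with torch.no_grad():
    # Warm-up iterations (not timed); no autocast, since the model is already FP16.
    for _ in range(10):
        model(inputs)
    torch.cuda.synchronize()

    # Timed iterations, averaged.
    n_iters = 100
    start = time.time()
    for _ in range(n_iters):
        model(inputs)
    torch.cuda.synchronize()
    print(f"avg latency: {(time.time() - start) / n_iters * 1e3:.2f} ms")
```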