- Python 3
- tensorflow 1.15
- numpy
- scikit-learn
python convert2onnx_v2.py
python modify_onnx_gs.py
git clone https://github.com/NVIDIA/trt-samples-for-hackathon-cn.git
cd build
Then copy Onehot_plugin.so to the ConvBert folder.
Then generate the .trt file:
trtexec --onnx=ConvBert_onehot.onnx --plugins=OnehotPlugin.so --saveEngine=ConvBert_onehot.trt --verbose
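If the engine needs to be rebuilt from a script, the command above can be assembled in Python. This is only a minimal sketch: the helper name `build_trtexec_command` is hypothetical, the flags are exactly the ones shown in the command above, and the plugin path must match the name of the `.so` file that was actually built.

```python
import shlex

def build_trtexec_command(onnx_path, plugin_path, engine_path, verbose=True):
    """Assemble the trtexec argument list used above (hypothetical helper)."""
    cmd = [
        "trtexec",
        "--onnx={}".format(onnx_path),
        "--plugins={}".format(plugin_path),
        "--saveEngine={}".format(engine_path),
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd

# File names taken from the command above; adjust them to your local build.
cmd = build_trtexec_command(
    "ConvBert_onehot.onnx", "OnehotPlugin.so", "ConvBert_onehot.trt"
)

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))

# To run it, pass the list to subprocess.run(cmd, check=True)
# on a machine where trtexec is on the PATH.
```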
python test_tf_trt_infer.py
tf_time= [INFO] TF execution time 367.3338 ms
trt_time= TRT execution time 9.16735 ms
Each value is the average inference time. The tf_time is likely overestimated because it may include CPU-side time; profiling would be needed to get an accurate value.
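One common way to reduce this kind of overestimate is to discard a few warm-up runs before averaging. The sketch below is an assumption about how such a measurement could be done, not the code in test_tf_trt_infer.py: `average_time_ms` is a hypothetical helper, and `fn` stands for any callable that runs one inference and blocks until the result is ready.

```python
import time

def average_time_ms(fn, warmup=10, iters=100):
    """Return the mean wall-clock time of fn() in milliseconds.

    The warm-up calls are not timed, so one-off costs such as
    lazy initialization do not inflate the average.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / iters

# Stand-in workload for illustration; replace with a real inference call.
def dummy_inference():
    return sum(i * i for i in range(1000))

mean_ms = average_time_ms(dummy_inference, warmup=5, iters=50)
print("average time: {:.4f} ms".format(mean_ms))
```

For GPU inference, `fn` should synchronize with the device before returning, otherwise only the launch time is measured.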
The speed-up ratio is about 40.07 (367.3338 ms / 9.16735 ms).
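The ratio can be recomputed directly from the two averages reported above. A minimal sketch; the variable names are illustrative and the numbers are copied from the log lines:

```python
# Average execution times reported above, in milliseconds.
tf_time_ms = 367.3338
trt_time_ms = 9.16735

# Speed-up is the TensorFlow time divided by the TensorRT time.
speedup = tf_time_ms / trt_time_ms
print("speed-up ratio: {:.2f}".format(speedup))
```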
Pull the Docker image with the CUDA and TensorFlow environment:
sudo docker pull registry.cn-hangzhou.aliyuncs.com/hackathon-fighters/21.03-tf1-py3-trt:v1
Here are some great resources we benefited from:
Codebase: Our model codebase is based on ConvBert.
ConvBert: NeurIPS 2020 paper ConvBERT: Improving BERT with Span-based Dynamic Convolution.
Dynamic convolution: Implementation from Pay Less Attention with Lightweight and Dynamic Convolutions.