Cityscapes: https://www.cityscapes-dataset.com/
Our code expects the Cityscapes dataset directory to have the following structure:
cityscapes
├── gtFine
|   ├── train
|   ├── val
├── leftImg8bit
|   ├── train
|   ├── val
ADE20K: https://groups.csail.mit.edu/vision/datasets/ADE20K/
Our code expects the ADE20K dataset directory to have the following structure:
ade20k
├── annotations
|   ├── training
|   ├── validation
├── images
|   ├── training
|   ├── validation
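Before running evaluation, it can help to confirm that a dataset root matches the trees above. The following is a minimal sketch using only the standard library; `EXPECTED_LAYOUT` and `missing_subdirs` are hypothetical names introduced for illustration and are not part of the EfficientViT code base.

```python
from pathlib import Path

# Expected sub-directories, copied from the directory trees above.
EXPECTED_LAYOUT = {
    "cityscapes": [
        "gtFine/train",
        "gtFine/val",
        "leftImg8bit/train",
        "leftImg8bit/val",
    ],
    "ade20k": [
        "annotations/training",
        "annotations/validation",
        "images/training",
        "images/validation",
    ],
}


def missing_subdirs(root, dataset):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_LAYOUT[dataset] if not (root / rel).is_dir()]
```

An empty list from `missing_subdirs(<dataset root>, "cityscapes")` means every expected folder is present. The check does not look at the files inside those folders.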
Latency and throughput are measured on NVIDIA Jetson Nano, NVIDIA Jetson AGX Orin, and NVIDIA A100 GPU with TensorRT in fp16. Data transfer time is included.
Model | Resolution | Cityscapes mIoU | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs1) | Checkpoint |
---|---|---|---|---|---|---|---|
EfficientViT-L1 | 1024x2048 | 82.716 | 40M | 282G | 45.9ms | 122 image/s | link |
EfficientViT-L2 | 1024x2048 | 83.228 | 53M | 396G | 60.0ms | 102 image/s | link |
EfficientViT B series
Model | Resolution | Cityscapes mIoU | Params | MACs | Jetson Nano (bs1) | Jetson Orin (bs1) | Checkpoint |
---|---|---|---|---|---|---|---|
EfficientViT-B0 | 1024x2048 | 75.653 | 0.7M | 4.4G | 275ms | 9.9ms | link |
EfficientViT-B1 | 1024x2048 | 80.547 | 4.8M | 25G | 819ms | 24.3ms | link |
EfficientViT-B2 | 1024x2048 | 82.073 | 15M | 74G | 1676ms | 46.5ms | link |
EfficientViT-B3 | 1024x2048 | 83.016 | 40M | 179G | 3192ms | 81.8ms | link |
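To compare the batch-size-1 latencies with throughput figures, a latency in milliseconds can be converted to frames per second as 1000 / latency. The sketch below does this for the Jetson Orin column of the Cityscapes B series table; the dictionary and function names are illustrative. The result is a single-image rate derived from the table, not a separately measured batched throughput.

```python
# Jetson Orin latency in ms (batch size 1), copied from the Cityscapes B series table.
ORIN_LATENCY_MS = {"b0": 9.9, "b1": 24.3, "b2": 46.5, "b3": 81.8}


def latency_to_fps(latency_ms):
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / latency_ms


for name, ms in ORIN_LATENCY_MS.items():
    print(f"EfficientViT-{name.upper()}: {ms} ms -> {latency_to_fps(ms):.1f} images/s")
```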
Model | Resolution | ADE20K mIoU | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
---|---|---|---|---|---|---|---|
EfficientViT-L1 | 512x512 | 49.191 | 40M | 36G | 7.2ms | 947 image/s | link |
EfficientViT-L2 | 512x512 | 50.702 | 51M | 45G | 9.0ms | 758 image/s | link |
EfficientViT B series
Model | Resolution | ADE20K mIoU | Params | MACs | Jetson Nano (bs1) | Jetson Orin (bs1) | Checkpoint |
---|---|---|---|---|---|---|---|
EfficientViT-B1 | 512x512 | 42.840 | 4.8M | 3.1G | 110ms | 4.0ms | link |
EfficientViT-B2 | 512x512 | 45.941 | 15M | 9.1G | 212ms | 7.3ms | link |
EfficientViT-B3 | 512x512 | 49.013 | 39M | 22G | 411ms | 12.5ms | link |
# semantic segmentation
from efficientvit.seg_model_zoo import create_seg_model

# EfficientViT-L2 trained on Cityscapes
model = create_seg_model(
    name="l2", dataset="cityscapes", weight_url="assets/checkpoints/seg/cityscapes/l2.pt"
)

# EfficientViT-L2 trained on ADE20K (this reassigns `model`; create only the one you need)
model = create_seg_model(
    name="l2", dataset="ade20k", weight_url="assets/checkpoints/seg/ade20k/l2.pt"
)
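The two `weight_url` values above share the pattern `assets/checkpoints/seg/<dataset>/<name>.pt`. The helper below is a sketch that builds such a path and rejects dataset/model pairs that do not appear in the tables in this document. `seg_checkpoint_path` and `SEG_MODELS` are hypothetical names, and the assumption that every checkpoint is stored under the same pattern is extrapolated from the two `l2` examples only.

```python
# Model names per dataset, taken from the result tables in this document.
SEG_MODELS = {
    "cityscapes": ("b0", "b1", "b2", "b3", "l1", "l2"),
    "ade20k": ("b1", "b2", "b3", "l1", "l2"),
}


def seg_checkpoint_path(dataset, name, root="assets/checkpoints/seg"):
    """Build a local checkpoint path, assuming the <root>/<dataset>/<name>.pt layout."""
    if dataset not in SEG_MODELS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if name not in SEG_MODELS[dataset]:
        raise ValueError(f"no {name!r} model listed for {dataset!r}")
    return f"{root}/{dataset}/{name}.pt"


# Possible use with the call shown above (requires the efficientvit package):
# model = create_seg_model(
#     name="b3", dataset="ade20k", weight_url=seg_checkpoint_path("ade20k", "b3")
# )
```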
Please run eval_seg_model.py to evaluate our models.
Examples: segmentation
Please run eval_seg_model.py to visualize the outputs of our semantic segmentation models.
Example:
python eval_seg_model.py --dataset cityscapes --crop_size 1024 --model b3 --save_path demo/cityscapes/b3/
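To run the same evaluation for several models, the command line above can be generated programmatically. The sketch below only builds and prints the commands; it does not execute them. It uses the flags from the example above, the crop sizes that appear in the example commands in this document, and assumes the `demo/<dataset>/<model>/` output pattern generalizes from the single `b3` example. `eval_command` is a hypothetical helper.

```python
import shlex

# Crop sizes taken from the example commands in this document.
CROP_SIZE = {"cityscapes": 1024, "ade20k": 512}


def eval_command(dataset, model, save_root="demo"):
    """Build the argument list for one eval_seg_model.py run."""
    return [
        "python",
        "eval_seg_model.py",
        "--dataset", dataset,
        "--crop_size", str(CROP_SIZE[dataset]),
        "--model", model,
        "--save_path", f"{save_root}/{dataset}/{model}/",
    ]


# Print one command per Cityscapes B-series model; pass the list to
# subprocess.run(...) instead of printing to actually launch it.
for model in ("b0", "b1", "b2", "b3"):
    print(shlex.join(eval_command("cityscapes", model)))
```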
You can also use demo_seg_model.py to visualize the models.
Example:
python demo_seg_model.py --image_path assets/fig/indoor.jpg --dataset ade20k --crop_size 512 --model l2
python demo_seg_model.py --image_path assets/fig/city.png --dataset cityscapes --crop_size 1024 --model l2
To generate ONNX files, please refer to onnx_export.py.
To generate TFLite files, please refer to tflite_export.py. It requires the TinyNN package, which can be installed with:
pip install git+https://github.com/alibaba/TinyNeuralNetwork.git
Example:
python tflite_export.py --export_path model.tflite --task seg --dataset ade20k --model b3 --resolution 512 512
If EfficientViT is useful or relevant to your research, please kindly recognize our contributions by citing our paper:
@article{cai2022efficientvit,
title={Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition},
author={Cai, Han and Gan, Chuang and Han, Song},
journal={arXiv preprint arXiv:2205.14756},
year={2022}
}