
Efficient Training of Visual Transformers with Small Datasets


To appear in NeurIPS 2021.

[paper] [Poster & Video] [arXiv] [code] [reviews]
Yahui Liu<sup>1,3</sup>, Enver Sangineto<sup>1</sup>, Wei Bi<sup>2</sup>, Nicu Sebe<sup>1</sup>, Bruno Lepri<sup>3</sup>, Marco De Nadai<sup>3</sup>
<sup>1</sup>University of Trento, Italy, <sup>2</sup>Tencent AI Lab, China, <sup>3</sup>Bruno Kessler Foundation, Italy.

Data preparation

| Dataset | Download Link |
|:--------|:--------------|
| ImageNet | train, val |
| CIFAR-10 | all |
| CIFAR-100 | all |
| SVHN | train, test, extra |
| Oxford-Flower102 | images, labels, splits |
| Clipart | images, train_list, test_list |
| Infograph | images, train_list, test_list |
| Painting | images, train_list, test_list |
| Quickdraw | images, train_list, test_list |
| Real | images, train_list, test_list |
| Sketch | images, train_list, test_list |
  • Download the datasets and pre-process some of them (i.e., ImageNet and DomainNet) using the code in the scripts folder.
  • The datasets are prepared with the following structure (except CIFAR-10/100 and SVHN):
```
dataset_name
  |__train
  |    |__category1
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__category2
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__...
  |__val
       |__category1
       |    |__xxx.jpg
       |    |__...
       |__category2
       |    |__xxx.jpg
       |    |__...
       |__...
```
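
For reference, this layout is the standard `ImageFolder` format, so it can be loaded directly with torchvision. A minimal sketch, assuming the layout above (the `dataset_name` paths and transform values are illustrative placeholders, not taken from this repo):

```python
# Minimal loading sketch for the directory layout shown above.
# "dataset_name" is a placeholder; transform values are illustrative.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# ImageFolder maps each sub-directory (category1, category2, ...) to a class label.
train_set = datasets.ImageFolder("dataset_name/train", transform=transform)
val_set   = datasets.ImageFolder("dataset_name/val",   transform=transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True,  num_workers=8)
val_loader   = DataLoader(val_set,   batch_size=128, shuffle=False, num_workers=8)
```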

Training

After preparing the datasets, we can simply start the training with 8 NVIDIA V100 GPUs:

```
sh train.sh
```
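
`train.sh` wraps a multi-GPU launch; as a rough illustration of what an 8-GPU distributed run looks like in PyTorch, here is a generic DDP skeleton (this is an assumption for illustration, not the repo's actual entry point; `main_worker` and the rendezvous address are hypothetical placeholders):

```python
# Illustrative 8-GPU DDP skeleton (not this repo's train.sh).
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def main_worker(rank: int, world_size: int):
    # One process per GPU; NCCL is the standard backend for multi-GPU training.
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:23456",  # placeholder rendezvous address
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(rank)
    # ... build the model, wrap it in DistributedDataParallel,
    #     and run the training loop here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 8  # one process per V100 GPU, matching the setup above
    mp.spawn(main_worker, args=(world_size,), nprocs=world_size)
```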

Evaluation

We can also load the pre-trained model and test the performance:

```
sh eval.sh
```
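
Conceptually, evaluation amounts to loading a checkpoint and running the validation loop. A hedged sketch, not the repo's `eval.sh` (the checkpoint path, the `"model"` state-dict key, and `build_model` are hypothetical placeholders):

```python
# Evaluation sketch: restore a checkpoint and measure top-1 accuracy.
# build_model(), the path, and the "model" key are placeholders.
import torch

model = build_model()  # placeholder: construct the network (e.g., Swin-T)
ckpt = torch.load("checkpoints/best.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval().cuda()

correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:  # val_loader as in the Data preparation sketch
        logits = model(images.cuda(non_blocking=True))
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
        total += labels.size(0)
print(f"top-1 accuracy: {100.0 * correct / total:.2f}%")
```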

Pretrained models

For fast evaluation, we report the top-1 accuracy (%) of Swin-T trained for 100 epochs on various datasets as an example. (Note that we save the model every 5 epochs during training, so the attached best models may be slightly different from the reported performances.)

| Datasets | Baseline | Ours |
|:---------|---------:|-----:|
| CIFAR-10 | 59.47 | 83.89 |
| CIFAR-100 | 53.28 | 66.23 |
| SVHN | 71.60 | 94.23 |
| Flowers102 | 34.51 | 39.37 |
| Clipart | 38.05 | 47.47 |
| Infograph | 8.20 | 10.16 |
| Painting | 35.92 | 41.86 |
| Quickdraw | 24.08 | 69.41 |
| Real | 73.47 | 75.59 |
| Sketch | 11.97 | 38.55 |

We provide a script to download the pretrained models directly from Google Drive:

```
python3 ./scripts/collect_models.py
```
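
If you prefer to fetch a single checkpoint manually, Google Drive files can also be downloaded with the third-party `gdown` package. A sketch (the file ID and output name are placeholders, not a real model ID):

```python
# Manual download sketch using the third-party `gdown` package.
# The Drive file ID and output filename below are placeholders.
import gdown

file_id = "<google-drive-file-id>"  # placeholder: copy the ID from the share link
gdown.download(f"https://drive.google.com/uc?id={file_id}",
               "swin_tiny_drloc.pth", quiet=False)
```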

Related Work

Acknowledgments

This code is heavily based on Swin-Transformer. Thanks to the contributors of that project.

Citation

```bibtex
@InProceedings{liu2021efficient,
    author    = {Liu, Yahui and Sangineto, Enver and Bi, Wei and Sebe, Nicu and Lepri, Bruno and De Nadai, Marco},
    title     = {Efficient Training of Visual Transformers with Small Datasets},
    booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
    year      = {2021}
}
```

If you have any questions, please do not hesitate to contact me (yahui.cvrs AT gmail.com).
