✝️EVA: An Open Billion-Scale Vision Foundation Model

Yuxin Fang (2,1), Wen Wang (3,1), Binhui Xie (4,1), Quan Sun (1), Ledell Wu (1), Xinggang Wang (2), Tiejun Huang (1), Xinlong Wang (1), Yue Cao (1)

(1) BAAI, (2) HUST, (3) ZJU, (4) BIT

CVPR 2023, 🌟highlight🌟


We launch EVA, a vision-centric foundation model to Explore the limits of Visual representation at scAle using only publicly accessible data and academic resources. EVA is a vanilla ViT pre-trained to reconstruct masked-out image-text aligned vision features (i.e., CLIP features) conditioned on the visible image patches. With this pretext task, we can efficiently scale EVA up to one billion parameters and set new records on a broad range of representative vision downstream tasks.
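The pretext task described above can be illustrated with a minimal NumPy sketch. It is not the EVA training code: the negative cosine similarity loss, the 40% random mask ratio, and the toy shapes are assumptions made for illustration, and the random arrays merely stand in for CLIP features and model predictions.

```python
import numpy as np

def masked_feature_loss(pred, target, mask, eps=1e-8):
    """Mean negative cosine similarity over the masked patches only.

    pred, target: (num_patches, dim) arrays of predicted / teacher features.
    mask: (num_patches,) boolean array, True where the patch was masked out.
    """
    pred = pred[mask]
    target = target[mask]
    # L2-normalize each feature vector so the dot product is a cosine similarity.
    pred = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    target = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return float(-(pred * target).sum(axis=-1).mean())

rng = np.random.default_rng(0)
num_patches, dim = 196, 64                       # hypothetical toy sizes
target = rng.normal(size=(num_patches, dim))     # stand-in for CLIP features

# Assumed mask ratio of 40%, chosen uniformly at random for simplicity.
mask = np.zeros(num_patches, dtype=bool)
masked_idx = rng.choice(num_patches, size=int(0.4 * num_patches), replace=False)
mask[masked_idx] = True

# A perfect prediction reaches the minimum loss; a random one does not.
perfect_loss = masked_feature_loss(target, target, mask)
random_loss = masked_feature_loss(rng.normal(size=(num_patches, dim)), target, mask)
```

Only the masked positions contribute to the loss, which mirrors the idea that the model sees the visible patches and is scored on the features it has to reconstruct.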

EVA is the first open-sourced billion-scale vision foundation model that achieves state-of-the-art performance on a broad range of downstream tasks.

News

Get Started

All EVA model checkpoints are now available at 🤗 Hugging Face Models and BAAI ModelHub (EVA & EVA-CLIP). Try them out!

Summary of EVA's performance

image & video classification

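| model | #param. | image cls: IN-1K, e2e ft | image cls: IN-1K, linear | image cls: IN-1K, zero-shot | image cls: 12 avg. zero-shot | video cls: K400 | video cls: K600 | video cls: K700 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EVA or EVA-CLIP | 1.0B | 89.7 | 86.5 | 78.5 | 75.7 | 89.7 | 89.8 | 82.9 |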

object detection & segmentation

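| model | #param. | COCO det (test) | COCO det (val) | COCO seg (test) | COCO seg (val) | LVIS det | LVIS seg | sem seg: COCO-Stuff | sem seg: ADE20K |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EVA | 1.0B | 64.7 | 64.5 | 55.5 | 55.0 | 62.2 | 55.0 | 53.4 | 62.3 |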

BibTeX & Citation

@article{EVA,
  title={EVA: Exploring the Limits of Masked Visual Representation Learning at Scale},
  author={Fang, Yuxin and Wang, Wen and Xie, Binhui and Sun, Quan and Wu, Ledell and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
  journal={arXiv preprint arXiv:2211.07636},
  year={2022}
}

Contact

  • For help and issues associated with EVA, or reporting a bug, please open a GitHub Issue with label EVA-01. Let's build a better & stronger EVA together :)

  • We are hiring at all levels at BAAI Vision Team, including full-time researchers, engineers and interns. If you are interested in working with us on foundation models, self-supervised learning and multimodal learning, please contact Yue Cao ([email protected]) and Xinlong Wang ([email protected]).