# CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No

## 🚀 Updates

- The code of CLIPN with hand-crafted prompts has been released (`./hand-crafted`).
- The code of CLIPN with learnable prompts has been released (`./src`).
- Thanks to the valuable suggestions from the reviewers of CVPR 2023 and ICCV 2023, our paper has been significantly improved and was accepted at ICCV 2023.
- If you are interested in CLIP-based open-vocabulary tasks, please feel free to check out another work of ours, "CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks" (github).

## ⭐ Highlights of CLIPN

- CLIPN attains SoTA performance in zero-shot OOD detection while inheriting the in-distribution (ID) classification ability of CLIP.
- CLIPN offers an approach to unsupervised prompt learning on an image-text-paired web dataset.

## 🔨 Installation

- The main Python libraries of our experimental environment are listed in `requirements.txt`. You can install CLIPN as follows:

```bash
git clone https://github.com/xmed-lab/CLIPN.git
cd CLIPN
conda create -n CLIPN
conda activate CLIPN
pip install -r ./requirements.txt
```

## 💻 Prepare Dataset

- Pre-training dataset, CC3M. To download the CC3M dataset in webdataset format, please follow img2dataset (a hedged sketch of the download step is shown below).

When you have downloaded CC3M, please write your data root into `./src/run.sh`.
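
For reference, here is a minimal sketch of the download step using img2dataset's Python API. It assumes you have already fetched the official CC3M TSV and added `url`/`caption` column headers; the folder path, worker counts, and image size are placeholders rather than values from this repository.

```python
from img2dataset import download

# Minimal sketch: fetch CC3M as webdataset shards with img2dataset.
# All paths and worker counts below are illustrative placeholders; consult the
# img2dataset documentation for the authoritative options.
download(
    url_list="cc3m.tsv",           # TSV with "url" and "caption" columns
    input_format="tsv",
    url_col="url",
    caption_col="caption",
    output_format="webdataset",    # the pre-training script expects webdataset shards
    output_folder="./data/cc3m",   # placeholder; point the data root in ./src/run.sh here
    image_size=256,
    processes_count=16,
    thread_count=64,
)
```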

- OOD detection datasets.
  - ID dataset, ImageNet-1K: the ImageNet-1K dataset (ILSVRC-2012) can be downloaded here.
  - OOD datasets, iNaturalist, SUN, Places, and Texture: please follow the instructions in the MOS and MCM repositories to download the subsampled versions in which classes that semantically overlap with ImageNet-1K have been removed.

When you have downloaded the above datasets, please write your data root into `./src/tuning_util.py`.

## 🔑 Pre-Train and Evaluate CLIPN

- Pre-train CLIPN on CC3M. This step teaches the "no" logic to CLIP via the web dataset (an illustrative sketch of the idea follows the commands below).

```bash
cd ./src
sh run.sh
```
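
The actual training objectives are implemented under `./src` and described in the paper; the snippet below is only an illustrative stand-in, not the paper's exact losses. It sketches the core intuition: the image and standard ("yes") text encoders stay frozen, while the learnable "no" prompts are trained so that, for a matched web image-caption pair, the "no" text loses to the "yes" text.

```python
import torch
import torch.nn.functional as F

def illustrative_no_logic_loss(img_feat, yes_txt_feat, no_txt_feat, tau=0.07):
    """Illustrative only -- NOT the exact CLIPN objectives.

    img_feat, yes_txt_feat, no_txt_feat: (B, D) L2-normalised features, where row i
    of each tensor comes from the same web image-caption pair. In CLIPN, only the
    "no" text branch (with its learnable prompts) is trainable.
    """
    b = img_feat.size(0)

    # Per-pair similarities of each image with its matched "yes" and "no" texts.
    sim_yes = (img_feat * yes_txt_feat).sum(-1) / tau          # (B,)
    sim_no = (img_feat * no_txt_feat).sum(-1) / tau            # (B,)

    # Yes/no competition: for a matched pair the "yes" text should win, which
    # pushes the "no" prompt towards meaning "this text does NOT match the image".
    pair_logits = torch.stack([sim_yes, sim_no], dim=-1)       # (B, 2)
    targets = torch.zeros(b, dtype=torch.long, device=img_feat.device)  # index 0 = "yes"
    return F.cross_entropy(pair_logits, targets)
```
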
- Zero-shot evaluation of CLIPN on ImageNet-1K.
  - Metrics and the evaluation pipeline are defined in `./src/zero_shot_infer.py`. There you can find three baseline methods and our two inference algorithms, CTW and ATD (see Lines 91-96); a schematic sketch of both scores is given after the command below.
  - Dataset details are defined in `./src/tuning_util.py`.
  - Inference models are defined in `./src/classification.py`, including the conversion of the text encoders into classifiers.
  - You can download the models provided in the table below or use ones you pre-trained yourself. Then write the path of your models into the main function of `./src/zero_shot_infer.py`. Finally, evaluate CLIPN with:

```bash
python3 zero_shot_infer.py
```
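
For intuition only, here is a schematic PyTorch sketch of how the two OOD scores can be computed from the "yes" and "no" classifiers. The authoritative formulas, temperatures, and tensor layouts are the ones in `./src/zero_shot_infer.py` and `./src/classification.py`; the shapes and function names below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Assumed inputs (see ./src/classification.py for how they are actually built):
#   image_feat : (N, D) L2-normalised image embeddings
#   w_yes      : (C, D) classifier weights from the standard text encoder
#   w_no       : (C, D) classifier weights from the "no" text encoder

def class_probs(image_feat, w_yes, w_no, tau=0.01):
    """p_yes: softmax over the ID classes; p_no: per-class 'no' probability from
    letting each class's 'yes' and 'no' logits compete."""
    logits_yes = image_feat @ w_yes.t() / tau                              # (N, C)
    logits_no = image_feat @ w_no.t() / tau                                # (N, C)
    p_yes = F.softmax(logits_yes, dim=-1)
    p_no = F.softmax(torch.stack([logits_yes, logits_no], dim=-1), dim=-1)[..., 1]
    return p_yes, p_no

def ctw_ood_score(image_feat, w_yes, w_no):
    """Competing-to-Win (roughly): the 'no' probability of the predicted ID class."""
    p_yes, p_no = class_probs(image_feat, w_yes, w_no)
    pred = p_yes.argmax(dim=-1)
    return p_no[torch.arange(image_feat.size(0)), pred]   # higher => more likely OOD

def atd_ood_score(image_feat, w_yes, w_no):
    """Agreeing-to-Differ (roughly): the probability mass left unclaimed by the ID
    classes after each class is down-weighted by its 'no' probability."""
    p_yes, p_no = class_probs(image_feat, w_yes, w_no)
    return 1.0 - ((1.0 - p_no) * p_yes).sum(dim=-1)        # higher => more likely OOD
```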

## 📘 Reproduced Results

To ensure reproducibility, we ran three repeated experiments under each configuration. The table below shows the most recent results reproduced before open-sourcing.

- ImageNet-1K (backbone: ViT-B-16)

| Method | Repeat | iNaturalist AUROC | iNaturalist FPR95 | SUN AUROC | SUN FPR95 | Textures AUROC | Textures FPR95 | Places AUROC | Places FPR95 | Avg AUROC | Avg FPR95 | Model/log |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CLIPN-CTW | 1 | 93.12 | 26.31 | 88.46 | 37.67 | 79.17 | 57.14 | 86.14 | 43.33 | _ | _ | here |
| CLIPN-CTW | 2 | 93.48 | 21.06 | 89.79 | 30.31 | 83.31 | 46.44 | 88.21 | 33.85 | _ | _ | here |
| CLIPN-CTW | 3 | 91.79 | 25.84 | 89.76 | 31.30 | 76.76 | 59.25 | 87.66 | 36.58 | _ | _ | here |
| CLIPN-CTW | Avg | 92.80 | 24.41 | 89.34 | 33.09 | 79.75 | 54.28 | 87.34 | 37.92 | 87.31 | 37.42 | _ |
| CLIPN-ATD | 1 | 95.65 | 21.73 | 93.22 | 29.51 | 90.35 | 42.89 | 91.25 | 36.98 | _ | _ | here |
| CLIPN-ATD | 2 | 96.67 | 16.71 | 94.77 | 23.41 | 92.46 | 34.73 | 93.39 | 29.24 | _ | _ | here |
| CLIPN-ATD | 3 | 96.29 | 18.90 | 94.55 | 24.15 | 89.61 | 45.12 | 93.23 | 30.11 | _ | _ | here |
| CLIPN-ATD | Avg | 96.20 | 19.11 | 94.18 | 25.69 | 90.81 | 40.91 | 92.62 | 32.11 | 93.45 | 29.46 | _ |

The performance in this table is better than that reported in our paper because we add an average learnable "no" prompt (see Lines 600-616 in `./src/open_clip/model.py`).

## 📝 Other Tips

There are several important factors that can affect performance:

- Class prompt texts. At inference time, prompt texts are used to generate the classifier weights (see `./src/prompt/prompt.txt`). You can try designing higher-performing inference prompts for CLIPN; a generic sketch of how prompts become classifier weights is given after this list.
- The number of learnable "no" tokens. It is currently set to 16; you can vary it to find an optimal value.
- If you have any ideas to enhance CLIPN or want to transfer the idea to other topics, feel free to reach out; I am happy to discuss and share ideas.
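
As a generic illustration of the first tip, the sketch below shows the standard CLIP-style recipe for turning class prompt texts into classifier weights (encode every template for a class, average, and L2-normalise). The function and argument names are placeholders assumed for illustration; the repository's own implementation lives in `./src/classification.py` together with the prompts in `./src/prompt/prompt.txt`.

```python
import torch

@torch.no_grad()
def build_classifier_weights(encode_text, tokenizer, classnames, templates, device="cuda"):
    """Generic CLIP-style zero-shot classifier weights from prompt templates.

    encode_text / tokenizer are assumed to follow an open_clip-like interface;
    templates are strings such as "a photo of a {}." (the design choice to tune).
    Returns a (C, D) weight matrix with one L2-normalised embedding per class.
    """
    weights = []
    for name in classnames:
        texts = [t.format(name) for t in templates]   # fill each template with the class name
        tokens = tokenizer(texts).to(device)           # (T, context_length)
        feats = encode_text(tokens)                    # (T, D)
        feats = feats / feats.norm(dim=-1, keepdim=True)
        mean_feat = feats.mean(dim=0)                  # ensemble over the templates
        weights.append(mean_feat / mean_feat.norm())
    return torch.stack(weights, dim=0)
```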

## 📚 Citation

If you find our paper helpful, please consider citing it in your publications.

```bibtex
@inproceedings{wang2023clipn,
  title={CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No},
  author={Wang, Hualiang and Li, Yi and Yao, Huifeng and Li, Xiaomeng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={1802--1812},
  year={2023}
}
```

## 🍻 Acknowledgements

We sincerely appreciate these three highly valuable repositories: open_clip, MOS, and MCM.