Parameter-Inverted Image Pyramid Networks (PIIP)

The official implementation of the paper "Parameter-Inverted Image Pyramid Networks"

NeurIPS 2024 Spotlight (Top 2.08%)

⭐️ Highlights

TL;DR: We introduce the Parameter-Inverted Image Pyramid Networks (PIIP), which follow a parameter-inverted paradigm: models with different parameter sizes process different resolution levels of the image pyramid, reducing computation cost while improving performance.

Supports object detection, instance segmentation, semantic segmentation, and image classification.
Surpasses single-branch methods, achieving higher performance at lower computation cost.
Improves the performance of InternViT-6B on object detection by 2.0% (to 55.8% $\rm AP^b$) while reducing computation cost by 62%.
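
The last highlight can be checked against the InternViT-6B and PIIP-H6B rows of the COCO table under Released Models. A minimal sketch of that arithmetic follows; the values are copied from those two rows, and the dictionary keys and the `compare` helper are illustrative names, not part of this repo.

```python
# Values copied from the COCO table (Mask R-CNN, 1x schedule).
# The dictionary keys and the helper name are illustrative only.
baseline = {"name": "InternViT-6B", "box_map": 53.8, "gflops": 29323}
piip = {"name": "PIIP-H6B", "box_map": 55.8, "gflops": 11080}


def compare(base, new):
    """Return (absolute box mAP gain, percent reduction in backbone FLOPs)."""
    gain = new["box_map"] - base["box_map"]
    reduction = 100.0 * (1.0 - new["gflops"] / base["gflops"])
    return gain, reduction


gain, reduction = compare(baseline, piip)
print(f"{piip['name']} vs {baseline['name']}: "
      f"+{gain:.1f} box mAP, {reduction:.0f}% fewer backbone FLOPs")
```

With the table values, 11080G is about 38% of 29323G, which corresponds to the stated 62% reduction.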

📌 Abstract

Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks.
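
The pairing rule described above (higher resolutions go to smaller networks) can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the parameter counts are approximate public figures for ViT-T/S/B, the patch size of 16 is assumed, and the cost model (multiply-accumulates roughly equal to parameters times tokens) ignores attention and the feature interaction modules, so it does not reproduce the FLOPs reported in the tables below. The resolutions are taken from the first PIIP-TSB row.

```python
# Illustrative sketch of the parameter-inverted pairing; not the repo's code.
# Assumptions: approximate parameter counts (in millions), patch size 16, and
# a crude cost model of MACs ~= params x tokens that ignores attention and
# cross-branch interaction overhead.
PATCH = 16
PARAMS_M = {"ViT-T": 6, "ViT-S": 22, "ViT-B": 86}


def tokens(resolution, patch=PATCH):
    """Number of patch tokens for a square image of the given side length."""
    return (resolution // patch) ** 2


def pair_inverted(params_m, resolutions):
    """Pair the smallest model with the highest resolution, and so on."""
    by_size = sorted(params_m, key=params_m.get)   # small -> large
    by_res = sorted(resolutions, reverse=True)     # high -> low
    return list(zip(by_size, by_res))


def rough_gmacs(pairs, params_m=PARAMS_M):
    """Crude cost estimate in giga-MACs for a list of (model, resolution)."""
    return sum(params_m[m] * tokens(r) for m, r in pairs) / 1e3


inverted = pair_inverted(PARAMS_M, [448, 896, 1120])
same_model_pyramid = [("ViT-B", r) for r in (1120, 896, 448)]
single_branch = [("ViT-B", 1024)]

for label, pairs in [("parameter-inverted", inverted),
                     ("same-model pyramid", same_model_pyramid),
                     ("single branch", single_branch)]:
    print(f"{label:>20}: {rough_gmacs(pairs):8.1f} GMACs (rough)")
```

Under these assumptions the inverted pairing costs less than a single ViT-B at 1024, while running ViT-B on every pyramid level costs far more than either, which is the trade-off the abstract describes.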

🔍 Method

🛠️ Usage

For instructions on installation, pretrained models, training, and evaluation, please refer to the README file in each subfolder:

mmdetection
mmsegmentation
classification

🚀 Released Models

COCO Object Detection and Instance Segmentation

Note:

We report the number of parameters and FLOPs of the backbone.
Results in the paper were obtained with an internal codebase, so results from this repo may differ slightly (within $\pm0.2$).
Unlike in the paper, the experiments involving InternViT-6B in this repo do not use window attention.

Backbone	Detector	Resolution	Schedule	Box mAP	Mask mAP	#Param	#FLOPs	Download
ViT-B	Mask R-CNN	1024	1x	43.7	39.7	90M	463G	log | ckpt | cfg
PIIP-TSB	Mask R-CNN	1120/896/448	1x	43.6	38.7	146M	243G	log | ckpt | cfg
PIIP-TSB	Mask R-CNN	1568/896/448	1x	45.0	40.3	147M	287G	log | ckpt | cfg
PIIP-TSB	Mask R-CNN	1568/1120/672	1x	46.5	41.3	149M	453G	log | ckpt | cfg

ViT-L	Mask R-CNN	1024	1x	46.7	42.5	308M	1542G	log | ckpt | cfg
PIIP-SBL	Mask R-CNN	1120/672/448	1x	46.5	40.8	493M	727G	log | ckpt | cfg
PIIP-SBL	Mask R-CNN	1344/896/448	1x	48.3	42.7	495M	1002G	log | ckpt | cfg
PIIP-SBL	Mask R-CNN	1568/896/672	1x	49.3	43.7	497M	1464G	log | ckpt | cfg
PIIP-TSBL	Mask R-CNN	1344/896/672/448	1x	47.1	41.9	506M	755G	log | ckpt | cfg
PIIP-TSBL	Mask R-CNN	1568/1120/672/448	1x	48.2	42.9	507M	861G	log | ckpt | cfg
PIIP-TSBL	Mask R-CNN	1792/1568/1120/448	1x	49.4	44.1	512M	1535G	log | ckpt | cfg

InternViT-6B	Mask R-CNN	1024	1x	53.8	48.1	5919M	29323G	log | ckpt | cfg
PIIP-H6B	Mask R-CNN	1024/512	1x	55.8	49.0	6872M	11080G	log | ckpt | cfg
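
To see which configurations in the table above beat their single-branch baseline on both metrics while using fewer backbone FLOPs, the rows can be filtered as in the sketch below. The numbers are copied from the table; comparing PIIP-TSB against ViT-B and PIIP-SBL/PIIP-TSBL against ViT-L is an assumption that follows the table's grouping, and the function and variable names are illustrative.

```python
# Rows copied from the COCO table above (Mask R-CNN, 1x schedule).
# Tuple layout: (backbone, resolution, box mAP, mask mAP, backbone GFLOPs).
# Which baseline each PIIP row is compared against follows the table's
# grouping and is an assumption of this sketch.
GROUPS = {
    ("ViT-B", "1024", 43.7, 39.7, 463): [
        ("PIIP-TSB", "1120/896/448", 43.6, 38.7, 243),
        ("PIIP-TSB", "1568/896/448", 45.0, 40.3, 287),
        ("PIIP-TSB", "1568/1120/672", 46.5, 41.3, 453),
    ],
    ("ViT-L", "1024", 46.7, 42.5, 1542): [
        ("PIIP-SBL", "1120/672/448", 46.5, 40.8, 727),
        ("PIIP-SBL", "1344/896/448", 48.3, 42.7, 1002),
        ("PIIP-SBL", "1568/896/672", 49.3, 43.7, 1464),
        ("PIIP-TSBL", "1344/896/672/448", 47.1, 41.9, 755),
        ("PIIP-TSBL", "1568/1120/672/448", 48.2, 42.9, 861),
        ("PIIP-TSBL", "1792/1568/1120/448", 49.4, 44.1, 1535),
    ],
}


def dominating(baseline, candidates):
    """Keep rows with higher box and mask mAP and fewer FLOPs than baseline."""
    _, _, base_box, base_mask, base_flops = baseline
    return [row for row in candidates
            if row[2] > base_box and row[3] > base_mask and row[4] < base_flops]


results = {base[0]: dominating(base, rows) for base, rows in GROUPS.items()}

for name, rows in results.items():
    print(f"Better than {name} on both metrics with fewer FLOPs:")
    for backbone, res, box, mask, flops in rows:
        print(f"  {backbone} @ {res}: box {box}, mask {mask}, {flops}G")
```

The lowest-resolution configurations trade a little accuracy for the largest savings, so they are excluded by this strict filter even though they use roughly half the baseline FLOPs.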

Backbone	Detector	Pretrain	Resolution	Schedule	Box mAP	Mask mAP	Download
PIIP-SBL	Mask R-CNN	AugReg (384)	1568/1120/672	1x	48.3	42.6	log | ckpt | cfg
PIIP-SBL	Mask R-CNN	DeiT III (S) + Uni-Perceiver (BL)	1568/1120/672	1x	48.8	42.9	log | ckpt | cfg
PIIP-SBL	Mask R-CNN	DeiT III (S) + MAE (BL)	1568/1120/672	1x	49.1	43.0	log | ckpt | cfg
PIIP-SBL	Mask R-CNN	DeiT III	1568/1120/672	1x	50.0	44.4	log | ckpt | cfg
PIIP-SBL	Mask R-CNN	DeiT III (S) + DINOv2 (BL)	1568/1120/672	1x	51.0	44.7	log | ckpt | cfg
PIIP-SBL	Mask R-CNN	DeiT III (S) + BEiTv2 (BL)	1568/1120/672	1x	51.8	45.4	log | ckpt | cfg
PIIP-SBL	DINO	DeiT III (384)	1792/1120/672	3x	57.8	-	log | ckpt | cfg
PIIP-H6B	DINO	MAE (H) + InternVL (6B)	1024/768	1x	60.0	-	log | ckpt | cfg

ADE20K Semantic Segmentation

Backbone	Segmentor	Resolution	Schedule	mIoU	#Param	#FLOPs	Download
InternViT-6B	UperNet	512	80k	58.42	5910M	6364G	log | ckpt | cfg
PIIP-H6B	UperNet	512/192	80k	57.81	6745M	1663G	log | ckpt | cfg
PIIP-H6B	UperNet	512/256	80k	58.35	6745M	2354G	log | ckpt | cfg
PIIP-H6B	UperNet	512/384	80k	59.32	6746M	4374G	log | ckpt | cfg
PIIP-H6B	UperNet	512/512	80k	59.85	6747M	7308G	log | ckpt | cfg

ImageNet-1K Image Classification

Model	Resolution	#Param	#FLOPs	Top-1 Acc	Config	Download
PIIP-TSB	368/192/128	144M	17.4G	82.1	config	log | ckpt
PIIP-SBL	320/160/96	489M	39.0G	85.2	config	log | ckpt
PIIP-SBL	384/192/128	489M	61.2G	85.9	config	log | ckpt

📅 Schedule

detection code
classification code
segmentation code

🖊️ Citation

If you find this work helpful for your research, please consider giving this repo a star ⭐ and citing our paper:

@article{piip,
  title={Parameter-Inverted Image Pyramid Networks},
  author={Zhu, Xizhou and Yang, Xue and Wang, Zhaokai and Li, Hao and Dou, Wenhan and Ge, Junqi and Lu, Lewei and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2406.04330},
  year={2024}
}

📃 License

This project is released under the MIT license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.

🙏 Acknowledgements

Our code is built with reference to the code of the following projects: InternVL-MMDetSeg, ViT-Adapter, DeiT, MMDetection, MMSegmentation, and timm. Thanks for their awesome work!
