Mask2Former Model Zoo and Baselines

Detectron2 ImageNet Pretrained Models

It's common to initialize from backbone models pre-trained on ImageNet classification tasks. The following backbone models are available:

R-50.pkl (torchvision): converted copy of torchvision's ResNet-50 model. More details can be found in the conversion script.
R-103.pkl: a ResNet-101 whose first 7x7 convolution is replaced by three 3x3 convolutions (referred to as ResNet101c in our paper). This modification is used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe from the PyTorch examples.
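
To make the stem change in R-103.pkl concrete, here is a rough sketch that compares a single 7x7 convolution with a stack of three 3x3 convolutions. It is only arithmetic, not the actual backbone code. The strides and channel widths (3 -> 64 -> 64 -> 128) are assumptions chosen for illustration and are not taken from this document.

```python
def stacked_receptive_field(layers):
    """Receptive field of stacked convolutions, given (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) * current jump
        jump *= stride             # stride multiplies the spacing between output samples
    return rf


def conv_weight_count(layers):
    """Weight count (no bias) for convs given as (kernel, in_channels, out_channels)."""
    return sum(k * k * c_in * c_out for k, c_in, c_out in layers)


# Standard stem: one 7x7 convolution.
single_rf = stacked_receptive_field([(7, 2)])

# Three 3x3 convolutions at stride 1 cover the same 7x7 window.
stacked_rf = stacked_receptive_field([(3, 1), (3, 1), (3, 1)])

# Assumed layout: stride 2 on the first 3x3 convolution only.
strided_rf = stacked_receptive_field([(3, 2), (3, 1), (3, 1)])

# Assumed channel widths, for illustration only.
single_weights = conv_weight_count([(7, 3, 64)])
stacked_weights = conv_weight_count([(3, 3, 64), (3, 64, 64), (3, 64, 128)])

if __name__ == "__main__":
    print("receptive field:", single_rf, stacked_rf, strided_rf)
    print("weights:", single_weights, stacked_weights)
```

Under these assumptions the stacked stem sees at least as large a window as the 7x7 convolution while adding two extra nonlinearities, at the cost of more weights in the wider layers.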

Note: the following pretrained models are also available in Detectron2, but we do not use them in our paper.

R-50.pkl: converted copy of MSRA's original ResNet-50 model.
R-101.pkl: converted copy of MSRA's original ResNet-101 model.
X-101-32x8d.pkl: ResNeXt-101-32x8d model trained with Caffe2 at FB.
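
As a convenience, the backbones listed above can be collected in a small lookup table for use when filling in a config. This is a minimal sketch: the helper name `weights_uri` and the dictionary keys are hypothetical, and the `detectron2://` paths are assumptions based on the usual Detectron2 catalog layout, so verify them against the actual download links before relying on them.

```python
# Hypothetical lookup table; the URI paths are assumed, not taken from this document.
IMAGENET_BACKBONES = {
    "R-50-torchvision": "detectron2://ImageNetPretrained/torchvision/R-50.pkl",
    "R-103": "detectron2://DeepLab/R-103.pkl",
    "R-50-MSRA": "detectron2://ImageNetPretrained/MSRA/R-50.pkl",
    "R-101-MSRA": "detectron2://ImageNetPretrained/MSRA/R-101.pkl",
    "X-101-32x8d": "detectron2://ImageNetPretrained/FAIR/X-101-32x8d.pkl",
}


def weights_uri(name):
    """Return the assumed weights URI for a backbone key, or raise a clear error."""
    try:
        return IMAGENET_BACKBONES[name]
    except KeyError:
        known = ", ".join(sorted(IMAGENET_BACKBONES))
        raise ValueError(f"unknown backbone {name!r}; expected one of: {known}") from None


if __name__ == "__main__":
    # The returned string would typically be assigned to MODEL.WEIGHTS in a config.
    print(weights_uri("R-103"))
```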

Third-party ImageNet Pretrained Models

Our paper also uses ImageNet pretrained models that are not part of Detectron2. Please refer to tools for how to obtain those pretrained models.