update ViT-Adapter-L + HTC++
czczup committed Jun 9, 2022
1 parent 9441aef commit ea89f6b
Showing 6 changed files with 561 additions and 78 deletions.
73 changes: 55 additions & 18 deletions README.md
@@ -10,10 +10,9 @@
The official implementation of the paper "[Vision Transformer Adapter for Dense Predictions](https://arxiv.org/abs/2205.08534)".

## News

(2022/06/04) Segmentation is released.\
(2022/06/02) Detection is released and segmentation will come soon.\
(2022/05/17) ViT-Adapter-L yields 60.1 box AP and 52.1 mask AP on COCO test-dev.\
(2022/06/09) ViT-Adapter-L yields 60.4 box AP and 52.5 mask AP on COCO test-dev.\
(2022/06/04) Code and models are released.\
(2022/05/17) ~~ViT-Adapter-L yields 60.1 box AP and 52.1 mask AP on COCO test-dev.~~ \
(2022/05/12) ViT-Adapter-L reaches 85.2 mIoU on Cityscapes test set without coarse data.\
(2022/05/05) ViT-Adapter-L achieves the SOTA on ADE20K val set with 60.5 mIoU!

@@ -29,35 +28,73 @@ This work investigates a simple yet powerful adapter for Vision Transformer (ViT

## SOTA Model Zoo

COCO test-dev

| Method | Framework | Pre-train | Lr schd | box AP | mask AP | #Param |
|:------------------:|:---------:|:---------:|:-------:|:------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|:------:|
| ViT-Adapter-L | HTC++ | BEiT | 3x | [58.5](https://drive.google.com/file/d/11zpPSvmuAn7aP5brxzHE8naObnOfFxby/view?usp=sharing) | [50.8](https://drive.google.com/file/d/1wIbtzfHfPqkvZaSivzcsh4HWu1oSiun6/view?usp=sharing) | 401M |
| ViT-Adapter-L (MS) | HTC++ | BEiT | 3x | [60.1](https://drive.google.com/file/d/1i-qjgUK4CMwZcmu5pkndldwfVbdkw5sU/view?usp=sharing) | [52.1](https://drive.google.com/file/d/16mlEOPY7K-Xpx_CL650A-LWbVDm2vl4X/view?usp=sharing) | 401M |

ADE20K val
**COCO mini-val test-dev**


<table>
<tr align=center>
<td rowspan="2" align=center><b>Method</b></td>
<td rowspan="2" align=center><b>Framework</b></td>
<td rowspan="2" align=center><b>Pre-train</b></td>
<td rowspan="2" align=center><b>Schd</b></td>
<td colspan="2" align=center><b>mini-val</b></td>
<td colspan="2" align=center><b>test-dev</b></td>
<td rowspan="2" align=center><b>#Param</b></td>
</tr>
<tr>
<td>box AP</td>
<td>mask AP</td>
<td>box AP</td>
<td>mask AP</td>
</tr>
<tr align=center>
<td>ViT-Adapter-L</td>
<td>HTC++</td>
<td>BEiT</td>
<td>3x</td>
<td>58.4</td>
<td>50.8</td>
<td><a href="https://drive.google.com/file/d/1lXQxf5PJ0g0bQNkMMrhG63jal0NsmYjb/view?usp=sharing">58.9</a></td>
<td><a href="https://drive.google.com/file/d/1nyuONJcHHXki0Cn8dCgbPZ9D_MURh47t/view?usp=sharing">51.3</a></td>
<td>401M</td>
</tr>
<tr align=center>
<td>ViT-Adapter-L$^\dagger$</td>
<td>HTC++</td>
<td>BEiT</td>
<td>3x</td>
<td>60.2</td>
<td>52.2</td>
<td><a href="https://drive.google.com/file/d/15t2Oc3FiNeLr6RnKOJ-0IbI7b2LalxbX/view?usp=sharing">60.4</a></td>
<td><a href="https://drive.google.com/file/d/1TIPOJC6ieZS_ZRNCbo_AW4UqYAkQIjyN/view?usp=sharing">52.5</a></td>
<td>401M</td>
</tr>
</table>

$\dagger$ denotes multi-scale testing.

**ADE20K val**

| Method | Framework | Pre-train | Iters | Crop Size | mIoU | +MS | #Param |
|:-------------:|:-----------:|:---------------:|:-----:|:---------:|:------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|:------:|
| ViT-Adapter-L | UperNet | BEiT | 160k | 640 | [58.0](https://drive.google.com/file/d/1KsV4QPfoRi5cj2hjCzy8VfWih8xCTrE3/view?usp=sharing) | [58.4](https://drive.google.com/file/d/1haeTUvQhKCM7hunVdK60yxULbRH7YYBK/view?usp=sharing) | 451M |
| ViT-Adapter-L | Mask2Former | BEiT | 160k | 640 | [58.3](https://drive.google.com/file/d/1jj56lSbc2s4ZNc-Hi-w6o-OSS99oi-_g/view?usp=sharing) | [59.0](https://drive.google.com/file/d/1hgpZB5gsyd7LTS7Aay2CbHmlY10nafCw/view?usp=sharing) | 568M |
| ViT-Adapter-L | Mask2Former | COCO-Stuff-164k | 80k | 896 | [59.4](https://drive.google.com/file/d/1B_1XSwdnLhjJeUmn1g_nxfvGJpYmYWHa/view?usp=sharing) | [60.5](https://drive.google.com/file/d/1UtjmgcYKR-2h116oQXklUYOVcTw15woM/view?usp=sharing) | 571M |

Cityscapes val/test
**Cityscapes val/test**

| Method | Framework | Pre-train | Iters | Crop Size | val mIoU | val/test +MS | #Param |
|:-------------:|:-----------:|:---------:|:-----:|:---------:|:------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------:|
| ViT-Adapter-L | Mask2Former | Mapillary | 80k | 896 | [84.9](https://drive.google.com/file/d/1LKy0zz-brCBbKGmUWquadILaBHdDLR6s/view?usp=sharing) | [85.8](https://drive.google.com/file/d/1LSJvK1BPSbzm9eWpKL8Xo7RmYBrd2xux/view?usp=sharing)/[85.2](https://www.cityscapes-dataset.com/anonymous-results/?id=0ca6821dc3183ff970bd5266f812df2eaa4519ecb1973ca1308d65a3b546bf27) | 571M |

COCO-Stuff-10K
**COCO-Stuff-10K**

| Method | Framework | Pre-train | Iters | Crop Size | mIoU | +MS | #Param |
|:-------------:|:-----------:|:---------:|:-----:|:---------:|:------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|:------:|
| ViT-Adapter-L | UperNet | BEiT | 80k | 512 | [51.0](https://drive.google.com/file/d/1xZodiAvOLGaLtMGx_btYVZIMC2VKrDhI/view?usp=sharing) | [51.4](https://drive.google.com/file/d/1bmFG9GA4bRqOEJfqXcO7nWYPwG3wSk2J/view?usp=sharing) | 451M |
| ViT-Adapter-L | Mask2Former | BEiT | 40k | 512 | [53.2](https://drive.google.com/file/d/1Buewc1n7GBAcBDXeia-QarujrDZqc_Sx/view?usp=sharing) | [54.2](https://drive.google.com/file/d/1kQgJUHDeQoO3pPY6QoXRKwyF7heT7wCJ/view?usp=sharing) | 568M |

Pascal Context
**Pascal Context**

| Method | Framework | Pre-train | Iters | Crop Size | mIoU | +MS | #Param |
|:-------------:|:-----------:|:---------:|:-----:|:---------:|:------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|:------:|
@@ -68,7 +105,7 @@ Pascal Context

### COCO mini-val

Baseline Detectors
**Baseline Detectors**

| Method | Framework | Pre-train | Lr schd | Aug | box AP | mask AP | #Param |
|:-------------:|:----------:|:---------:|:-------:|:---:|:------:|:-------:|:------:|
@@ -77,7 +114,7 @@ Baseline Detectors
| ViT-Adapter-B | Mask R-CNN | DeiT | 3x | Yes | 49.6 | 43.6 | 120M |
| ViT-Adapter-L | Mask R-CNN | AugReg | 3x | Yes | 50.9 | 44.8 | 348M |

Advanced Detectors
**Advanced Detectors**

| Method | Framework | Pre-train | Lr schd | Aug | box AP | mask AP | #Param |
|:-------------:|:-------------------:|:---------:|:-------:|:---:|:------:|:-------:|:------:|
@@ -88,7 +125,7 @@ Advanced Detectors
| ViT-Adapter-B | Upgraded Mask R-CNN | MAE | 25ep | LSJ | 50.3 | 44.7 | 122M |
| ViT-Adapter-B | Upgraded Mask R-CNN | MAE | 50ep | LSJ | 50.8 | 45.1 | 122M |

ADE20K val
**ADE20K val**

| Method | Framework | Pre-train | Iters | Crop Size | mIoU | +MS | #Param |
|:-------------:|:---------:|:---------:|:-----:|:---------:|:----:|:----:|:------:|