[CodeCamp2023-543] Adapt new version of Config #2699

Closed · wants to merge 7 commits
4 changes: 3 additions & 1 deletion configs/_base_/models/tin_r50.py
@@ -1,7 +1,9 @@
 # model settings

 preprocess_cfg = dict(
-    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], format_shape='NCHW')
+    mean=[123.675, 116.28, 103.53],
+    std=[58.395, 57.12, 57.375],
+    format_shape='NCHW')

 model = dict(
     type='Recognizer2D',
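The new values replace the symmetric [-1, 1] scaling with the standard ImageNet per-channel statistics, which is what ImageNet-pretrained backbones such as this ResNet-50 expect. A minimal sketch of the difference, assuming a uint8 RGB frame; every name below is illustrative, not MMAction2 API:

import numpy as np

# Stand-in for one decoded 224x224 RGB frame.
frame = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)

# Old scheme: maps [0, 255] symmetrically onto roughly [-1, 1].
old = (frame - 127.5) / 127.5

# New scheme: per-channel ImageNet mean/std, as in the updated preprocess_cfg.
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
std = np.array([58.395, 57.12, 57.375], dtype=np.float32)
new = (frame - mean) / std  # broadcasts over the last (channel) axis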
2 changes: 1 addition & 1 deletion configs/recognition/tin/README.md
@@ -34,7 +34,7 @@ For a long time, the vision community tries to learn the spatio-temporal represe

| frame sampling strategy | resolution | gpus | backbone | pretrain | top1 acc | top5 acc | testing protocol | inference time(video/s) | gpu_mem(M) | config | ckpt | log |
| :---------------------: | :------------: | :--: | :------: | :-------------: | :------: | :------: | :--------------: | :---------------------: | :--------: | :-----------------------: | :---------------------: | :---------------------: |
-| 1x1x8 | short-side 256 | 8x4 | ResNet50 | TSM-Kinetics400 | 71.77 | 90.36 | 8 clips x 1 crop | x | 6185 | [config](/configs/recognition/tin/tin_imagenet-pretrained-r50_8xb6-1x1x8-40e_sthv2-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/tin/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb_20220913-7f10d0c0.pth) | [log](https://download.openmmlab.com/mmaction/v1.0/recognition/tin/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb.log) |
+| 1x1x8 | short-side 256 | 8x4 | ResNet50 | TSM-Kinetics400 | 71.86 | 90.44 | 8 clips x 1 crop | x | 6185 | [config](/configs/recognition/tin/tin_imagenet-pretrained-r50_8xb6-1x1x8-40e_sthv2-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/tin/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb_20220913-7f10d0c0.pth) | [log](https://download.openmmlab.com/mmaction/v1.0/recognition/tin/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb.log) |

Here, we use `finetune` to indicate that we use the [TSM model](https://download.openmmlab.com/mmaction/v1.0/v1.0/recognition/tsm/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb_20220831-64d69186.pth) trained on Kinetics-400 to fine-tune the TIN model on Kinetics-400.

4 changes: 2 additions & 2 deletions configs/recognition/tin/metafile.yml
@@ -66,8 +66,8 @@ Models:
   Results:
   - Dataset: Kinetics-400
     Metrics:
-      Top 1 Accuracy: 71.77
-      Top 5 Accuracy: 90.36
+      Top 1 Accuracy: 71.86
+      Top 5 Accuracy: 90.44
     Task: Action Recognition
   Training Log: https://download.openmmlab.com/mmaction/v1.0/recognition/tin/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb.log
   Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/tin/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb/tin_kinetics400-pretrained-tsm-r50_1x1x8-50e_kinetics400-rgb_20220913-7f10d0c0.pth
21 changes: 11 additions & 10 deletions dataset-index.yml
@@ -1,39 +1,40 @@
+openxlab: true
 kinetics400:
-  dataset: Kinetics-400
+  dataset: OpenMMLab/Kinetics-400
   download_root: data
   data_root: data/kinetics400
-  script: tools/data/kinetics/k400_preprocess.sh
+  script: tools/data/kinetics/preprocess_k400.sh

 kinetics600:
-  dataset: Kinetics600
+  dataset: OpenMMLab/Kinetics600
   download_root: data
   data_root: data/kinetics600
-  script: tools/data/kinetics/k600_preprocess.sh
+  script: tools/data/kinetics/preprocess_k600.sh

 kinetics700:
-  dataset: Kinetics_700
+  dataset: OpenMMLab/Kinetics_700
   download_root: data
   data_root: data/kinetics700
-  script: tools/data/kinetics/k700_preprocess.sh
+  script: tools/data/kinetics/preprocess_k700.sh

 sthv2:
-  dataset: sthv2
+  dataset: OpenDataLab/sthv2
   download_root: data
   data_root: data/sthv2
   script: tools/data/sthv2/preprocess.sh

 ucf-101:
-  dataset: UCF101
+  dataset: OpenDataLab/UCF101
   download_root: data
   data_root: data/ucf101

 finegym:
-  dataset: FineGym
+  dataset: OpenDataLab/FineGym
   download_root: data
   data_root: data/gym

 diving48:
-  dataset: diving48
+  dataset: OpenDataLab/diving48
   download_root: data
   data_root: data/diving48
   script: tools/data/diving48/preprocess.sh
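Each entry now names a namespaced upstream repository (OpenMMLab or OpenDataLab) plus a download root and an optional preprocessing script, and the Kinetics scripts were renamed to a common preprocess_* pattern. A rough sketch of how a download tool might read one entry; the field access mirrors the YAML above, while the actual consumer (the `mim download` command) is outside this diff:

import yaml

# Resolve the kinetics400 entry from the index shown above.
with open('dataset-index.yml') as f:
    index = yaml.safe_load(f)

entry = index['kinetics400']
repo = entry['dataset']        # 'OpenMMLab/Kinetics-400'
root = entry['download_root']  # 'data'
script = entry.get('script')   # 'tools/data/kinetics/preprocess_k400.sh'
print(f'fetch {repo} into {root}/, then run {script}')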
178 changes: 178 additions & 0 deletions (new file)
@@ -0,0 +1,178 @@
# Copyright (c) OpenMMLab. All rights reserved.
from mmengine.config import read_base

with read_base():
    from ..._base_.default_runtime import *

from mmengine.dataset import DefaultSampler
from mmengine.optim import CosineAnnealingLR, LinearLR
from mmengine.runner import EpochBasedTrainLoop, TestLoop, ValLoop
from torch.optim import AdamW

from mmaction.datasets import (CenterCrop, DecordDecode, DecordInit, Flip,
                               FormatShape, PackActionInputs,
                               PytorchVideoWrapper, RandomResizedCrop, Resize,
                               ThreeCrop, UniformSample, VideoDataset)
from mmaction.evaluation import AccMetric
from mmaction.models import (ActionDataPreprocessor, Recognizer3D,
                             TimeSformerHead, UniFormerHead, UniFormerV2)
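# With the new-style pure-Python config, base settings are imported inside
# read_base() and registered classes are referenced directly (type=Recognizer3D
# below rather than type='Recognizer3D'), so IDEs can resolve every symbol.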

# model settings
num_frames = 8
model = dict(
    type=Recognizer3D,
    backbone=dict(
        type=UniFormerV2,
        input_resolution=224,
        patch_size=16,
        width=768,
        layers=12,
        heads=12,
        t_size=num_frames,
        dw_reduction=1.5,
        backbone_drop_path_rate=0.,
        temporal_downsample=False,
        no_lmhra=True,
        double_lmhra=True,
        return_list=[8, 9, 10, 11],
        n_layers=4,
        n_dim=768,
        n_head=12,
        mlp_factor=4.,
        drop_path_rate=0.,
        mlp_dropout=[0.5, 0.5, 0.5, 0.5],
        clip_pretrained=False,
        init_cfg=dict(
            type='Pretrained',
            checkpoint=  # noqa: E251
            'https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv2/kinetics400/uniformerv2-base-p16-res224_clip-kinetics710-pre_u8_kinetics400-rgb_20221219-203d6aac.pth',  # noqa: E501
            prefix='backbone.')),
    cls_head=dict(
        type=TimeSformerHead,
        dropout_ratio=0.5,
        num_classes=339,
        in_channels=768,
        average_clips='prob'),
    data_preprocessor=dict(
        type=ActionDataPreprocessor,
        mean=[114.75, 114.75, 114.75],
        std=[57.375, 57.375, 57.375],
        format_shape='NCTHW'))
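# init_cfg loads only backbone weights (prefix='backbone.') from the
# UniFormerV2 Kinetics-400 checkpoint above; the TimeSformerHead with
# num_classes=339 (the Moments in Time label set) is trained from scratch.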

# dataset settings
dataset_type = 'VideoDataset'
data_root = 'data/mit/videos/training'
data_root_val = 'data/mit/videos/validation'
ann_file_train = 'data/mit/mit_train_list_videos.txt'
ann_file_val = 'data/mit/mit_val_list_videos.txt'
ann_file_test = 'data/mit/mit_val_list_videos.txt'

file_client_args = dict(io_backend='disk')
train_pipeline = [
    dict(type=DecordInit, **file_client_args),
    dict(type=UniformSample, clip_len=num_frames, num_clips=1),
    dict(type=DecordDecode),
    dict(type=Resize, scale=(-1, 256)),
    dict(
        type=PytorchVideoWrapper, op='RandAugment', magnitude=7, num_layers=4),
    dict(type=RandomResizedCrop),
    dict(type=Resize, scale=(224, 224), keep_ratio=False),
    dict(type=Flip, flip_ratio=0.5),
    dict(type=FormatShape, input_format='NCTHW'),
    dict(type=PackActionInputs)
]

val_pipeline = [
    dict(type=DecordInit, **file_client_args),
    dict(type=UniformSample, clip_len=num_frames, num_clips=1, test_mode=True),
    dict(type=DecordDecode),
    dict(type=Resize, scale=(-1, 224)),
    dict(type=CenterCrop, crop_size=224),
    dict(type=FormatShape, input_format='NCTHW'),
    dict(type=PackActionInputs)
]

test_pipeline = [
    dict(type=DecordInit, **file_client_args),
    dict(type=UniformSample, clip_len=num_frames, num_clips=4, test_mode=True),
    dict(type=DecordDecode),
    dict(type=Resize, scale=(-1, 224)),
    dict(type=ThreeCrop, crop_size=224),
    dict(type=FormatShape, input_format='NCTHW'),
    dict(type=PackActionInputs)
]
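# The three pipelines share decoding and resizing but differ in sampling and
# augmentation: training applies RandAugment, a random resized crop and a
# random horizontal flip to one 8-frame clip; validation center-crops one
# clip; testing averages 4 uniformly sampled clips x 3 crops (12 views).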

train_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type=DefaultSampler, shuffle=True),
    dataset=dict(
        type=VideoDataset,
        ann_file=ann_file_train,
        data_prefix=dict(video=data_root),
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type=DefaultSampler, shuffle=False),
    dataset=dict(
        type=VideoDataset,
        ann_file=ann_file_val,
        data_prefix=dict(video=data_root_val),
        pipeline=val_pipeline,
        test_mode=True))
test_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type=DefaultSampler, shuffle=False),
    dataset=dict(
        type=VideoDataset,
        ann_file=ann_file_test,
        data_prefix=dict(video=data_root_val),
        pipeline=test_pipeline,
        test_mode=True))

val_evaluator = dict(type=AccMetric)
test_evaluator = dict(type=AccMetric)
train_cfg = dict(
    type=EpochBasedTrainLoop, max_epochs=24, val_begin=1, val_interval=1)
val_cfg = dict(type=ValLoop)
test_cfg = dict(type=TestLoop)

base_lr = 2e-5
optim_wrapper = dict(
    optimizer=dict(
        type=AdamW, lr=base_lr, betas=(0.9, 0.999), weight_decay=0.05),
    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0),
    clip_grad=dict(max_norm=20, norm_type=2))

param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1 / 20,
        by_epoch=True,
        begin=0,
        end=5,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min_ratio=1 / 20,
        by_epoch=True,
        begin=5,
        end=24,
        convert_to_iter_based=True)
]
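# Schedule: the LR warms up linearly from base_lr / 20 to base_lr over the
# first 5 epochs, then follows a cosine decay back down to base_lr / 20 by
# epoch 24; convert_to_iter_based applies both schedules per iteration.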

default_hooks.update(
    dict(
        checkpoint=dict(interval=3, max_keep_ckpts=5),
        logger=dict(interval=100)))

# Default setting for scaling LR automatically
# - `enable` means enable scaling LR automatically
#   or not by default.
# - `base_batch_size` = (8 GPUs) x (8 samples per GPU).
auto_scale_lr = dict(enable=True, base_batch_size=512)
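A config in this pure-Python style loads through the same entry point as the old text-style configs. A minimal sketch, assuming a recent mmengine with lazy-import config support; the file path is hypothetical, since the diff view above does not show where the new file lives:

from mmengine.config import Config

# Hypothetical location for the new config added in this PR.
cfg = Config.fromfile('configs/recognition/uniformerv2/uniformerv2_mit_u8.py')
print(cfg.model['backbone']['t_size'])  # 8, i.e. num_frames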