
This repository applies local pruning to the AdaFocusV1 network and includes a video demonstration.


Experiments on ActivityNet, FCVID and Mini-Kinetics

Requirements

  • python 3.8
  • pytorch 1.7.0
  • torchvision 0.8.0
  • hydra 1.1.0

Datasets

  1. Please get the train/test split files for each dataset from Google Drive and put them in PATH_TO_DATASET.
  2. Download the videos from the following links, or contact the corresponding authors for access, and save them to PATH_TO_DATASET/videos.
  3. Extract frames using ops/video_jpg.py; the frames will be saved to PATH_TO_DATASET/frames. Minor modifications to the file paths are needed when extracting frames from a different dataset (a minimal sketch of this step is shown below).
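
A minimal, hypothetical sketch of the frame-extraction step using OpenCV is shown below; ops/video_jpg.py is the authoritative script, and the img_00001.jpg naming used here is an assumption that may need to match what the data loader expects.

import os
import cv2  # opencv-python

def extract_frames(video_path, out_dir):
    # Decode one video and dump every frame as a JPEG, analogous to ops/video_jpg.py.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        # The img_%05d.jpg naming is illustrative; align it with the repo's loader.
        cv2.imwrite(os.path.join(out_dir, "img_%05d.jpg" % idx), frame)
    cap.release()

extract_frames("PATH_TO_DATASET/videos/example.mp4", "PATH_TO_DATASET/frames/example")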

Pre-trained Models on ActivityNet

Please download pre-trained weights and checkpoints from Google Drive.

  • globalcnn.pth.tar: pre-trained weights for global CNN (MobileNet-v2).
  • localcnn.pth.tar: pre-trained weights for local CNN (ResNet-50).
  • 128checkpoint.pth.tar: checkpoint of stage 1 with patch size 128x128.
  • 160checkpoint.pth.tar: checkpoint of stage 1 with patch size 160x160.
  • 192checkpoint.pth.tar: checkpoint of stage 1 with patch size 192x192.
  • 128s3_checkpoint.pth.tar: checkpoint to reproduce the results reported in the paper with patch size 128x128.
  • 160s3_checkpoint.pth.tar: checkpoint to reproduce the results reported in the paper with patch size 160x160.
  • 192s3_checkpoint.pth.tar: checkpoint to reproduce the results reported in the paper with patch size 192x192.
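
The internal layout of these .pth.tar files is not documented here; as a hedged sketch, a downloaded checkpoint can be inspected with torch.load before being passed to main_dist.py through resume, pretrained_glancer, or pretrained_focuser (the 'state_dict' and 'epoch' keys below follow common PyTorch conventions and are an assumption).

import torch

# Load a downloaded checkpoint on the CPU and look at what it stores.
ckpt = torch.load("128s3_checkpoint.pth.tar", map_location="cpu")
print(list(ckpt.keys()))  # e.g. ['epoch', 'state_dict', ...] -- assumed layout
if "state_dict" in ckpt:
    # Print a few parameter names to check which backbone the weights belong to.
    for name in list(ckpt["state_dict"].keys())[:5]:
        print(name)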

Training

  • Here we take training the model with patch size 128x128 on the ActivityNet dataset as an example.
  • All logs and checkpoints will be saved in the directory: ./outputs/YYYY-MM-DD/HH-MM-SS.
  • Note that we store a set of default hyper-parameters in conf/default.yaml, which can be overridden from the command line (a minimal sketch of how this config is consumed appears at the end of this section). You can also use your own config files.
  • Before training, please initialize the Global CNN and Local CNN by fine-tuning the ImageNet pre-trained models in PyTorch using the following commands:

For the Global CNN:

CUDA_VISIBLE_DEVICES=0,1 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=0 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.01 epochs=15 dist_url=tcp://127.0.0.1:8857 random_patch=true patch_size=128 glance_size=224 eval_freq=5 consensus=gru hidden_dim=1024 pretrain_glancer=true

For the Local CNN:

CUDA_VISIBLE_DEVICES=0,1 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=0 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.01 epochs=15 dist_url=tcp://127.0.0.1:8857 random_patch=true patch_size=128 glance_size=224 eval_freq=5 consensus=gru hidden_dim=1024 pretrain_glancer=false
  • Training stage 1 (pre-trained weights for the Global CNN and Local CNN are required):
CUDA_VISIBLE_DEVICES=0,1 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=1 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.0005 fc_lr=0.05 epochs=50 dist_url=tcp://127.0.0.1:8857 random_patch=true patch_size=128 glance_size=224 eval_freq=5 consensus=gru hidden_dim=1024 pretrained_glancer=PATH_TO_CHECKPOINTS pretrained_focuser=PATH_TO_CHECKPOINTS
  • Training stage 2 (a stage-1 checkpoint is required):
CUDA_VISIBLE_DEVICES=0 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=2 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.0005 fc_lr=0.05 epochs=50 random_patch=false patch_size=128 glance_size=224 action_dim=49 eval_freq=5 consensus=gru hidden_dim=1024 resume=PATH_TO_CHECKPOINTS multiprocessing_distributed=false distributed=false
  • Training stage 3 (a stage-2 checkpoint is required):
CUDA_VISIBLE_DEVICES=0 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=3 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.0005 fc_lr=0.005 epochs=10 random_patch=false patch_size=128 glance_size=224 action_dim=49 eval_freq=5 consensus=gru hidden_dim=1024 resume=PATH_TO_CHECKPOINTS multiprocessing_distributed=false distributed=false
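
As noted above, hyper-parameters are read from conf/default.yaml, and any key=value pair on the command line (e.g. batch_size=64 patch_size=128) overrides the corresponding entry. The snippet below is a minimal, illustrative sketch of how a Hydra 1.1 entry point consumes that config; it is not the actual main_dist.py.

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="default")
def main(cfg: DictConfig) -> None:
    # Every key in conf/default.yaml, plus any command-line override,
    # is available here, e.g. cfg.batch_size, cfg.patch_size, cfg.train_stage.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()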

Evaluate Pre-trained Models

  • Here we take evaluating the model with patch size 128x128 on ActivityNet as an example.
CUDA_VISIBLE_DEVICES=0 python main_dist.py dataset=actnet data_dir=PATH_TO_DATASET train_stage=3 batch_size=64 workers=8 dropout=0.8 lr_type=cos backbone_lr=0.0005 fc_lr=0.005 epochs=10 random_patch=false patch_size=128 glance_size=224 action_dim=49 eval_freq=5 consensus=gru hidden_dim=1024 resume=PATH_TO_CHECKPOINTS multiprocessing_distributed=false distributed=false evaluate=true

Acknowledgement

We use the implementations of MobileNet-v2 and ResNet from the PyTorch source code. We also borrow some code for dataset preparation from AR-Net and the PPO implementation from here.
