MMAction2 is an open-source toolbox for video understanding based on PyTorch, and is part of the OpenMMLab project. This repo provides a working Dockerfile and Python scripts to process videos for action recognition using the Action Recognition Models and the Spatio-Temporal Action Detection Models. We have performed experiments on two datasets: PoseML (RGB videos of drivers) and SHRP2 (low-quality videos of drivers).
The files required to test an mmaction2 model are: checkpoint(s) (`.pth`), config_file (`.py`), and classes_file (`.txt`).
For details about the method and quantitative results please check the MMAction2 documentation at https://mmaction2.readthedocs.io/en/latest/
Sign in to the Container registry service at ghcr.io
```
docker pull ghcr.io/akashsonth/action-recognition:latest
docker run -it --rm --runtime=nvidia -v {{dataPath}}:/data action-recognition /bin/bash
```
NOTE: this has been tested on an Ubuntu 18.04.6 machine with a Tesla V100-SXM2-16GB GPU, with Docker, nvidia-docker, and all relevant drivers installed.
The Dockerfile uses `nvidia/cuda:11.3.0-cudnn8-devel-ubuntu20.04` as the base image, and we recommend using the same.
```
git clone https://github.com/VTTI/driver-secondary-action-recognition.git
cd driver-secondary-action-recognition
```
In the file `poseml_long_video.yaml`, replace the values of the parameters `configFile`, `checkpoint`, and `label` with the required model parameters, as shown in the sketch below. We provide three trained models, with instructions for each below. You can also make use of the other options from https://mmaction2.readthedocs.io/en/latest/recognition_models.html
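A minimal sketch of this edit, assuming `poseml_long_video.yaml` is plain YAML with top-level `configFile`, `checkpoint`, and `label` keys (verify against the actual file) and that PyYAML is available; the values shown are the TSN/PoseML example from the model list below:

```python
import yaml  # requires PyYAML

# Point poseml_long_video.yaml at a chosen model. The key names are assumed
# to be top-level; the values are the TSN/PoseML example described below.
with open("poseml_long_video.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["configFile"] = "configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py"
cfg["checkpoint"] = "checkpoints/tsn_PoseML_epoch20.pth"
cfg["label"] = "label_poseml.txt"

with open("poseml_long_video.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```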
Create a `checkpoints` folder and download the chosen model checkpoint (options and instructions provided below) into this location.
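For example, a short sketch that creates the folder and fetches one of the checkpoints listed below (swap in the URL of the model you chose):

```python
import os
import urllib.request

# Create ./checkpoints and download a chosen checkpoint into it.
os.makedirs("checkpoints", exist_ok=True)
url = "https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tsn_PoseML_epoch20.pth"
urllib.request.urlretrieve(url, os.path.join("checkpoints", os.path.basename(url)))
```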
```
docker build . -t action-recognition
docker run -it --rm --runtime=nvidia -v {{dataPath}}:/data action-recognition /bin/bash
```
(replace `{{dataPath}}` with the local folder on your computer containing [input folder] and where the output is expected to be stored)
```
python demo_long_video.py --input INPUT_VIDEO_PATH --config poseml_long_video.yaml --output OUTPUT_VIDEO_PATH
```
Ex:
```
python demo_long_video.py --input ./sample/input/input.mp4 --config poseml_long_video.yaml --device cuda:0 --output ./sample/output/long_video.mp4
```
The first few frames are needed to instantiate the model, so there are no predictions until then.
| frame_no | detection | label | confidence | x_min | y_min | x_max | y_max |
|---|---|---|---|---|---|---|---|
| 40 | 0 | texting | 0.56 | | | | |
| 40 | 1 | driving car | 0.23 | | | | |
| 40 | 1 | changing oil | 0.07 | | | | |
| ... | ... | ... | ... | | | | |
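If you want to post-process these per-frame predictions programmatically, a sketch along the following lines works, assuming the demo also writes them to a CSV file with the columns shown above (the file name `predictions.csv` and the CSV format are assumptions; adjust both to your actual output):

```python
import csv

# Keep only confident predictions, assuming a CSV with the columns shown above.
with open("sample/output/predictions.csv") as f:  # hypothetical file name
    for row in csv.DictReader(f):
        if float(row["confidence"]) > 0.5:
            print(row["frame_no"], row["label"], row["confidence"])
```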
Currently this repo supports three Action Recognition Models:
This is the MMAction2 implementation of "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition".
The value of `configFile` in `poseml_long_video.yaml` for this case is `configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py`.
The model pre-trained on Kinetics-400 is available at https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth. Use this checkpoint only if training from scratch.
To generate predictions for your videos based on our model trained on PoseML, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tsn_PoseML_epoch20.pth and move it to `./checkpoints`. Also, edit the value of `checkpoint` in `poseml_long_video.yaml` accordingly, and change the value of `label` in `poseml_long_video.yaml` to `label_poseml.txt`.
To generate predictions for your videos based on our model trained on SHRP2, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_SHRP2_epoch10.pth and move it to `./checkpoints`. Also, edit the value of `checkpoint` in `poseml_long_video.yaml` accordingly, and change the value of `label` in `poseml_long_video.yaml` to `label_shrp2.txt`.
Metrics on PoseML: Top-1 Accuracy: 76.19%, Top-3 Accuracy: 88.54%
This is the MMAction2 implementation of "SlowFast Networks for Video Recognition".
The value of `configFile` in `poseml_long_video.yaml` for this case is `configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py`.
The model pre-trained on Kinetics-400 is available at https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/slowfast_r50_256p_4x16x1_256e_kinetics400_rgb_20200728-145f1097.pth. Use this checkpoint only if training from scratch.
To generate predictions for your videos based on our model trained on PoseML, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/slowfast_PoseML6sec_epoch65.pth and move it to `./checkpoints`. Also, edit the value of `checkpoint` in `poseml_long_video.yaml` accordingly, and change the value of `label` in `poseml_long_video.yaml` to `label_poseml.txt`.
To generate predictions for your videos based on our model trained on SHRP2, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_SHRP2_epoch95.pth and move it to `./checkpoints`. Also, edit the value of `checkpoint` in `poseml_long_video.yaml` accordingly, and change the value of `label` in `poseml_long_video.yaml` to `label_shrp2.txt`.
Metrics on PoseML: Top-1 Accuracy: 71.48%, Top-3 Accuracy: 87.97%
This is the MMAction2 implementation of "TAM: Temporal Adaptive Module for Video Recognition".
The value of `configFile` in `poseml_long_video.yaml` for this case is `configs/recognition/tanet/tanet_r50_dense_1x1x8_100e_kinetics400_rgb.py`.
The model pre-trained on Kinetics-400 is available at https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_r50_dense_1x1x8_100e_kinetics400_rgb_20210219-032c8e94.pth. Use this checkpoint only if training from scratch.
To generate predictions for your videos based on our model trained on PoseML, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_PoseML6sec_epoch35.pth and move it to `./checkpoints`. Also, edit the value of `checkpoint` in `poseml_long_video.yaml` accordingly, and change the value of `label` in `poseml_long_video.yaml` to `label_poseml.txt`.
To generate predictions for your videos based on our model trained on SHRP2, download https://mirror.vtti.vt.edu/vtti/ctbs/action_recognition/tanet_SHRP2_epoch30.pth and move it to `./checkpoints`. Also, edit the value of `checkpoint` in `poseml_long_video.yaml` accordingly, and change the value of `label` in `poseml_long_video.yaml` to `label_shrp2.txt`.
Metrics on PoseML: Top-1 Accuracy: 80.41%, Top-3 Accuracy: 90.72%
Firstly, prepare a folder `train` containing all the video files to be used for training. Create an empty text file `train.txt`. Each line of this text file contains a video name, followed by a space, followed by its class index. Do the same for the validation dataset (a `val` video directory and a `val.txt` text file); a sketch for generating these files appears after the example.
Ex:
```
VID00031_0001.mp4 1
VID00031_0002.mp4 8
VID00031_0003.mp4 8
...
```
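A minimal sketch for generating `train.txt`, assuming you already have a mapping from video file name to class index (`video_classes` below is hypothetical; fill it from your own labels):

```python
import os

# Hypothetical mapping from video file name to class index; replace with your labels.
video_classes = {"VID00031_0001.mp4": 1, "VID00031_0002.mp4": 8, "VID00031_0003.mp4": 8}

# Write "<video name> <class index>" per line for every labeled video in ./train.
with open("train.txt", "w") as f:
    for name in sorted(os.listdir("train")):
        if name in video_classes:
            f.write(f"{name} {video_classes[name]}\n")
```

Repeat with `val/` and `val.txt` for the validation split.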
In the Docker container, execute the command:
```
python train.py CONFIG_FILE
```
Make the following changes in the `train.py` file (a sketch follows this list):
- Edit `cfg.model.cls_head.num_classes = 10` to the number of classes in your dataset
- Modify the path `cfg.work_dir` to your required folder where all the model weights will be saved
- Modify the paths of the train videos, val videos, and their corresponding text files
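A sketch of these overrides, assuming the mmcv `Config` attribute style used by MMAction2; `cfg.model.cls_head.num_classes` and `cfg.work_dir` come from the list above, while the `cfg.data.*` key names and all paths are assumptions that may differ depending on the chosen config file, so check it before editing:

```python
# Override the loaded config (cfg) in train.py for your dataset.
cfg.model.cls_head.num_classes = 10          # number of classes in your dataset
cfg.work_dir = "./work_dirs/my_experiment"   # hypothetical path for saved weights/logs

# Dataset paths: key names follow common MMAction2 configs but may differ
# in the config file you picked; the paths are placeholders.
cfg.data.train.data_prefix = "/data/train"
cfg.data.train.ann_file = "/data/train.txt"
cfg.data.val.data_prefix = "/data/val"
cfg.data.val.ann_file = "/data/val.txt"
```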