
简体中文 | English

YOWO

Contents

  • Introduction
  • Data
  • Train
  • Test
  • Inference
  • Reference

Introduction

YOWO is a single-stage network with two branches. One branch extracts spatial features of the key frame (i.e., the current frame) via a 2D-CNN, while the other extracts spatio-temporal features from a clip of the preceding frames via a 3D-CNN. To aggregate these features accurately, YOWO uses a channel fusion and attention mechanism that makes full use of inter-channel dependencies. Finally, the fused features are used for frame-level detection.
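The fusion step is the model's key detail, so a small sketch may help. Below is a minimal sketch of the channel-fusion-and-attention idea (concatenate the 2D and 3D feature maps along channels, then reweight channels with a Gram-matrix attention). It is written with Paddle for illustration only; the class name, layer sizes, and shapes are assumptions, not the repository's implementation.

    # Minimal sketch of channel fusion + Gram-matrix channel attention.
    # Illustrative only; not PaddleVideo's actual fusion code.
    import paddle
    import paddle.nn as nn
    import paddle.nn.functional as F

    class ChannelFusionAttention(nn.Layer):  # hypothetical name
        def __init__(self, in_channels, out_channels):
            super().__init__()
            # 1x1 convs mix the concatenated features before and after attention
            self.conv_in = nn.Conv2D(in_channels, out_channels, kernel_size=1)
            self.conv_out = nn.Conv2D(out_channels, out_channels, kernel_size=1)

        def forward(self, feat_2d, feat_3d):
            # feat_2d: [N, C2, H, W] from the 2D branch (key frame)
            # feat_3d: [N, C3, H, W] from the 3D branch (clip, temporal dim squeezed)
            x = self.conv_in(paddle.concat([feat_2d, feat_3d], axis=1))
            n, c, h, w = x.shape
            flat = x.reshape([n, c, h * w])                     # [N, C, HW]
            gram = paddle.bmm(flat, flat.transpose([0, 2, 1]))  # [N, C, C] channel dependencies
            attn = F.softmax(gram, axis=-1)                     # attention over channels
            x = paddle.bmm(attn, flat).reshape([n, c, h, w])    # reweighted features
            return self.conv_out(x)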

Data

For UCF101-24 data download and preparation, please refer to UCF101-24 data preparation.

Train

Training on the UCF101-24 dataset

Download and add pre-trained models

  1. Download the pre-trained models resnext-101-kinetics and darknet as backbone initialization parameters, or download them through the wget commands below:

     wget -nc https://videotag.bj.bcebos.com/PaddleVideo-release2.3/darknet.pdparam
     wget -nc https://videotag.bj.bcebos.com/PaddleVideo-release2.3/resnext101_kinetics.pdparams
  2. Open PaddleVideo/configs/localization/yowo.yaml and fill in the storage paths of the downloaded weights after pretrained_2d: and pretrained_3d: respectively:

    MODEL:
        framework: "YOWOLocalizer"
        backbone:
            name: "YOWO"
            num_class: 24
            pretrained_2d: fill in the path of the 2D pre-trained model here
            pretrained_3d: fill in the path of the 3D pre-trained model here
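
For example, assuming both weight files from step 1 were saved into a data/ directory (the paths below are illustrative; use wherever you actually stored them), the block would read:

    MODEL:
        framework: "YOWOLocalizer"
        backbone:
            name: "YOWO"
            num_class: 24
            pretrained_2d: "data/darknet.pdparam"
            pretrained_3d: "data/resnext101_kinetics.pdparams"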

Start training

  • The UCF101-24 dataset is trained with 1 card; the training start command is as follows:

    python3 main.py -c configs/localization/yowo.yaml --validate --seed=1
  • Turn on AMP mixed-precision training to speed up the training process; the start command is as follows:

    python3 main.py --amp -c configs/localization/yowo.yaml --validate --seed=1
  • In addition, you can customize and modify the parameter configuration to train and test on other datasets (a sketch follows this list). The recommended naming convention for configuration files is model_<dataset name>_<file format>_<data format>_<sampling method>.yaml. Please refer to config for parameter usage.
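
As referenced above, here is a hedged sketch of adapting the configuration for a custom run; the copied file name below merely follows the recommended convention and is hypothetical:

    cp configs/localization/yowo.yaml configs/localization/yowo_ucf24_frames_rgb.yaml
    # edit dataset paths, batch size, learning rate, etc. in the copy, then:
    python3 main.py -c configs/localization/yowo_ucf24_frames_rgb.yaml --validate --seed=1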

Test

  • The YOWO model is validated synchronously during training. You can search for the keyword best in the training log to find the model's validation accuracy. An example log line is shown below:

    Already save the best model (fscore)0.8779
    
  • The verification metric in YOWO's test mode is Frame-mAP (@ IoU 0.5), which differs from the fscore used for validation during training, so the fscore recorded in the training log does not represent the final test score. After training completes, test the best model in test mode to obtain the final metric (a sketch of Frame-mAP follows the table below). The command is as follows:

    python3 main.py -c configs/localization/yowo.yaml --test --seed=1 -w 'output/YOWO/YOWO_epoch_00005.pdparams'

    With the test configuration below, the metrics on the UCF101-24 validation set are as follows:

    | Model | 3D-CNN backbone | 2D-CNN backbone | Dataset | Input | Frame-mAP (@ IoU 0.5) | checkpoints |
    | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
    | YOWO | 3D-ResNext-101 | Darknet-19 | UCF101-24 | 16-frames, d=1 | 80.94 | YOWO.pdparams |
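
To make the difference between the two metrics concrete, below is a minimal sketch of Frame-mAP (@ IoU 0.5): detections of one class from all frames are ranked by confidence, matched to same-frame ground-truth boxes at IoU >= 0.5, and the resulting precision-recall curve is integrated; the mean over the 24 classes is the reported score. Function names and data layouts are assumptions, not PaddleVideo's implementation.

    # Minimal sketch of the Frame-mAP (@ IoU 0.5) metric.
    # Illustrative only; not PaddleVideo's metric code.
    import numpy as np

    def iou(a, b):
        # boxes are [x1, y1, x2, y2]
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / (union + 1e-10)

    def frame_ap(dets, gts, iou_thr=0.5):
        # dets: list of (frame_id, score, box) for ONE class, over all frames
        # gts:  dict frame_id -> list of ground-truth boxes for that class
        dets = sorted(dets, key=lambda d: -d[1])             # high confidence first
        matched = {f: [False] * len(b) for f, b in gts.items()}
        n_gt = sum(len(b) for b in gts.values())
        tp, fp = np.zeros(len(dets)), np.zeros(len(dets))
        for i, (f, _, box) in enumerate(dets):
            best, best_j = 0.0, -1
            for j, g in enumerate(gts.get(f, [])):
                o = iou(box, g)
                if o > best:
                    best, best_j = o, j
            if best >= iou_thr and not matched[f][best_j]:
                tp[i] = 1; matched[f][best_j] = True         # first match: true positive
            else:
                fp[i] = 1                                    # duplicate or poor overlap
        rec = np.cumsum(tp) / max(n_gt, 1)
        prec = np.cumsum(tp) / np.maximum(np.cumsum(tp) + np.cumsum(fp), 1e-10)
        # VOC-style all-point AP: envelope the precision curve, then integrate
        mrec = np.concatenate(([0.0], rec, [1.0]))
        mpre = np.concatenate(([0.0], prec, [0.0]))
        for i in range(len(mpre) - 2, -1, -1):
            mpre[i] = max(mpre[i], mpre[i + 1])
        idx = np.where(mrec[1:] != mrec[:-1])[0]
        return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

    # Frame-mAP = mean of frame_ap over all 24 UCF101-24 classes.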

Inference

Export inference model

python3 tools/export_model.py -c configs/localization/yowo.yaml -p 'output/YOWO/YOWO_epoch_00005.pdparams'

The above command will generate the model structure file YOWO.pdmodel and the model weight file YOWO.pdiparams required for prediction.
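
As a quick sanity check of the export, you can list the output directory. Note that Paddle's model export typically also writes a *.pdiparams.info metadata file alongside the two files named above (a hedged expectation, not stated in the original docs):

    ls ./inference/
    # expect: YOWO.pdmodel  YOWO.pdiparams  YOWO.pdiparams.info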

Use prediction engine inference

  • Download the test video HorseRiding.avi for a quick experience, or fetch it with the wget command below. Place the downloaded video in the data/ucf24 directory:

    wget -nc https://videotag.bj.bcebos.com/Data/HorseRiding.avi

  • Run the following command for inference:

    python3 tools/predict.py -c configs/localization/yowo.yaml -i 'data/ucf24/HorseRiding.avi' --model_file ./inference/YOWO.pdmodel --params_file ./inference/YOWO.pdiparams

  • When inference finishes, the prediction results are saved as images in the inference/YOWO_infer directory. The image sequence can be converted to a GIF with the following command to complete the final visualisation (an ffmpeg alternative is sketched after this list):

    python3 data/ucf24/visualization.py --frames_dir ./inference/YOWO_infer/HorseRiding --duration 0.04
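
If ffmpeg is available, a roughly equivalent conversion is sketched below; it assumes the saved frames are JPEG files (--duration 0.04 per frame corresponds to 25 fps):

    # hypothetical ffmpeg alternative; assumes JPEG frames
    ffmpeg -framerate 25 -pattern_type glob -i './inference/YOWO_infer/HorseRiding/*.jpg' HorseRiding.gif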

The resulting visualization is as follows:

Horse Riding

As can be seen, the YOWO model trained on UCF101-24 predicts each frame of data/ucf24/HorseRiding.avi as HorseRiding, with a confidence of about 0.80.

Reference