
简体中文 | English

YOWO

Contents

  • Introduction
  • Data
  • Train
  • Test
  • Inference
  • Reference

Introduction

YOWO is a single-stage network with two branches. One branch extracts spatial features of the key frame (i.e., the current frame) via a 2D-CNN, while the other extracts spatio-temporal features from a clip of the preceding frames via a 3D-CNN. To aggregate these features accurately, YOWO uses a channel fusion and attention mechanism that makes full use of inter-channel dependencies. Finally, the fused features are used for frame-level detection.
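The fusion step is the model's key detail, so a small sketch may help. Below is a minimal sketch of the channel-fusion-and-attention idea (concatenate the 2D and 3D feature maps along channels, then reweight channels with a Gram-matrix attention). It is written with Paddle for illustration only; the class name, layer sizes, and shapes are assumptions, not the repository's implementation.

    # Minimal sketch of channel fusion + Gram-matrix channel attention.
    # Illustrative only; not PaddleVideo's actual fusion code.
    import paddle
    import paddle.nn as nn
    import paddle.nn.functional as F

    class ChannelFusionAttention(nn.Layer):  # hypothetical name
        def __init__(self, in_channels, out_channels):
            super().__init__()
            # 1x1 convs mix the concatenated features before and after attention
            self.conv_in = nn.Conv2D(in_channels, out_channels, kernel_size=1)
            self.conv_out = nn.Conv2D(out_channels, out_channels, kernel_size=1)

        def forward(self, feat_2d, feat_3d):
            # feat_2d: [N, C2, H, W] from the 2D branch (key frame)
            # feat_3d: [N, C3, H, W] from the 3D branch (clip, temporal dim squeezed)
            x = self.conv_in(paddle.concat([feat_2d, feat_3d], axis=1))
            n, c, h, w = x.shape
            flat = x.reshape([n, c, h * w])                     # [N, C, HW]
            gram = paddle.bmm(flat, flat.transpose([0, 2, 1]))  # [N, C, C] channel dependencies
            attn = F.softmax(gram, axis=-1)                     # attention over channels
            x = paddle.bmm(attn, flat).reshape([n, c, h, w])    # reweighted features
            return self.conv_out(x)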

Data

For UCF101-24 data download and preparation, please refer to UCF101-24 data preparation.

Train

Training on the UCF101-24 dataset

Download and add pre-trained models

  1. Download the pre-trained models resnext-101-kinetics and darknet as backbone initialization parameters, or download them through the wget commands below:

     wget -nc https://videotag.bj.bcebos.com/PaddleVideo-release2.3/darknet.pdparam
     wget -nc https://videotag.bj.bcebos.com/PaddleVideo-release2.3/resnext101_kinetics.pdparams
  2. Open PaddleVideo/configs/localization/yowo.yaml and fill in the storage paths of the downloaded weights after pretrained_2d: and pretrained_3d: respectively:

    MODEL:
        framework: "YOWOLocalizer"
        backbone:
            name: "YOWO"
            num_class: 24
            pretrained_2d: fill in the path of the 2D pre-trained model here
            pretrained_3d: fill in the path of the 3D pre-trained model here
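
For example, assuming both weight files from step 1 were saved into a data/ directory (the paths below are illustrative; use wherever you actually stored them), the block would read:

    MODEL:
        framework: "YOWOLocalizer"
        backbone:
            name: "YOWO"
            num_class: 24
            pretrained_2d: "data/darknet.pdparam"
            pretrained_3d: "data/resnext101_kinetics.pdparams"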

Start training

  • The UCF101-24 dataset is trained with 1 card; the training start command is as follows:

    python3 main.py -c configs/localization/yowo.yaml --validate --seed=1
  • Turn on AMP mixed-precision training to speed up the training process; the start command is as follows:

    python3 main.py --amp -c configs/localization/yowo.yaml --validate --seed=1
  • In addition, you can customize and modify the parameter configuration to train and test on other datasets (a sketch follows this list). The recommended naming convention for configuration files is model_<dataset name>_<file format>_<data format>_<sampling method>.yaml. Please refer to config for parameter usage.
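
As referenced above, here is a hedged sketch of adapting the configuration for a custom run; the copied file name below merely follows the recommended convention and is hypothetical:

    cp configs/localization/yowo.yaml configs/localization/yowo_ucf24_frames_rgb.yaml
    # edit dataset paths, batch size, learning rate, etc. in the copy, then:
    python3 main.py -c configs/localization/yowo_ucf24_frames_rgb.yaml --validate --seed=1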

Test

  • The YOWO model is validated synchronously during training. You can search for the keyword best in the training log to find the model's validation accuracy. An example log line is shown below:

    Already save the best model (fscore)0.8779
    
  • The verification metric in YOWO's test mode is Frame-mAP (@ IoU 0.5), which differs from the fscore used for validation during training, so the fscore recorded in the training log does not represent the final test score. After training completes, test the best model in test mode to obtain the final metric (a sketch of Frame-mAP follows the table below). The command is as follows:

    python3 main.py -c configs/localization/yowo.yaml --test --seed=1 -w 'output/YOWO/YOWO_epoch_00005.pdparams'

    With the test configuration below, the metrics on the UCF101-24 validation set are as follows:

    | Model | 3D-CNN backbone | 2D-CNN backbone | Dataset | Input | Frame-mAP (@ IoU 0.5) | checkpoints |
    | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
    | YOWO | 3D-ResNext-101 | Darknet-19 | UCF101-24 | 16-frames, d=1 | 80.94 | YOWO.pdparams |
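
To make the difference between the two metrics concrete, below is a minimal sketch of Frame-mAP (@ IoU 0.5): detections of one class from all frames are ranked by confidence, matched to same-frame ground-truth boxes at IoU >= 0.5, and the resulting precision-recall curve is integrated; the mean over the 24 classes is the reported score. Function names and data layouts are assumptions, not PaddleVideo's implementation.

    # Minimal sketch of the Frame-mAP (@ IoU 0.5) metric.
    # Illustrative only; not PaddleVideo's metric code.
    import numpy as np

    def iou(a, b):
        # boxes are [x1, y1, x2, y2]
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / (union + 1e-10)

    def frame_ap(dets, gts, iou_thr=0.5):
        # dets: list of (frame_id, score, box) for ONE class, over all frames
        # gts:  dict frame_id -> list of ground-truth boxes for that class
        dets = sorted(dets, key=lambda d: -d[1])             # high confidence first
        matched = {f: [False] * len(b) for f, b in gts.items()}
        n_gt = sum(len(b) for b in gts.values())
        tp, fp = np.zeros(len(dets)), np.zeros(len(dets))
        for i, (f, _, box) in enumerate(dets):
            best, best_j = 0.0, -1
            for j, g in enumerate(gts.get(f, [])):
                o = iou(box, g)
                if o > best:
                    best, best_j = o, j
            if best >= iou_thr and not matched[f][best_j]:
                tp[i] = 1; matched[f][best_j] = True         # first match: true positive
            else:
                fp[i] = 1                                    # duplicate or poor overlap
        rec = np.cumsum(tp) / max(n_gt, 1)
        prec = np.cumsum(tp) / np.maximum(np.cumsum(tp) + np.cumsum(fp), 1e-10)
        # VOC-style all-point AP: envelope the precision curve, then integrate
        mrec = np.concatenate(([0.0], rec, [1.0]))
        mpre = np.concatenate(([0.0], prec, [0.0]))
        for i in range(len(mpre) - 2, -1, -1):
            mpre[i] = max(mpre[i], mpre[i + 1])
        idx = np.where(mrec[1:] != mrec[:-1])[0]
        return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

    # Frame-mAP = mean of frame_ap over all 24 UCF101-24 classes.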

Inference

Export inference model

python3 tools/export_model.py -c configs/localization/yowo.yaml -p 'output/YOWO/YOWO_epoch_00005.pdparams'

The above command will generate the model structure file YOWO.pdmodel and the model weight file YOWO.pdiparams required for prediction.
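
As a quick sanity check of the export, you can list the output directory. Note that Paddle's model export typically also writes a *.pdiparams.info metadata file alongside the two files named above (a hedged expectation, not stated in the original docs):

    ls ./inference/
    # expect: YOWO.pdmodel  YOWO.pdiparams  YOWO.pdiparams.info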

Use prediction engine inference

  • Download the test video HorseRiding.avi for a quick experience, or fetch it with the wget command below. Place the downloaded video in the data/ucf24 directory:

    wget -nc https://videotag.bj.bcebos.com/Data/HorseRiding.avi

  • Run the following command for inference:

    python3 tools/predict.py -c configs/localization/yowo.yaml -i 'data/ucf24/HorseRiding.avi' --model_file ./inference/YOWO.pdmodel --params_file ./inference/YOWO.pdiparams

  • When inference finishes, the prediction results are saved as images in the inference/YOWO_infer directory. The image sequence can be converted to a GIF with the following command to complete the final visualisation (an ffmpeg alternative is sketched after this list):

    python3 data/ucf24/visualization.py --frames_dir ./inference/YOWO_infer/HorseRiding --duration 0.04
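
If ffmpeg is available, a roughly equivalent conversion is sketched below; it assumes the saved frames are JPEG files (--duration 0.04 per frame corresponds to 25 fps):

    # hypothetical ffmpeg alternative; assumes JPEG frames
    ffmpeg -framerate 25 -pattern_type glob -i './inference/YOWO_infer/HorseRiding/*.jpg' HorseRiding.gif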

The resulting visualization is as follows:

Horse Riding

As can be seen, the YOWO model trained on UCF101-24 predicts each frame of data/ucf24/HorseRiding.avi as HorseRiding, with a confidence of about 0.80.

Reference