M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation

Description

This repository holds the PyTorch implementation of the approach described in our report "M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation", which we used for our entry to the ABAW Challenge 2020 (VA track). We provide models trained on Aff-Wild2.

Update

  • 2020.02.10: Initial public release

How to run

First, clone the repository and install the dependencies:

# clone the project and enter the repository
git clone https://github.com/sailordiary/m3t.pytorch
cd m3t.pytorch
# install dependencies
python3 -m pip install -r requirements.txt --user

To evaluate our pretrained models, first download the checkpoints from the release page, then run eval.py to generate validation or test set predictions:

# download the checkpoint
wget 
# to report CCC on the validation set
python3 eval.py --test_on_val --checkpoint m3t_mtl-vox2.pt
python3 get_smoothed_ccc predictions_val.pt
# to generate test set predictions
python3 eval.py --checkpoint m3t_mtl-vox2.pt
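
For reference, the concordance correlation coefficient (CCC) reported on the validation set can be sketched as below. This is a minimal, dependency-free illustration of the standard CCC definition, not a copy of the repository's evaluation script: the function name `concordance_cc` is hypothetical, population (biased) variances are assumed, and no smoothing of the predictions is applied.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_pred - mean_gold)^2)
    Population (biased) statistics are used throughout; this is an assumption.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")

    mean_p = fmean(pred)
    mean_g = fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Both sequences are the same constant: treat as perfect agreement.
        return 1.0
    return 2.0 * cov / denom


if __name__ == "__main__":
    # Toy valence values, for illustration only (not real predictions).
    print(concordance_cc([0.1, 0.4, 0.3], [0.2, 0.5, 0.1]))
```

Unlike Pearson correlation, CCC also penalizes a constant offset between predictions and labels, which is why the squared mean difference appears in the denominator.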

Dataset

We use the Aff-Wild2 dataset. The raw videos are decoded with ffmpeg and passed to RetinaFace-ResNet50 for face detection. To extract log-Mel spectrogram energies, first extract 16 kHz mono WAV files from the audio tracks, then refer to process/extract_melspec.py.
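
The audio extraction step could look roughly like the sketch below, which builds an ffmpeg command that drops the video stream and writes 16 kHz mono 16-bit PCM audio. The helper names (`build_ffmpeg_wav_command`, `extract_wav`) and the file names are hypothetical, and the exact ffmpeg options we used may differ; check process/extract_melspec.py for the input format it expects.

```python
import subprocess


def build_ffmpeg_wav_command(video_path, wav_path, sample_rate=16000):
    """Return an ffmpeg argument list that extracts mono PCM audio from a video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # discard the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the target rate in Hz
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM (assumed encoding)
        str(wav_path),
    ]


def extract_wav(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(
        build_ffmpeg_wav_command(video_path, wav_path, sample_rate),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_ffmpeg_wav_command("video.mp4", "video.wav")))
```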

We provide the cropped and aligned face tracks (256×256, ~79 GB zipped), as well as the pre-computed SENet-101 and TCAE features used in our experiments, here: [OneDrive]

Some files are still being uploaded at this moment. Please check the page again later.

Note that in addition to the 256-dimensional encoder features, we also saved the 12 AU activation scores predicted by TCAE; the two are concatenated into a single 268-dimensional vector for each video frame. We only used the encoder features for our experiments, but feel free to experiment with this extra information.
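
Splitting a per-frame TCAE vector back into its two parts might look like the sketch below. The helper name `split_tcae_frame` is hypothetical, and the ordering (encoder features first, AU scores last) is an assumption that should be verified against the downloaded files; the `encoder_first` flag covers the opposite layout.

```python
ENCODER_DIM = 256  # encoder feature size stated above
NUM_AUS = 12       # number of AU activation scores stated above


def split_tcae_frame(frame_vector, encoder_first=True):
    """Split one 268-dimensional TCAE vector into (encoder_features, au_scores).

    Works on any 1-D sequence that supports len() and slicing, such as a list,
    a NumPy array, or a torch tensor. The concatenation order is an ASSUMPTION:
    set encoder_first=False if the AU scores turn out to come first.
    """
    total = ENCODER_DIM + NUM_AUS
    if len(frame_vector) != total:
        raise ValueError(f"expected {total} values, got {len(frame_vector)}")
    if encoder_first:
        return frame_vector[:ENCODER_DIM], frame_vector[ENCODER_DIM:]
    return frame_vector[NUM_AUS:], frame_vector[:NUM_AUS]


if __name__ == "__main__":
    # Dummy vector standing in for one frame of saved features.
    encoder, aus = split_tcae_frame([0.0] * 268)
    print(len(encoder), len(aus))
```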

Model Zoo

Coming soon...

Citation

@misc{zhang2020m3t,
    title={$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild},
    author={Yuan-Hang Zhang and Rulin Huang and Jiabei Zeng and Shiguang Shan and Xilin Chen},
    year={2020},
    eprint={2002.02957},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}