Instructions for generating data for AVSS

Before running these scripts, you need to install dlib.

You can follow this blog to install it.

These scripts are written for the LRS3 dataset only. Other datasets will require code changes.

Modify the paths in the scripts to match your own setup before running.

Step 1 Audio process

  1. Extract audio:
cd audio_process
./extract_audio.sh
  2. (Optional) Cut the audio of some speakers to 4~6 s:
./audio_cut.sh
  3. Generate the text file describing the mixed audio:
python audio_path.py
  4. Mix the audio. This code is adapted from Deep Clustering:
/opt18/matlab_2015b/bin/matlab -nosplash -nodesktop -r create_wav_2speakers
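
As a rough illustration of what the mixture text file might contain, here is a minimal Python sketch. It assumes the common two-speaker layout of one mixture per line, `path_a snr_a path_b snr_b`, with opposite SNRs for the two sources. The actual format written by audio_path.py and read by create_wav_2speakers may differ, so check those scripts before relying on this. The function name `build_mix_list`, its parameters, and the 2.5 dB default are hypothetical.

```python
import random


def build_mix_list(wav_paths, num_mixtures, max_snr_db=2.5, seed=0):
    """Return text lines describing two-speaker mixtures.

    wav_paths: dict mapping a speaker id to a list of wav file paths
               (paths are assumed to contain no spaces).
    Each line has the assumed form: "path_a snr_a path_b snr_b".
    """
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    speakers = sorted(wav_paths)
    lines = []
    for _ in range(num_mixtures):
        # Pick two different speakers, then one utterance from each.
        spk_a, spk_b = rng.sample(speakers, 2)
        wav_a = rng.choice(wav_paths[spk_a])
        wav_b = rng.choice(wav_paths[spk_b])
        # One source is scaled up by snr dB, the other down by the same amount.
        snr = rng.uniform(0.0, max_snr_db)
        lines.append(f"{wav_a} {snr:.4f} {wav_b} {-snr:.4f}")
    return lines
```

The returned lines can be written to a text file with `"\n".join(lines)`. Sampling speakers with `rng.sample` guarantees that the two sources in a mixture never come from the same speaker.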

Step 2 Video process

  1. Extract frames:
cd video_process
./extract_frames.sh
  2. Generate the list of frame file paths and save it to LRS3_image.scp. No script is provided for this step, so you need to write it yourself.
  3. Detect and crop the face and lip regions:
python extrac_face_and_lip.py
  4. Convert the image sequences to npy:
python convert_npy.py
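
Since the LRS3_image.scp step has no script, here is a minimal Python sketch of one way to produce the file. It assumes the .scp file is a plain list with one frame path per line and that the frames are image files somewhere under a single root directory. The format expected by extrac_face_and_lip.py may differ, so check that script first. The function names, the extension list, and the `frames_root` argument are hypothetical.

```python
import os

# Assumed frame file extensions; adjust to whatever extract_frames.sh produces.
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def collect_frame_paths(frames_root):
    """Walk frames_root and return all image paths in sorted order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(frames_root):
        dirnames.sort()  # sorting in place makes os.walk visit subdirs in order
        for name in sorted(filenames):
            if name.lower().endswith(IMAGE_EXTS):
                paths.append(os.path.join(dirpath, name))
    return paths


def write_scp(paths, scp_path="LRS3_image.scp"):
    """Write one path per line and return the number of lines written."""
    with open(scp_path, "w") as f:
        for path in paths:
            f.write(path + "\n")
    return len(paths)
```

For example, `write_scp(collect_frame_paths("/path/to/frames"))` would create LRS3_image.scp in the current directory, where `/path/to/frames` stands in for your own frame directory.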

Thanks to KaiLi. These scripts are based on his repo.