LSTM-RNN Voice Activity Detection

REQUIRED PACKAGES

numpy, tensorflow, librosa, matplotlib

FILES

- dataset_utils.py
Dataset-related utilities: one-hot encoding, WAV file normalisation, TRS-to-CSV conversion, JSON-to-CSV conversion, downloading YouTube audio as WAV files for Google's AudioSet corpus, and data transformations for the LIBLINEAR library

- metrics_utils.py
(NOT FINALISED) Metrics-related utilities for the baseline VAD methods

- feature_extractor.py
Feature extraction class that computes MFCCs, deltas, double deltas and RSE features

- VAD_model.py
LSTM-RNN learning model implemented in TensorFlow

- _main_.py
The program's main entry point

- /checkpoint
TensorFlow checkpoint directory for saving and restoring trained models

- /parameter
LSTM-RNN model hyperparameters, training parameters, and log/checkpoint directory names

- /notebook
Jupyter notebooks to test initial VAD prototypes
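The dataset_utils.py entry lists one-hot encoding and WAV normalisation. The sketch below shows what those two steps typically involve, assuming integer frame labels (0 = non-speech, 1 = speech) and assuming "normalisation" means peak normalisation; the README does not say which scheme the repository actually uses. The names one_hot and peak_normalise are hypothetical and are not the repository's API.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Map integer class labels (e.g. 0 = non-speech, 1 = speech) to one-hot rows.

    Hypothetical helper for illustration; not taken from dataset_utils.py.
    """
    labels = np.asarray(labels, dtype=np.int64).ravel()
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set a single 1.0 per row, in the column given by that row's label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


def peak_normalise(samples, target_peak=1.0):
    """Scale a waveform so its largest absolute sample equals target_peak.

    Assumes peak normalisation. Silent (all-zero) or empty input is returned
    unchanged to avoid dividing by zero.
    """
    samples = np.asarray(samples, dtype=np.float32)
    peak = float(np.max(np.abs(samples))) if samples.size else 0.0
    if peak == 0.0:
        return samples
    return samples * np.float32(target_peak / peak)


if __name__ == "__main__":
    print(one_hot([0, 1, 1, 0], num_classes=2))
    print(peak_normalise(np.array([0.1, -0.5, 0.25])))
```

Peak normalisation keeps relative loudness within a file intact, which matters for energy-sensitive features; an RMS- or mean/variance-based scheme would be an equally plausible reading of "normalisation".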
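The feature_extractor.py entry mentions deltas and double deltas on top of MFCCs. A minimal NumPy sketch of the classic regression formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2) is shown below, with double deltas computed as deltas of the deltas. It assumes a (frames, coefficients) layout and edge-frame repetition at the boundaries; the repository may instead rely on librosa.feature.delta, which fits a Savitzky-Golay filter, expects time on the last axis, and handles edges differently. The function names are hypothetical.

```python
import numpy as np


def deltas(features, width=2):
    """Regression deltas along the time axis (axis 0).

    features: array of shape (num_frames, num_coeffs), e.g. MFCCs.
    width: N in d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2).
    Frames before the start / after the end are replaced by the first / last frame.
    """
    features = np.asarray(features, dtype=np.float64)
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        # padded[width + t] holds frame t, so these slices are frames t+n and t-n.
        ahead = padded[width + n: width + n + num_frames]
        behind = padded[width - n: width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_with_deltas(mfcc, width=2):
    """Concatenate static coefficients, deltas and double deltas per frame."""
    d1 = deltas(mfcc, width)
    d2 = deltas(d1, width)  # double deltas: deltas of the deltas
    return np.concatenate([mfcc, d1, d2], axis=1)


if __name__ == "__main__":
    ramp = np.arange(10, dtype=np.float64).reshape(-1, 1)  # one coefficient rising by 1 per frame
    print(deltas(ramp).ravel())
    print(stack_with_deltas(ramp).shape)
```

With 13 MFCCs per frame this yields 39 values per frame, a common VAD/ASR input size; whether the repository uses that count, and how RSE is appended, is not stated in the README.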
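VAD_model.py holds the LSTM-RNN, which in a VAD setting reads one feature vector per frame and emits a speech probability per frame. To illustrate that data flow only, here is an untrained NumPy forward pass of a single-layer LSTM with a sigmoid output. It is not the repository's TensorFlow model: the class name, layer sizes, gate ordering, random initialisation and the 0.5 decision threshold are all assumptions made for this sketch, and there is no training code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMVAD:
    """Hypothetical, untrained NumPy LSTM producing per-frame speech probabilities."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # One weight matrix for all four gates, ordered: input, forget, candidate, output.
        self.W = rng.uniform(-scale, scale, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.b[hidden_dim:2 * hidden_dim] = 1.0  # common forget-gate bias initialisation
        self.w_out = rng.uniform(-scale, scale, hidden_dim)
        self.b_out = 0.0
        self.hidden_dim = hidden_dim

    def forward(self, frames):
        """frames: (num_frames, input_dim) -> (num_frames,) probabilities in (0, 1)."""
        H = self.hidden_dim
        h = np.zeros(H)  # hidden state carried across frames
        c = np.zeros(H)  # cell state carried across frames
        probs = []
        for x in frames:
            z = np.concatenate([x, h]) @ self.W + self.b
            i = sigmoid(z[:H])           # input gate
            f = sigmoid(z[H:2 * H])      # forget gate
            g = np.tanh(z[2 * H:3 * H])  # candidate cell update
            o = sigmoid(z[3 * H:])       # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
            probs.append(sigmoid(h @ self.w_out + self.b_out))
        return np.array(probs)


if __name__ == "__main__":
    features = np.random.default_rng(1).standard_normal((50, 39))  # e.g. MFCC + deltas + double deltas
    model = TinyLSTMVAD(input_dim=39, hidden_dim=16)
    speech_prob = model.forward(features)
    is_speech = speech_prob > 0.5  # frame-wise speech / non-speech decision
    print(speech_prob.shape, int(is_speech.sum()))
```

Because the hidden and cell states persist across frames, each decision can depend on earlier context, which is the usual motivation for recurrent models over frame-independent classifiers in VAD.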