-
Notifications
You must be signed in to change notification settings - Fork 124
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[feature] add a frontend module in wespeaker and support wavlm (#344)
* [feature] add a frontend module in wespeaker and support wavlm * update .gitignore * update wavlm configs * update wespeaker/frontend/__init__.py * [fix] remove trailing whitespaces * [fix] fix lint errors * [fix] fix lint errors * [fix] fix lint errors * [fix] fix spelling mistakes * update run.sh * update wavlm configs and add run_wavlm.sh * update README.md
- Loading branch information
Showing
21 changed files
with
738 additions
and
52 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,3 +42,4 @@ tensorboard | |
*.onnx | ||
external_tools | ||
pretrained_models | ||
s3prl_hub |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
### train configuraton | ||
|
||
exp_dir: exp/ECAPA_TDNN_GLOB_c512-ASTP-emb192-WavLM_Large_frozen-num_frms150-aug0.6-spTrue-saFalse-ArcMargin_intertopk_subcenter-SGD-epoch150 | ||
gpus: "[0,1,2,3,4,5,6,7]" | ||
num_avg: 10 | ||
enable_amp: True # whether enable automatic mixed precision training | ||
|
||
seed: 42 | ||
num_epochs: 150 | ||
save_epoch_interval: 5 # save model every 5 epochs | ||
log_batch_interval: 100 # log every 100 batchs | ||
|
||
dataloader_args: | ||
batch_size: 256 | ||
num_workers: 16 | ||
pin_memory: False | ||
prefetch_factor: 16 | ||
drop_last: True | ||
|
||
dataset_args: | ||
# the sample number which will be traversed within one epoch, if the value equals to 0, | ||
# the utterance number in the dataset will be used as the sample_num_per_epoch. | ||
sample_num_per_epoch: 0 | ||
shuffle: True | ||
shuffle_args: | ||
shuffle_size: 2500 | ||
filter: True | ||
filter_args: | ||
min_num_frames: 50 | ||
max_num_frames: 400 | ||
resample_rate: 16000 | ||
speed_perturb: True | ||
num_frms: 150 | ||
aug_prob: 0.6 # prob to add reverb & noise aug per sample | ||
frontend: "s3prl" # fbank, s3prl | ||
s3prl_args: | ||
upstream_args: | ||
name: "wavlm_large" | ||
download_dir: ./s3prl_hub | ||
multilayer_feature: True | ||
layer: -1 | ||
frozen: True | ||
frame_shift: 20 | ||
frame_length: 20 | ||
cmvn: True | ||
cmvn_args: | ||
norm_mean: True | ||
norm_var: False | ||
spec_aug: False | ||
spec_aug_args: | ||
num_t_mask: 1 | ||
num_f_mask: 1 | ||
max_t: 10 | ||
max_f: 8 | ||
prob: 0.6 | ||
|
||
model: ECAPA_TDNN_GLOB_c512 # ECAPA_TDNN_GLOB_c512, ECAPA_TDNN_GLOB_c1024 | ||
model_init: null | ||
model_args: | ||
feat_dim: -1 # equals to the output_size of the frontend (will be initialized before training) | ||
embed_dim: 192 | ||
pooling_func: "ASTP" # the default pooling_func in ECAPA_TDNN is ASTP | ||
projection_args: | ||
project_type: "arc_margin_intertopk_subcenter" # add_margin, arc_margin, sphere, softmax, arc_margin_intertopk_subcenter | ||
scale: 32.0 | ||
easy_margin: False | ||
|
||
margin_scheduler: MarginScheduler | ||
margin_update: | ||
initial_margin: 0.0 | ||
final_margin: 0.2 | ||
increase_start_epoch: 20 | ||
fix_start_epoch: 40 | ||
update_margin: True | ||
increase_type: "exp" # exp, linear | ||
|
||
loss: CrossEntropyLoss | ||
loss_args: {} | ||
|
||
optimizer: SGD | ||
optimizer_args: | ||
momentum: 0.9 | ||
nesterov: True | ||
weight_decay: 0.0001 | ||
|
||
scheduler: ExponentialDecrease | ||
scheduler_args: | ||
initial_lr: 0.1 | ||
final_lr: 0.00001 | ||
warm_up_epoch: 6 | ||
warm_from_zero: True |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
### train configuraton | ||
|
||
exp_dir: exp/ECAPA_TDNN_GLOB_c512-ASTP-emb192-WavLM_Large_joint_ft-num_frms150-aug0.6-spTrue-saFalse-ArcMargin_intertopk_subcenter-SGD-epoch20 | ||
gpus: "[0,1,2,3,4,5,6,7]" | ||
num_avg: 3 | ||
enable_amp: True # whether enable automatic mixed precision training | ||
do_lm: False | ||
|
||
seed: 42 | ||
num_epochs: 20 | ||
save_epoch_interval: 1 # save model every epoch | ||
log_batch_interval: 100 # log every 100 batchs | ||
|
||
dataloader_args: | ||
batch_size: 64 | ||
num_workers: 8 | ||
pin_memory: False | ||
prefetch_factor: 8 | ||
drop_last: True | ||
|
||
dataset_args: | ||
# the sample number which will be traversed within one epoch, if the value equals to 0, | ||
# the utterance number in the dataset will be used as the sample_num_per_epoch. | ||
sample_num_per_epoch: 0 | ||
shuffle: True | ||
shuffle_args: | ||
shuffle_size: 2500 | ||
filter: True | ||
filter_args: | ||
min_num_frames: 50 | ||
max_num_frames: 400 | ||
resample_rate: 16000 | ||
speed_perturb: True | ||
num_frms: 150 | ||
aug_prob: 0.6 # prob to add reverb & noise aug per sample | ||
frontend: "s3prl" # fbank, s3prl | ||
s3prl_args: | ||
upstream_args: | ||
name: "wavlm_large" | ||
download_dir: ./s3prl_hub | ||
multilayer_feature: True | ||
layer: -1 | ||
frozen: False | ||
frame_shift: 20 | ||
frame_length: 20 | ||
cmvn: True | ||
cmvn_args: | ||
norm_mean: True | ||
norm_var: False | ||
spec_aug: False | ||
spec_aug_args: | ||
num_t_mask: 1 | ||
num_f_mask: 1 | ||
max_t: 10 | ||
max_f: 8 | ||
prob: 0.6 | ||
|
||
model: ECAPA_TDNN_GLOB_c512 # ECAPA_TDNN_GLOB_c512, ECAPA_TDNN_GLOB_c1024 | ||
model_init: null | ||
model_args: | ||
feat_dim: -1 # equals to the output_size of the frontend (will be initialized before training) | ||
embed_dim: 192 | ||
pooling_func: "ASTP" # the default pooling_func in ECAPA_TDNN is ASTP | ||
projection_args: | ||
project_type: "arc_margin_intertopk_subcenter" # add_margin, arc_margin, sphere, softmax, arc_margin_intertopk_subcenter | ||
scale: 32.0 | ||
easy_margin: False | ||
|
||
margin_scheduler: MarginScheduler | ||
margin_update: | ||
initial_margin: 0.2 | ||
final_margin: 0.2 | ||
increase_start_epoch: 1 | ||
fix_start_epoch: 1 | ||
update_margin: True | ||
increase_type: "exp" # exp, linear | ||
|
||
loss: CrossEntropyLoss | ||
loss_args: {} | ||
|
||
optimizer: SGD | ||
optimizer_args: | ||
momentum: 0.9 | ||
nesterov: True | ||
weight_decay: 0.0001 | ||
|
||
scheduler: ExponentialDecrease | ||
scheduler_args: | ||
initial_lr: 0.001 | ||
final_lr: 0.00025 | ||
warm_up_epoch: 1 | ||
warm_from_zero: True |
92 changes: 92 additions & 0 deletions
92
examples/voxceleb/v2/conf/ecapa_tdnn_WavLM_joint_lmft.yaml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
### train configuraton | ||
|
||
exp_dir: exp/ECAPA_TDNN_GLOB_c512-ASTP-emb192-WavLM_Large_joint_lmft-num_frms300-aug0.6-spTrue-saFalse-ArcMargin_intertopk_subcenter-SGD-epoch10 | ||
gpus: "[0,1,2,3,4,5,6,7]" | ||
num_avg: 1 | ||
enable_amp: True # whether enable automatic mixed precision training | ||
do_lm: True | ||
|
||
seed: 42 | ||
num_epochs: 20 | ||
save_epoch_interval: 1 # save model every epoch | ||
log_batch_interval: 100 # log every 100 batchs | ||
|
||
dataloader_args: | ||
batch_size: 32 | ||
num_workers: 8 | ||
pin_memory: False | ||
prefetch_factor: 8 | ||
drop_last: True | ||
|
||
dataset_args: | ||
# the sample number which will be traversed within one epoch, if the value equals to 0, | ||
# the utterance number in the dataset will be used as the sample_num_per_epoch. | ||
sample_num_per_epoch: 0 | ||
shuffle: True | ||
shuffle_args: | ||
shuffle_size: 2500 | ||
filter: True | ||
filter_args: | ||
min_num_frames: 50 | ||
max_num_frames: 400 | ||
resample_rate: 16000 | ||
speed_perturb: True | ||
num_frms: 300 | ||
aug_prob: 0.6 # prob to add reverb & noise aug per sample | ||
frontend: "s3prl" # fbank, s3prl | ||
s3prl_args: | ||
upstream_args: | ||
name: "wavlm_large" | ||
download_dir: ./s3prl_hub | ||
multilayer_feature: True | ||
layer: -1 | ||
frozen: False | ||
frame_shift: 20 | ||
frame_length: 20 | ||
cmvn: True | ||
cmvn_args: | ||
norm_mean: True | ||
norm_var: False | ||
spec_aug: False | ||
spec_aug_args: | ||
num_t_mask: 1 | ||
num_f_mask: 1 | ||
max_t: 10 | ||
max_f: 8 | ||
prob: 0.6 | ||
|
||
model: ECAPA_TDNN_GLOB_c512 # ECAPA_TDNN_GLOB_c512, ECAPA_TDNN_GLOB_c1024 | ||
model_init: null | ||
model_args: | ||
feat_dim: -1 # equals to the output_size of the frontend (will be initialized before training) | ||
embed_dim: 192 | ||
pooling_func: "ASTP" # the default pooling_func in ECAPA_TDNN is ASTP | ||
projection_args: | ||
project_type: "arc_margin_intertopk_subcenter" # add_margin, arc_margin, sphere, softmax, arc_margin_intertopk_subcenter | ||
scale: 32.0 | ||
easy_margin: False | ||
|
||
margin_scheduler: MarginScheduler | ||
margin_update: | ||
initial_margin: 0.5 | ||
final_margin: 0.5 | ||
increase_start_epoch: 1 | ||
fix_start_epoch: 1 | ||
update_margin: True | ||
increase_type: "exp" # exp, linear | ||
|
||
loss: CrossEntropyLoss | ||
loss_args: {} | ||
|
||
optimizer: SGD | ||
optimizer_args: | ||
momentum: 0.9 | ||
nesterov: True | ||
weight_decay: 0.0001 | ||
|
||
scheduler: ExponentialDecrease | ||
scheduler_args: | ||
initial_lr: 0.0001 | ||
final_lr: 0.000025 | ||
warm_up_epoch: 1 | ||
warm_from_zero: True |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.