Merge pull request #357 from wenet-e2e/hongji-fix
[recipe] fix errors in voxceleb/v1/Whisper-PMFA
Aurora1818 authored Aug 31, 2024
2 parents d5f6097 + 96eb76b commit 53e5ad3
Showing 10 changed files with 21 additions and 21 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -60,6 +60,7 @@ pre-commit install # for clean and tidy code
 ```
 
 ## 🔥 News
+* 2024.08.30: We support whisper_encoder based frontend and propose the [Whisper-PMFA](https://arxiv.org/pdf/2408.15585) framework, check [#356](https://github.com/wenet-e2e/wespeaker/pull/356).
 * 2024.08.20: Update diarization recipe for VoxConverse dataset by leveraging umap dimensionality reduction and hdbscan clustering, see [#347](https://github.com/wenet-e2e/wespeaker/pull/347) and [#352](https://github.com/wenet-e2e/wespeaker/pull/352).
 * 2024.08.18: Support using ssl pre-trained models as the frontend. The [WavLM recipe](https://github.com/wenet-e2e/wespeaker/blob/master/examples/voxceleb/v2/run_wavlm.sh) is also provided, see [#344](https://github.com/wenet-e2e/wespeaker/pull/344).
 * 2024.05.15: Add support for [quality-aware score calibration](https://arxiv.org/pdf/2211.00815), see [#320](https://github.com/wenet-e2e/wespeaker/pull/320).
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/README.md
@@ -20,5 +20,5 @@
 | | ✓ | 6.63M | 1.88 |
 | Whisper-PMFA | × | 478.7M | 1.62 |
 | | ✓ | 478.7M | **1.42** |
-| Whisper-PMFA with LoRa (Coming soon) | ✓ | 10.9M | 1.62 |
+| Whisper-PMFA with LoRA (Coming soon) | ✓ | 10.9M | 1.62 |

@@ -1,8 +1,8 @@
 ### train configuraton
 
-exp_dir: exp/test
+exp_dir: exp/Whisper_PMFA_large_v2_voxceleb1_mel_5s
 gpus: "[0,1]"
-num_avg: 10
+num_avg: 1
 enable_amp: False # whether enable automatic mixed precision training
 
 seed: 42
@@ -57,7 +57,7 @@ margin_update:
   initial_margin: 0.2
   final_margin: 0.2
   increase_start_epoch: 0
-  fix_start_epoch: 30
+  fix_start_epoch: 4
   update_margin: True
   increase_type: "exp" # exp, linear
 
@@ -1,8 +1,8 @@
 ### train configuraton
 
-exp_dir: exp/test
+exp_dir: exp/Whisper_PMFA_large_v2_voxceleb1_mel_5s
 gpus: "[0,1]"
-num_avg: 10
+num_avg: 1
 enable_amp: False # whether enable automatic mixed precision training
 
 seed: 42
@@ -56,7 +56,7 @@ margin_update:
   initial_margin: 0.2
   final_margin: 0.2
   increase_start_epoch: 0
-  fix_start_epoch: 30
+  fix_start_epoch: 8
   update_margin: True
   increase_type: "exp" # exp, linear
 
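The `margin_update` block in the configs above schedules the margin of the margin-based softmax loss: it ramps from `initial_margin` to `final_margin` between `increase_start_epoch` and `fix_start_epoch`, then stays frozen, which is why lowering `fix_start_epoch` matters for the short Whisper-PMFA fine-tuning runs. A minimal sketch of such a schedule (hypothetical helper, not WeSpeaker's exact implementation; the real "exp" curve may differ):

```python
import math

def margin_at_epoch(epoch, initial_margin=0.2, final_margin=0.2,
                    increase_start_epoch=0, fix_start_epoch=4,
                    increase_type="exp"):
    """Margin schedule sketch mirroring the config keys above."""
    # Before the ramp starts, hold the initial margin.
    if epoch < increase_start_epoch:
        return initial_margin
    # From fix_start_epoch onward the margin is frozen at its final value.
    if epoch >= fix_start_epoch:
        return final_margin
    # Fraction of the ramp completed in [0, 1).
    ratio = (epoch - increase_start_epoch) / (fix_start_epoch - increase_start_epoch)
    if increase_type == "linear":
        return initial_margin + (final_margin - initial_margin) * ratio
    # "exp": smooth ease-in that reaches final_margin at ratio == 1.
    scale = (1 - math.exp(-5 * ratio)) / (1 - math.exp(-5))
    return initial_margin + (final_margin - initial_margin) * scale
```

With the values in these configs (`initial_margin == final_margin == 0.2`) the schedule is constant; the ramp only comes into play when the two margins differ.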
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/local/score.sh
@@ -44,7 +44,7 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
   scores_dir=${exp_dir}/scores
   for x in $trials; do
     python wespeaker/bin/compute_metrics.py \
-        --p_target 0.01 \
+        --p_target 0.05 \
         --c_fa 1 \
         --c_miss 1 \
         ${scores_dir}/${x}.score \
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/local/score_norm.sh
@@ -57,7 +57,7 @@ if [ $stage -le 3 ] && [ $stop_stage -ge 3 ]; then
   for x in ${trials}; do
     scores_dir=${exp_dir}/scores
     python wespeaker/bin/compute_metrics.py \
-        --p_target 0.01 \
+        --p_target 0.05 \
         --c_fa 1 \
         --c_miss 1 \
         ${scores_dir}/${output_name}_${x}.score \
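The `--p_target`, `--c_fa`, and `--c_miss` flags in both scoring scripts set the operating point of the detection cost function (DCF); `p_target 0.05` is the value conventionally reported for the VoxCeleb1-O trial list, which is what this commit corrects. A sketch of the cost and its minimum over score thresholds (`dcf`/`min_dcf` are illustrative names, not WeSpeaker's API):

```python
def dcf(p_miss, p_fa, p_target=0.05, c_miss=1.0, c_fa=1.0):
    # Weighted sum of miss and false-alarm rates at one operating point.
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def min_dcf(target_scores, nontarget_scores,
            p_target=0.05, c_miss=1.0, c_fa=1.0):
    # Sweep every observed score as a candidate threshold; keep the cheapest.
    best = float("inf")
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        best = min(best, dcf(p_miss, p_fa, p_target, c_miss, c_fa))
    return best
```

A smaller `p_target` weights false alarms more heavily, so minDCF values computed at 0.01 and 0.05 are not comparable; the fix makes the recipe's reported numbers match the convention used elsewhere in the repo.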
17 changes: 9 additions & 8 deletions examples/voxceleb/v1/Whisper-PMFA/run.sh
@@ -1,22 +1,20 @@
 #!/bin/bash
 
-# Copyright 2022 Hongji Wang ([email protected])
-#           2022 Chengdong Liang ([email protected])
-#           2022 Zhengyang Chen ([email protected])
+# Copyright 2024 Yiyang Zhao ([email protected])
+#           2024 Hongji Wang ([email protected])
 
 . ./path.sh || exit 1
 
-stage=3
-stop_stage=3
+stage=-1
+stop_stage=-1
 
 data=data
 data_type="raw" # shard/raw
 model=whisper_PMFA_large_v2
 
 exp_dir=exp/Whisper_PMFA_large_v2_voxceleb1_mel_5s
 
-gpus="[0]"
-num_avg=10
+gpus="[0,1]"
+num_avg=1
 checkpoint=
 
 trials="vox1_O_cleaned.kaldi"
@@ -25,6 +23,9 @@ score_norm_method="asnorm" # asnorm/snorm
 top_n=300
 
 . tools/parse_options.sh || exit 1
+if ! pip show openai-whisper > /dev/null 2>&1; then
+    pip install openai-whisper==20231117
+fi
 
 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
   echo "Preparing datasets ..."
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/tools
2 changes: 1 addition & 1 deletion examples/voxceleb/v1/Whisper-PMFA/wespeaker
2 changes: 0 additions & 2 deletions wespeaker/bin/train.py
@@ -107,7 +107,6 @@ def train(config='conf/config.yaml', **kwargs):
 
     # model: frontend (optional) => speaker model => projection layer
     logger.info("<== Model ==>")
-    # frontend: fbank or s3prl
    frontend_type = configs['dataset_args'].get('frontend', 'fbank')
    if frontend_type != "fbank":
        frontend_args = frontend_type + "_args"
@@ -119,7 +118,6 @@ def train(config='conf/config.yaml', **kwargs):
        model.add_module("frontend", frontend)
    else:
        model = get_speaker_model(configs['model'])(**configs['model_args'])
-
    if rank == 0:
        num_params = sum(param.numel() for param in model.parameters())
        logger.info('speaker_model size: {}'.format(num_params))
