Pull request johan BUT sre #326

gulamungon · 2024-06-04T15:23:04Z

SRE recipe using CTS superset + voxceleb as embedding extractor training data (see README). There are very few changes outside the recipe. Let me know if any of them is not appropriate:

In tools/make_shard_list.py, changed the behavior so that if VAD info does not exist for a recording, the recording is used as is, i.e., no VAD is applied. This is not ideal, since VAD info is currently also absent when VAD was estimated but no speech was found. However, there may be situations where we don't want to apply VAD to all sets, e.g., we may want to apply VAD to CTS but not voxceleb, and then we need it this way. This means that utterances for which VAD was run but no speech was detected should be filtered out before the shards are created. This will be the case if the sets are filtered with local/filter_utt_accd_dur.py, since this script discards recordings with no speech according to VAD.

Ideally this should be improved so that recordings for which no speech was detected are removed, while files that we don't want to apply VAD to are kept. Possibly by:
- Changing the VAD info format so that it also contains files with no speech, then discarding them in tools/make_shard_list.py while keeping the ones with no VAD info, i.e., those for which we do not want to apply VAD.
- Keeping the VAD format as is, i.e., no info for recordings with no speech, but instead adding fake VAD info for the files we don't want to apply VAD to. This info would simply mark the whole segment as speech.
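
A minimal sketch of the fallback described above and of the second option (fake VAD entries). This is not the code in tools/make_shard_list.py; the function names and the dict-based VAD table are assumptions made only for illustration.

```python
def speech_segments(utt, duration, vad_table):
    """Return (start, end) speech segments in seconds for one recording.

    vad_table is a hypothetical dict: utterance id -> list of (start, end).
    A recording missing from the table gets one segment covering the whole
    file, i.e. no VAD is applied to it.
    """
    if utt not in vad_table:
        return [(0.0, duration)]
    return vad_table[utt]


def add_full_length_vad(vad_table, durations, no_vad_utts):
    """Second option: add fake entries that mark the whole file as speech."""
    out = dict(vad_table)  # copy, so the caller's table is left untouched
    for utt in no_vad_utts:
        out[utt] = [(0.0, durations[utt])]
    return out
```

With the second option, a recording that is still missing from the table after this step can be treated as "VAD ran, no speech found" and discarded.
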
In get_data_for_plda(... in plda_utils.py, changed to give a warning instead of crashing whenever an entry is in the scp but not in utt2spk. This embedding will be skipped. (This can happen if files have been added to the original CTS data folder, since the scp is created by finding all wav files in this directory. In the case of BUT, we have some extra files here for sanity checks.) If this solution is not appropriate, we can change the data preparation script so that the mismatch between the created scp and utt2spk is fixed already there.
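
The warn-and-skip behavior can be sketched roughly as follows. The function name and arguments are hypothetical and do not reproduce the actual signature in plda_utils.py.

```python
import logging


def pair_embeddings_with_speakers(scp_utts, utt2spk):
    """Return (utt, spk) pairs, skipping utterances without a speaker label.

    scp_utts: iterable of utterance ids found in the scp (assumed input).
    utt2spk:  dict mapping utterance id -> speaker id (assumed input).
    """
    kept = []
    for utt in scp_utts:
        spk = utt2spk.get(utt)
        if spk is None:
            # Warn instead of raising, so extra files in the data folder
            # do not stop the whole run.
            logging.warning("%s is in the scp but not in utt2spk; skipping", utt)
            continue
        kept.append((utt, spk))
    return kept
```
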
In local/make_system_sad.py (this change is only in the local recipe), changed so that VAD is processed for a limited number of files at a time (hardwired to 10000), after which the VAD result is saved, instead of processing all files at once. Otherwise it took a very long time to start when extracting VAD. This could also be helpful in case of a crash, since output is saved after each part instead of after the whole set.
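
The chunking idea, sketched under assumptions: `run_vad` and `save` are hypothetical callables standing in for whatever local/make_system_sad.py actually does per file and per saved part.

```python
def process_in_chunks(items, run_vad, save, chunk_size=10000):
    """Run run_vad on items in chunks, calling save after each chunk.

    Returns the number of chunks processed. Because results are saved per
    chunk, a crash loses at most one chunk of work.
    """
    n_chunks = 0
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        results = [run_vad(item) for item in chunk]
        save(results)
        n_chunks += 1
    return n_chunks
```
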

JiJiJiang · 2024-06-05T03:11:22Z

Please fix Lint errors.

czy97 · 2024-06-06T12:01:45Z

Hello Johan @gulamungon, thanks for the contribution. Can you first fix the lint errors? Locally, you can use the flake8 command to check for problematic files and use yapf -i xxx.py to automatically format the problematic files.

gulamungon · 2024-06-06T12:06:16Z

> Hello Johan @gulamungon, thanks for the contribution. Can you first fix the lint errors? Locally, you can use the flake8 command to check for problematic files and use yapf -i xxx.py to automatically format the problematic files.

Sure, I'll try to fix it asap.

gulamungon · 2024-06-12T13:43:00Z

I fixed it hopefully.

gulamungon · 2024-06-15T14:34:20Z

Changed tabs to spaces.

wsstriving · 2024-06-15T23:59:17Z

@gulamungon Hi, Johan, thanks for the contribution, it seems there are still some lint errors (trailing whitespace).
@czy97 @JiJiJiang Maybe you could start reviewing first, and we do the lint fix afterwards.

gulamungon · 2024-06-17T07:16:33Z

Hi Shuai, Ok. I see. I can also try to fix it, but in the coming two weeks I'm quite busy, so most likely I will not manage it during this time. Best, Johan

JiJiJiang · 2024-06-25T05:14:54Z

examples/sre/v3/path.sh

+export PYTHONIOENCODING=UTF-8
+export PYTHONPATH=../../../:$PYTHONPATH
+
+export PATH=$PATH:/mnt/matylda6/rohdin/software/kaldi_20200214/tools/sph2pipe_v2.5/


Hi, Johan @gulamungon, thanks for your code.

Since installing Kaldi is a little complex, we only adopt some useful shell/perl/python scripts in WeSpeaker rather than installing the whole of Kaldi.
You may consider two methods here:

1. Download sph2pipe_v2.5.tar.gz and decompress it into an external_tools dir;

2. Use some other tool to convert sph into wav, e.g., ffmpeg.

gulamungon · 2024-06-25

Sure, I'll fix it. I'm on a trip this week, so probably next week.

wsstriving · 2024-08-01

@gulamungon Hi Johan, can you fix the lint errors etc., so that we can proceed with merging?

gulamungon · 2024-08-21T23:43:45Z

I changed sph2pipe to ffmpeg. Previously we simply used SRE16 data prepared elsewhere by Kaldi (as in ../v2), which uses sph2pipe. Here I instead copied in the SRE16 data preparation scripts and modified them to use ffmpeg, along with some other minor changes to fit better here.
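
For illustration only, a sketch of how an ffmpeg-based replacement for an sph2pipe entry could look. It assumes a Kaldi-style piped wav.scp entry, mono output, and an 8 kHz sample rate; the helper name is hypothetical, and whether ffmpeg decodes every SPHERE variant in the data is not verified here. The recipe's actual scripts may differ.

```python
import shlex


def sph_to_wav_pipe(sph_path, sample_rate=8000):
    """Build a piped wav.scp entry that decodes a file to wav with ffmpeg."""
    cmd = [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", sph_path,
        "-ac", "1",               # assumption: downmix to a single channel
        "-ar", str(sample_rate),  # assumption: telephone-band sample rate
        "-f", "wav", "-",         # write wav to stdout
    ]
    # Quote each argument so paths with spaces survive the shell.
    return " ".join(shlex.quote(c) for c in cmd) + " |"
```
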

I fixed the trailing spaces.

Since quite many things have now changed, I'm rerunning the recipe to check that nothing is broken. I think you can review it, but it is perhaps better to wait with the merge until the run has finished.

JiJiJiang · 2024-08-22T14:38:38Z

@gulamungon Hi Johan, thank you for your update! But there still seem to be some lint errors.
You can run pip install pre-commit and pre-commit install; the lint errors will then show up when you run git commit, before you push the latest code to the GitHub repo.

gulamungon · 2024-08-26T14:02:27Z

Ok trying again. Actually, I did those checks manually but I was doing it from examples/sre/v3 and it seems errors in files that are links were not detected properly. pre-commit is convenient. Thanks for the tip.

gulamungon · 2024-08-26T14:11:03Z

It is not clear to me what the flake8 issue is. I didn't see it when I ran it locally.

JiJiJiang · 2024-08-27T08:30:13Z

> It is not clear to me what the flake8 issue is. I didn't see it when I ran it locally.

This flake8 error was fixed in the recent updates. Maybe you need to merge the master branch first and fix the conflicts (if any).

gulamungon · 2024-08-29T10:50:36Z

Ok. I pulled the recent changes to master, then merged it into this branch. Hopefully it works now.

gulamungon and others added 27 commits February 7, 2024 12:31

first commit

23beeeb

Initial CTS recipe

20bb369

v3 recipe which is based on CTS superset. More generic embedding processing before backend. PLDA multisession scoring.

8ad7f3b

Merging with public repo master branch after developing examples/sre/v3.

0e16667

Merge branch 'master' of github.com:wenet-e2e/wespeaker

Minor fixes of sre/v3 recipe

b3ea0a1

Some path corrections.

7165698

Minor corrections of sre/v3/recipe

c08593b

Minor corrections in sre/v3 recipe.

bc14b8c

Adding some missing scripts.

5bf5f01

Added some flexibility for VAD usage in data preparation.

97b3d27

Bugfixes

602b5fd

minor fixes in sre datapreparation

7c1ea7c

Minor bugfix

f4415a5

Minor bugfix

87e8416

Adding cosine scoring with LDA etc. preprocessing.

62d915a

Updated README

6ab8a2c

Updated README.

28c0fc2

Updated README.

1429c83

Updated README.

8326cd7

Updated README.

971dae3

Updated README.

3eab6da

Updated README.

b1ffc01

Updated README.

1e25e08

Updated README.

6713a4f

Updated README.

6671fd9

Updated README.

1a71358

Fix merge conflict of public repo and local updates.

7eb34b6

wsstriving requested review from czy97 and Hunterhuan June 4, 2024 15:35

Fixed flake errors

aa27355

Changed tabs to spaces.

22de394

JiJiJiang reviewed Jun 25, 2024

gulamungon added 3 commits August 20, 2024 15:48

Fix trailing spaces.

f991877

Remove dependence on sph2pipe including adding in modifed scripts from Kaldi SRE16 recipe.

04654d8

Fix spaces.

765f6be

gulamungon added 2 commits August 26, 2024 15:04

Updated README

ec6d4f7

Some yapf fixes

b5da60d

gulamungon added 2 commits August 29, 2024 12:38

Merge branch 'master' into pull_request_Johan_BUT_sre

cbd9310

Updating README. Mostly a dummy commit to do pre commit testing after merging in master branch.

26cd435

JiJiJiang approved these changes Aug 29, 2024

JiJiJiang merged commit 03ceb00 into wenet-e2e:master Aug 29, 2024
4 checks passed

gulamungon deleted the pull_request_Johan_BUT_sre branch September 17, 2024 12:42
