wip: support for joint diarization and embedding #1409
Conversation
BREAKING(model): get rid of (flaky) `Model.introspection`
…o feat/joint-diarization-and-embedding
- fixes the dimension error between file IDs and probabilities arrays
- changes how chunks for the embedding task are sampled
- creates two functions to draw chunks, one for each subtask

Tests are required to ensure that there are no bugs.
For now, this is a copy-paste of methods from the segmentation task.
as computing this loss probably does not make sense in powerset mode, because the first class (the empty set of labels) already models exactly this
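To make the redundancy concrete, here is a minimal sketch (a hypothetical helper, not pyannote's actual implementation) enumerating powerset classes: the very first class is the empty set, i.e. "no speaker active", which is precisely what a separate VAD loss would model.

```python
from itertools import combinations

# Hypothetical sketch: enumerate powerset classes for up to
# `num_speakers` local speakers with at most `max_active` speaking
# simultaneously. Class 0 is the empty set ("no speaker active"),
# hence a dedicated VAD loss would be redundant in powerset mode.
def powerset_classes(num_speakers=3, max_active=2):
    classes = []
    for k in range(max_active + 1):
        classes.extend(combinations(range(num_speakers), k))
    return classes

print(powerset_classes())
# first class is the empty tuple (), followed by singletons and pairs
```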
```python
# keep track of list of classes for regular segmentation protocols
# Different files may be annotated using a different set of classes
# (e.g. one database for speech/music/noise, and another one for male/female/child)
if isinstance(self.protocol, SegmentationProtocol):

    if "classes" in file:
        local_classes = file["classes"]
    else:
        local_classes = file["annotation"].labels()

    # if task was not initialized with a fixed list of classes,
    # we build it as the union of all classes found in files
    if self.classes is None:
        for klass in local_classes:
            if klass not in classes:
                classes.append(klass)
        annotated_classes.append(
            [classes.index(klass) for klass in local_classes]
        )

    # if task was initialized with a fixed list of classes,
    # we make sure that all files use a subset of these classes
    # if they don't, we issue a warning and ignore the extra classes
    else:
        extra_classes = set(local_classes) - set(self.classes)
        if extra_classes:
            warnings.warn(
                f"Ignoring extra classes ({', '.join(extra_classes)}) found for file {file['uri']} ({file['database']}). "
            )
        annotated_classes.append(
            [
                self.classes.index(klass)
                for klass in set(local_classes) & set(self.classes)
            ]
        )
```
Can probably be removed. Double check, though.
Indeed, this can be removed. Same for lines 419-423, I think. I can also remove `self.classes` and `self.annotated_classes`: these two attributes are not used anywhere in the code aside from `setup` for `SegmentationProtocol`.
Done in 3d295dd
```python
elif isinstance(self.protocol, SegmentationProtocol):
    classes = getattr(self, "classes", list())
```
Can probably be removed. Double check, though.
On second thought, it would be better for this class to inherit from `SegmentationMixin`, because there is a lot of duplicated code in `setup`.
> Can probably be removed. Double check, though.

Done in 3d295dd
```python
metadata = self.metadata[file_id]
sample["meta"] = {key: metadata[key] for key in metadata.dtype.names}
sample["meta"]["file"] = file_id
sample["meta"]["subtask"] = subtask
```
At this point, `sample["meta"]` already contains a `"scope"` key. Therefore, I think you can safely remove this additional `"subtask"` key -- which is basically equivalent to checking whether `"scope"` is global or not. Bonus: you could then inherit `prepare_chunk` from the regular `SpeakerDiarization` task.
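As a sketch of that suggestion (assuming, hypothetically, that `"scope"` is stored as a string in `sample["meta"]` -- the actual encoding may differ), the subtask could be derived on the fly instead of being stored as a separate key:

```python
# Hypothetical sketch: recover the subtask from the "scope" key that is
# already present in sample["meta"], instead of storing a redundant
# "subtask" key. Assumes scope is the string "global" for chunks sampled
# for the speaker embedding subtask; this encoding is an assumption.
def subtask_from_scope(meta: dict) -> str:
    return "embedding" if meta["scope"] == "global" else "diarization"

print(subtask_from_scope({"scope": "global"}))   # embedding
print(subtask_from_scope({"scope": "file"}))     # diarization
```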
```python
segment = np.random.choice(class_segments, p=prob_segments)

# sample chunk start time in order to intersect it with the sampled segment
start_time = np.random.uniform(segment["start"] - duration / 2, segment["start"])
```
I would actually sample between `segment["start"] - duration` and `segment["end"]`: the choice of 50% of `duration` is a bit arbitrary and might lead to a bias in the distribution of speech within a chunk, and using `segment["start"]` as the upper bound would definitely lead to a similar bias as well.
That being said, I do understand that there are a lot of short audio files in VoxCeleb, and choosing `segment["end"]` as the endpoint could lead to a lot of chunks that must be zero-padded. Let's discuss this in our next meeting.
as this instance attribute was not used
…` pipeline Co-authored-by: Hervé BREDIN <[email protected]>
as these loops could break gradient flow, and to optimize the code
for now do the trick only for the diarization subtask
There was an issue when the number of speakers in a chunk was greater than the maximum number per chunk set for the task.
these two methods were identical to the methods inherited from the `SegmentationTaskMixin` class
and fix issue with the loss type during training
this version replaces `StatsPool` with a concatenation of the last outputs of the TDNN (for the embedding part) and the LSTM (for the diarization part), followed by an LSTM layer.
Now, this LSTM is bidirectional and has a hidden size of 1500, so the output shape of this encoder is (b, s, 1500 * 2). This will allow comparing with the `StatsPool` version of the SPEED model.
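A quick sanity check of the stated shape, using a plain PyTorch LSTM with illustrative sizes (the 256 input features and the batch/sequence lengths are assumptions, not the model's actual values):

```python
import torch
import torch.nn as nn

# Sketch: a bidirectional LSTM with hidden size 1500 concatenates the
# forward and backward hidden states, so its output feature dimension
# is 1500 * 2 = 3000, i.e. shape (batch, seq, 3000) as described above.
lstm = nn.LSTM(input_size=256, hidden_size=1500,
               bidirectional=True, batch_first=True)
x = torch.randn(4, 100, 256)  # (batch, seq, features) -- illustrative sizes
out, _ = lstm(x)
print(out.shape)  # torch.Size([4, 100, 3000])
```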
Closing as (I think, correct me if I am wrong) this is superseded by #1583.