wip: support for joint diarization and embedding #1409
Conversation
BREAKING(model): get rid of (flaky) `Model.introspection`
…o feat/joint-diarization-and-embedding
- fixes the dimension error between file IDs and probabilities arrays
- changes how chunks for the embedding task are sampled
- creates two functions to draw chunks, one for each subtask

Tests are required to ensure that there are no bugs.
For now, this is a copy-paste of methods from the segmentation task.
as computing this loss probably does not make sense in powerset mode, because the first class (the empty set of labels) already models exactly this
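To make the redundancy concrete, here is a minimal sketch (a hypothetical helper, not pyannote's actual implementation) enumerating powerset classes: the very first class is the empty set, i.e. "no speaker active", which is precisely what a separate VAD loss would model.

```python
from itertools import combinations

# Hypothetical sketch: enumerate powerset classes for up to
# `num_speakers` local speakers with at most `max_active` speaking
# simultaneously. Class 0 is the empty set ("no speaker active"),
# hence a dedicated VAD loss would be redundant in powerset mode.
def powerset_classes(num_speakers=3, max_active=2):
    classes = []
    for k in range(max_active + 1):
        classes.extend(combinations(range(num_speakers), k))
    return classes

print(powerset_classes())
# first class is the empty tuple (), followed by singletons and pairs
```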
```python
# keep track of list of classes for regular segmentation protocols
# Different files may be annotated using a different set of classes
# (e.g. one database for speech/music/noise, and another one for male/female/child)
if isinstance(self.protocol, SegmentationProtocol):

    if "classes" in file:
        local_classes = file["classes"]
    else:
        local_classes = file["annotation"].labels()

    # if task was not initialized with a fixed list of classes,
    # we build it as the union of all classes found in files
    if self.classes is None:
        for klass in local_classes:
            if klass not in classes:
                classes.append(klass)
        annotated_classes.append(
            [classes.index(klass) for klass in local_classes]
        )

    # if task was initialized with a fixed list of classes,
    # we make sure that all files use a subset of these classes
    # if they don't, we issue a warning and ignore the extra classes
    else:
        extra_classes = set(local_classes) - set(self.classes)
        if extra_classes:
            warnings.warn(
                f"Ignoring extra classes ({', '.join(extra_classes)}) found for file {file['uri']} ({file['database']}). "
            )
        annotated_classes.append(
            [
                self.classes.index(klass)
                for klass in set(local_classes) & set(self.classes)
            ]
        )
```
Can probably be removed. Double check, though.
Indeed, this can be removed. Same for lines 419-423, I think. I can also remove `self.classes` and `self.annotated_classes`: these two attributes are not used anywhere in the code aside from `setup` for `SegmentationProtocol`.
Done in 3d295dd
```python
elif isinstance(self.protocol, SegmentationProtocol):
    classes = getattr(self, "classes", list())
```
Can probably be removed. Double check, though.
On second thought, it would be better for this class to inherit from `SegmentationMixin`, because there is a lot of duplicated code in `setup`.
> Can probably be removed. Double check, though.

Done in 3d295dd
```python
metadata = self.metadata[file_id]
sample["meta"] = {key: metadata[key] for key in metadata.dtype.names}
sample["meta"]["file"] = file_id
sample["meta"]["subtask"] = subtask
```
At this point, `sample["meta"]` already contains a `"scope"` key. Therefore, I think you can safely remove this additional `"subtask"` key -- which is basically equivalent to checking whether `"scope"` is global or not. Bonus: you could then inherit `prepare_chunk` from the regular `SpeakerDiarization` task.
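As a sketch of that suggestion (assuming, hypothetically, that `"scope"` is stored as a string in `sample["meta"]` -- the actual encoding may differ), the subtask could be derived on the fly instead of being stored as a separate key:

```python
# Hypothetical sketch: recover the subtask from the "scope" key that is
# already present in sample["meta"], instead of storing a redundant
# "subtask" key. Assumes scope is the string "global" for chunks sampled
# for the speaker embedding subtask; this encoding is an assumption.
def subtask_from_scope(meta: dict) -> str:
    return "embedding" if meta["scope"] == "global" else "diarization"

print(subtask_from_scope({"scope": "global"}))   # embedding
print(subtask_from_scope({"scope": "file"}))     # diarization
```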
```python
segment = np.random.choice(class_segments, p=prob_segments)

# sample chunk start time in order to intersect it with the sampled segment
start_time = np.random.uniform(segment["start"] - duration / 2, segment["start"])
```
I would actually sample between `segment["start"] - duration` and `segment["end"]`: the choice of 50% of `duration` is a bit arbitrary and might lead to a bias in the distribution of speech within a chunk, and using `segment["start"]` as the upper bound would definitely lead to a similar bias as well.
That being said, I do understand that there are a lot of short audio files in VoxCeleb, and choosing `segment["end"]` as the endpoint could lead to a lot of chunks that must be zero-padded. Let's discuss this in our next meeting.
as this instance attribute was not used
…` pipeline Co-authored-by: Hervé BREDIN <[email protected]>
as these loops could break gradient flow, and to optimize the code
for now do the trick only for the diarization subtask
There was an issue when the number of speakers in a chunk was greater than the maximum number per chunk set for the task.
these two methods were identical to the methods inherited from the `SegmentationTaskMixin` class
and fix issue with the loss type during training
this version replaces `StatsPool` with a concatenation of the last outputs of the TDNN (for the embedding part) and the LSTM (for the diarization part), followed by an LSTM layer.
Now, this LSTM is bidirectional and has a hidden size of 1500, so the output shape of this encoder is (b, s, 1500 * 2). This will allow comparing with the `StatsPool` version of the SPEED model.
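A quick sanity check of the stated shape, using a plain PyTorch LSTM with illustrative sizes (the 256 input features and the batch/sequence lengths are assumptions, not the model's actual values):

```python
import torch
import torch.nn as nn

# Sketch: a bidirectional LSTM with hidden size 1500 concatenates the
# forward and backward hidden states, so its output feature dimension
# is 1500 * 2 = 3000, i.e. shape (batch, seq, 3000) as described above.
lstm = nn.LSTM(input_size=256, hidden_size=1500,
               bidirectional=True, batch_first=True)
x = torch.randn(4, 100, 256)  # (batch, seq, features) -- illustrative sizes
out, _ = lstm(x)
print(out.shape)  # torch.Size([4, 100, 3000])
```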
Closing as (I think, correct me if I am wrong) this is superseded by #1583.