I am currently training a Sentence Transformer on my dataset using triplet loss, but I am encountering an issue where the gradient norm (grad_norm) is consistently 0.0 during training. This problem persists when using the recommended group_by_label batch sampler for triplet loss.
Details
Current Setup:
Model: Alibaba-NLP/gte-base-en-v1.5
Loss Function: Triplet Loss
Batch Sampler: group_by_label (recommended for triplet loss)
Observations
When I switch the batch sampler to either batch_sampler or no_duplicate, I notice an improvement in the training logs, and the grad_norm values become non-zero.
However, I want to utilize the group_by_label sampler as it is suggested for triplet loss, and I need assistance in understanding why this specific configuration is causing issues.
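For context, the sampler switch described above goes through the batch_sampler option of the training arguments; a minimal sketch (output_dir and other values are placeholders):

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments, BatchSamplers

# The three samplers mentioned above correspond to the BatchSamplers enum.
args = SentenceTransformerTrainingArguments(
    output_dir="output",                          # placeholder
    batch_sampler=BatchSamplers.GROUP_BY_LABEL,   # the problematic configuration
    # batch_sampler=BatchSamplers.NO_DUPLICATES,  # grad_norm becomes non-zero
    # batch_sampler=BatchSamplers.BATCH_SAMPLER,  # grad_norm becomes non-zero
)
```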
What could be causing the grad_norm to be 0.0 when using the group_by_label sampler?
Are there any adjustments or configurations you would recommend to resolve this issue while still using the recommended batch sampler?
Thank you!
AmoghM changed the title from "grad_norm 0.0 while finetuning sentence transformer" to "grad_norm 0.0 while finetuning using group_by_label batch sampler" on Dec 10, 2024.
Hello!
Apologies for the delay, I've been busy on a release.
Are you using the TripletLoss or the Batch...TripletLoss? Apologies for the confusion here, but there are fairly sizable differences:
TripletLoss: Given (anchor, positive, negative) triplets, train such that anchor and positive get at least margin closer than anchor and negative.
Batch...TripletLoss: Given texts with class labels, the loss automatically finds pairs that should be more similar (texts from the same class) as well as pairs that should be less similar (texts from other classes). As you can expect, to get pairs that should be more similar, we need at least 2 texts from the same class in each batch. That's what the group_by_label batch sampler ensures.
In short, the latter benefits from group_by_label (at least in theory), whereas the former does not.
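For reference, the two losses also expect differently shaped training data; roughly like this (toy, hypothetical examples just to show the columns):

```python
from datasets import Dataset

# TripletLoss: explicit (anchor, positive, negative) triplets, no class labels.
triplet_dataset = Dataset.from_dict({
    "anchor":   ["How do I reset my password?"],
    "positive": ["Steps to change an account password"],
    "negative": ["What is the weather forecast for today?"],
})

# Batch...TripletLoss (e.g. BatchHardSoftMarginTripletLoss): single texts with class
# labels; positives and negatives are mined inside each batch, so every batch needs
# at least two texts per label, which is what group_by_label tries to guarantee.
labeled_dataset = Dataset.from_dict({
    "sentence": [
        "How do I reset my password?",
        "Steps to change an account password",
        "What is the weather forecast for today?",
        "Weekly weather outlook",
    ],
    "label": [0, 0, 1, 1],
})
```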
Could you let me know which of the two you are using?
@tomaarsen No worries. Thanks for responding. The loss is BatchHardSoftMarginTripletLoss which is mentioned in the code snippet above. I'm pasting it here to avoid any further confusion.
I am currently training a Sentence Transformer on my dataset using triplet loss, but I am encountering an issue where the gradient norm (grad_norm) is consistently 0.0 during training. This problem persists when using the recommended group_by_label batch sampler for triplet loss.
Details
Current Setup:
Model: Alibaba-NLP/gte-base-en-v1.5
Loss Function: Triplet Loss (BatchHardSoftMarginTripletLoss)
Batch Sampler: group_by_label (recommended for triplet loss)
Observations
When I switch the batch sampler to either batch_sampler or no_duplicate, I notice an improvement in the training logs, and the grad_norm values become non-zero. However, I want to utilize the group_by_label sampler as it is suggested for triplet loss, and I need assistance in understanding why this specific configuration is causing issues.
Below is the sample code:
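(The exact snippet is not reproduced here; what follows is a minimal sketch of the setup described above. Only the model, loss, and batch sampler match the report; the dataset contents and hyperparameters are placeholders.)

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import BatchHardSoftMarginTripletLoss
from sentence_transformers.training_args import SentenceTransformerTrainingArguments, BatchSamplers

# gte-base-en-v1.5 ships custom modeling code, hence trust_remote_code=True
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)

# Hypothetical labeled dataset: one text column plus an integer "label" column,
# which is the format the Batch...TripletLoss family expects.
train_dataset = Dataset.from_dict({
    "sentence": [
        "How do I reset my password?",
        "Steps to change an account password",
        "What is the weather forecast for today?",
        "Weekly weather outlook",
    ],
    "label": [0, 0, 1, 1],
})

loss = BatchHardSoftMarginTripletLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="output/gte-triplet",              # placeholder
    per_device_train_batch_size=16,               # placeholder
    num_train_epochs=1,                           # placeholder
    batch_sampler=BatchSamplers.GROUP_BY_LABEL,   # the sampler in question
    logging_steps=10,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```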
Tensorboard visualization of training with the different batch samplers: the orange line is no_duplicate, the blue line is group_by_label, and the red line is batch_sampler.
Training logs for the no_duplicate batch sampler vs. training logs for the group_by_label batch sampler:
Questions
What could be causing the grad_norm to be 0.0 when using the group_by_label sampler?
Thank you!