I've just found that `classifier_dropout` is fixed to 0.1 in both `LlamaForTokenClassification` and `MistralForTokenClassification`:

```python
self.dropout = nn.Dropout(0.1)
```
However, in `GPT2ForTokenClassification` of HuggingFace Transformers it is configurable:

```python
if hasattr(config, "classifier_dropout") and config.classifier_dropout is not None:
    classifier_dropout = config.classifier_dropout
elif hasattr(config, "hidden_dropout") and config.hidden_dropout is not None:
    classifier_dropout = config.hidden_dropout
else:
    classifier_dropout = 0.1
self.dropout = nn.Dropout(classifier_dropout)
```
Now we are trying to include `LlamaForTokenClassification` and `MistralForTokenClassification` in HuggingFace Transformers at huggingface/transformers#29878. Please show us a better way to include them.
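A minimal sketch of what we have in mind, reusing the GPT-2 lookup order (`classifier_dropout`, then `hidden_dropout`, then 0.1). This is only an illustration of the idea, not the implementation proposed in the PR, and the head/loss details are assumptions:

```python
import torch.nn as nn
from transformers import LlamaModel, LlamaPreTrainedModel


class LlamaForTokenClassification(LlamaPreTrainedModel):
    """Sketch: token-classification head on top of LlamaModel with a
    configurable classifier dropout, following the GPT-2 pattern."""

    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = LlamaModel(config)

        # Resolve the dropout rate the same way GPT2ForTokenClassification does:
        # prefer classifier_dropout, fall back to hidden_dropout, else 0.1.
        if getattr(config, "classifier_dropout", None) is not None:
            classifier_dropout = config.classifier_dropout
        elif getattr(config, "hidden_dropout", None) is not None:
            classifier_dropout = config.hidden_dropout
        else:
            classifier_dropout = 0.1
        self.dropout = nn.Dropout(classifier_dropout)
        self.score = nn.Linear(config.hidden_size, config.num_labels)

        self.post_init()

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        outputs = self.model(input_ids, attention_mask=attention_mask, **kwargs)
        sequence_output = self.dropout(outputs[0])
        logits = self.score(sequence_output)

        loss = None
        if labels is not None:
            # Plain token-level cross-entropy over all positions (simplified).
            loss = nn.CrossEntropyLoss()(
                logits.view(-1, self.num_labels), labels.view(-1)
            )
        return (loss, logits) if loss is not None else (logits,)
```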
@KoichiYasuoka Thank you for your suggestions! I will add a feature to support `classifier_dropout`.
BTW, BiLLM's implementation of token classification differs from the official one: in BiLLM, we convert the attention mask from uni-directional to bi-directional. According to our experiments in the paper (https://arxiv.org/abs/2310.01208), this change can significantly improve token-classification performance.
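As a rough illustration of the uni- vs. bi-directional difference (not BiLLM's actual code; the mask helpers and tensor shapes below are assumptions), the change at the attention level amounts to replacing the causal additive mask with an all-zeros mask:

```python
import torch
import torch.nn.functional as F


def causal_mask(seq_len, dtype=torch.float32):
    # Upper triangle set to -inf: position i may not attend to j > i.
    mask = torch.full((seq_len, seq_len), float("-inf"), dtype=dtype)
    return torch.triu(mask, diagonal=1)


def bidirectional_mask(seq_len, dtype=torch.float32):
    # All zeros: every position may attend to every other position.
    return torch.zeros((seq_len, seq_len), dtype=dtype)


# (batch, heads, seq_len, head_dim) -- arbitrary sizes for the demo.
q = k = v = torch.randn(1, 8, 16, 64)

# Uni-directional (causal) attention, as in a standard decoder-only LM.
uni = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask(16))

# Bi-directional attention: every token sees the full sequence.
bi = F.scaled_dot_product_attention(q, k, v, attn_mask=bidirectional_mask(16))
```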