support for longformer (new 2023 request) (#47)
* Add files via upload
* Update __init__.py: Added LongformerWithTabular
* Update tabular_modeling_auto.py: Added support for Longformer
* Update tabular_transformers.py: Added support for Longformer
* Update setup.py: Changed url to point to this package.
* Update tabular_transformers.py: Changed add_start_docstring
* Update tabular_transformers.py: Remove @
* Update tabular_transformers.py: Changed @add_start_docstrings_to_model_forward (see the decorator sketch after this list)
* Update tabular_transformers.py: Removed @ from the import from transformers.file_utils import @add_start_docstrings_to_model_forward
* Update tabular_transformers.py: Loaded this last: from transformers.file_utils import add_start_docstrings_to_model_forward
* Update tabular_transformers.py: Added @ back to the same import
* Update tabular_transformers.py: Removed @ from the same import again
* Update tabular_transformers.py: Uncommented XLMRobertaConfig and moved add_start_docstrings_to_model_forward to the bottom of the group
* Update tabular_transformers.py: Commented out self.embedding_layer = nn.Embedding.from_pretrained(torch.from_numpy(embedding_weights).float(), freeze=True) (see the embedding sketch after this list)
* Update tabular_transformers.py: Uncommented embeddings section
* Update tabular_transformers.py: Removed text after .format in @add_start_docstrings_to_model_forward(LONGFORMER_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
* Update tabular_transformers.py
* Update tabular_transformers.py: Copied over the Longformer section from sidharrth2002 and changed to @add_start_docstrings_to_model_forward
* Update tabular_transformers.py: Commented out #self.embedding_layer = nn.Embedding.from_pretrained(torch.from_numpy(embedding_weights).float(), freeze=True) and #self.embedding_layer = nn.Embedding()
* Update tabular_transformers.py: Add import torch
* Update tabular_modeling_auto.py: Move Longformer to the top of the lists
* Update tabular_transformers.py: hf_model_config.summary_proj_to_labels = False  # Added from the XLM example
* Update tabular_transformers.py: Updated the Longformer class to match XLM
* Update tabular_transformers.py: Removed the XLM edits
* Update tabular_transformers.py: Commented out #self.dropout = nn.Dropout(hf_model_config.hidden_dropout_prob)
* Update layer_utils.py: Changed loss_fct based on this suggestion: https://discuss.huggingface.co/t/data-format-for-bertforsequenceclassification-with-num-labels-2/4156
* Update layer_utils.py: Changed back to the original value.
* Update tabular_combiner.py: Changed to torch.matmul based on https://stackoverflow.com/questions/67957655/runtimeerror-self-must-be-a-matrix (see the matmul sketch after this list)
* Update tabular_combiner.py: Trying torch.mul instead of torch.matmul
* Update tabular_combiner.py: Revert back to torch.mm
* Update tabular_combiner.py: Changed line 447 back to torch.mul
* Update tabular_combiner.py: Change back to torch.mm on line 449 and print shapes.
* Update tabular_combiner.py: Remove shape statements
* Update tabular_combiner.py: Changed to -1 in combined_feats = torch.cat((text_feats, cat_feats, numerical_feats), dim=-1)
* Update tabular_combiner.py: Changed from torch.cat to torch.stack in combined_feats = torch.stack((text_feats, cat_feats, numerical_feats), dim=1) (see the cat/stack sketch after this list)
* Update tabular_combiner.py: Changed back to the original code, combined_feats = torch.cat((text_feats, cat_feats, numerical_feats), dim=1)
* Update tabular_transformers.py: Changed forward to input_ids=torch.LongTensor(batch_size, sequence_length)
* Update tabular_transformers.py: Updated forward to input_ids(torch.LongTensor(batch_size, sequence_length))
* Update tabular_transformers.py: Added comma to input_ids(torch.LongTensor(batch_size, sequence_length)),
* Update tabular_transformers.py: Changed back to input_ids=None,
* Add files via upload
* Update tabular_transformers.py: Load embeddings
* Update tabular_combiner.py: combined_feats = torch.stack((text_feats, cat_feats, numerical_feats), dim=1)
* Update tabular_transformers.py: Removed load embeddings
* Update tabular_combiner.py: combined_feats = torch.cat((text_feats, cat_feats, numerical_feats), dim=1)
* Update tabular_combiner.py: Testing with only text: combined_feats = torch.cat((text_feats), dim=1)
* Update tabular_combiner.py: Changed back to combined_feats = torch.cat((text_feats, cat_feats, numerical_feats), dim=1)
* Update tabular_combiner.py: combined_feats = torch.cat((text_feats, cat_feats, numerical_feats), dim=0)
* Update tabular_combiner.py: combined_feats = torch.stack((text_feats, cat_feats, numerical_feats), dim=0)
* Update tabular_transformers.py: Updated the Longformer class to match Roberta
* Update tabular_combiner.py: Changed to combined_feats = torch.cat((text_feats, cat_feats, numerical_feats), dim=1)
* Add files via upload
* Delete Longformer_text_w_tabular_classification_042623.ipynb
* Delete text_w_tabular_classification.ipynb
* Add files via upload
* Delete Longformer_text_w_tabular_classification_050423.ipynb
* Update layer_utils.py: Removed a commented line and 3 blank lines
* Update tabular_combiner.py: Removed commented lines from testing
* Add files via upload: Added the original notebook to this folder.
* Rename Longformer_text_w_tabular_classification_051823.ipynb to longformer_text_w_tabular_classification.ipynb
* Update setup.py: Removed changes in __version__
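Decorator sketch. The @ back-and-forth above comes down to one rule: add_start_docstrings_to_model_forward is imported as a plain name and the @ syntax is applied only at the usage site, directly above forward. The imports below are real transformers names; the class body is an assumption pieced together from the commit notes, not the package's exact code:

```python
from transformers import LongformerModel
from transformers.file_utils import add_start_docstrings_to_model_forward
from transformers.models.longformer.modeling_longformer import (
    LONGFORMER_INPUTS_DOCSTRING,
    LongformerPreTrainedModel,
)


class LongformerWithTabular(LongformerPreTrainedModel):
    def __init__(self, hf_model_config):
        super().__init__(hf_model_config)
        self.longformer = LongformerModel(hf_model_config)
        # The frozen-embedding and dropout lines toggled in the commits
        # above would sit here.

    # No @ on the import itself; the decorator goes here, above forward.
    @add_start_docstrings_to_model_forward(
        LONGFORMER_INPUTS_DOCSTRING.format("(batch_size, sequence_length)")
    )
    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        # Plain keyword defaults (input_ids=None), the form finally kept
        # above, keep the signature compatible with the HF Trainer.
        return self.longformer(
            input_ids=input_ids, attention_mask=attention_mask, **kwargs
        )
```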
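Embedding sketch. Several commits toggle the frozen pre-trained embedding line. A minimal sketch of that pattern, with hypothetical weights (the shape (30000, 768) is illustrative; in the package the matrix would come from a pre-trained source):

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical pre-trained embedding matrix, vocab_size x embedding_dim.
embedding_weights = np.random.rand(30000, 768).astype(np.float32)

# freeze=True copies the weights and sets requires_grad=False, so the
# embeddings stay fixed during fine-tuning.
embedding_layer = nn.Embedding.from_pretrained(
    torch.from_numpy(embedding_weights).float(), freeze=True
)

token_ids = torch.tensor([[1, 5, 42]])
vectors = embedding_layer(token_ids)  # shape (1, 3, 768)
```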
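Matmul sketch. The torch.mm / torch.mul / torch.matmul churn traces to the linked RuntimeError: torch.mm only accepts 2-D matrices, torch.matmul also broadcasts batched inputs, and torch.mul is elementwise. A quick illustration with throwaway shapes:

```python
import torch

a = torch.randn(4, 8)
b = torch.randn(8, 3)

out_mm = torch.mm(a, b)          # strict 2-D matrix multiply -> (4, 3)
out_matmul = torch.matmul(a, b)  # same result for 2-D inputs, but also
                                 # handles batched (3-D+) inputs, which
                                 # torch.mm rejects with "self must be
                                 # a matrix"

c = torch.randn(4, 8)
out_mul = torch.mul(a, c)        # elementwise product: shapes must
                                 # broadcast, no inner-dimension match
```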
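Cat/stack sketch. Most of the tabular_combiner.py experiments above concern how the three feature tensors are joined. A minimal sketch of the difference, assuming each tensor is (batch_size, feature_dim); the dimensions 768/16/5 are illustrative, not the package's actual sizes:

```python
import torch

batch_size = 4
text_feats = torch.randn(batch_size, 768)     # pooled Longformer output
cat_feats = torch.randn(batch_size, 16)       # encoded categorical columns
numerical_feats = torch.randn(batch_size, 5)  # numerical columns

# torch.cat along dim=1 (equivalently dim=-1 for 2-D tensors) places each
# example's features side by side: shape (4, 768 + 16 + 5) = (4, 789).
# This is the form the commits finally settle on.
combined_feats = torch.cat((text_feats, cat_feats, numerical_feats), dim=1)

# torch.stack adds a new dimension and therefore requires identically
# shaped inputs, and dim=0 concatenation would mix examples across the
# batch, which is why those experiments above were reverted.
```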