Deterministic investigation of the sensitivity of learning with limited labelled data to the effects of randomness

The repository contains the experiments and code for the following two papers:

"On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices", accepted at the EMNLP'24 main conference (preprint).
"Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance", available as a preprint on arXiv.

Dependencies and local setup

The code in this repository uses Python. The required dependencies are specified in requirements.txt.

To install them, run pip install -r requirements.txt.

Running the investigation

The investigation method considers multiple randomness factors (data split, label selection, model initialisation, data order, sample choice, model randomness, and a general golden model), multiple datasets from the GLUE and SuperGLUE benchmarks (downloaded from HuggingFace) and other multi-class datasets (SST2, CoLA, MRPC, BoolQ, AG News, TREC, SNIPS, DB Pedia), and multiple approaches and models for learning with limited labelled data (Fine-tuning with BERT and RoBERTa; Prompting and In-Context Learning with Flan-T5, LLaMA-2, Mistral-7B, Zephyr-7B and ChatGPT; Instruction-Tuning with Flan-T5, Mistral-7B and Zephyr-7B; and Meta-Learning with MAML, FoMAML, Reptile and Prototypical Networks).

The investigation can be run with different sets of parameters; check the main.py file for the accepted main and supplementary arguments.
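To make the idea of investigating a single randomness factor more concrete, here is a minimal sketch. It is not the repository's implementation: `train_and_evaluate`, `investigate`, the seed handling and the toy scoring are all hypothetical, and the factor names are simply taken from the list above. The sketch varies the seed of one factor while keeping the seeds of the others fixed, then reports the standard deviation of the resulting scores. It deliberately ignores interactions between factors, which the first paper's title indicates are part of the actual method.

```python
import random
import statistics

# Factor names borrowed from the text above; the selection is illustrative.
FACTORS = ["data_split", "label_selection", "model_initialisation", "data_order"]


def train_and_evaluate(seeds):
    """Toy stand-in for a real training run (an assumption for illustration).

    Returns a pseudo-score that depends deterministically on the seeds,
    so repeated calls with the same seeds give the same result.
    """
    score = 0.8
    for i, name in enumerate(FACTORS):
        rng = random.Random(seeds[name] * 1000 + i)
        # Each factor perturbs the score by a different maximum magnitude.
        score += rng.uniform(-0.01, 0.01) * (i + 1)
    return score


def investigate(factor, n_runs=10, fixed_seed=0):
    """Vary the seed of one factor, keep all other seeds fixed, return the std."""
    scores = []
    for run in range(n_runs):
        seeds = {name: fixed_seed for name in FACTORS}
        seeds[factor] = run + 1  # only the investigated factor changes
        scores.append(train_and_evaluate(seeds))
    return statistics.stdev(scores)


if __name__ == "__main__":
    for factor in FACTORS:
        print(f"{factor}: std = {investigate(factor):.4f}")
```

Because every source of randomness is driven by an explicit seed, rerunning the sketch reproduces the same numbers. A real run would replace `train_and_evaluate` with actual training and evaluation on one of the datasets listed above.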

We provide two separate sets of experiments that can be run using this repository, each documented in its own Readme file:

EffectsOfRandomnessFactors.md
ImpactOfDatasetSize.md

Please refer to the detailed Readme file for the experiment of your interest.

Paper Citing

@inproceedings{pecher-etal-2024-sensitivity,
    title = "On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices",
    author = "Pecher, Branislav  and
      Srba, Ivan  and
      Bielikova, Maria",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    year = "2024",
    publisher = "Association for Computational Linguistics",
}

