This repo contains the pre-release version of the OwLore algorithm, proposed in *OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning*.
Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore) is a novel memory-efficient LLM fine-tuning approach that improves fine-tuning performance by combining layerwise sampling with gradient low-rank training.
The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning approach inspired by the layerwise outlier distribution of LLMs, which dynamically samples pre-trained layers to fine-tune instead of adding additional adaptors. We first interpret the outlier phenomenon through the lens of Heavy-Tailed Self-Regularization (HT-SR) theory, discovering that layers with more outliers tend to be more heavy-tailed and consequently better trained. Inspired by this finding, OwLore strategically assigns higher sampling probabilities to layers with more outliers to better leverage the knowledge stored in pre-trained LLMs. To further mitigate the memory demands of fine-tuning, we integrate gradient low-rank projection into our approach, which enables each layer to be trained efficiently in a low-rank manner. By combining the efficiency of low-rank training with outlier-weighed layerwise sampling, OwLore significantly improves the memory-performance trade-off in LLM fine-tuning. Our extensive experiments across various architectures, including LLaMa2, LLaMa3, and Mistral, demonstrate that OwLore consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OwLore allows us to fine-tune LLaMa2-7B with only 21GB of memory.
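To make the sampling idea concrete, here is a minimal sketch (not the repository's implementation) of how per-layer outlier ratios could be turned into layer sampling probabilities; the OWL-style "m times the mean" threshold, the example numbers, and all function names are illustrative assumptions.

```python
import torch

def layerwise_outlier_ratio(score, m=7.0):
    """Fraction of entries whose score exceeds m times the layer's mean score.

    `score` is a per-layer importance matrix (e.g. a Wanda-style |W| * ||x||
    metric); the "m times the mean" threshold follows the OWL-style outlier
    definition. Purely illustrative.
    """
    return (score > m * score.mean()).float().mean().item()

def sampling_probabilities(outlier_ratios):
    """Layers with more outliers get a proportionally higher chance of being
    unfrozen for fine-tuning."""
    ratios = torch.tensor(outlier_ratios)
    return ratios / ratios.sum()

def sample_layers(probs, k):
    """Draw k distinct layer indices to fine-tune for the next interval."""
    return torch.multinomial(probs, k, replacement=False).tolist()

# Example: 4 transformer blocks, 2 sampled per interval.
probs = sampling_probabilities([0.02, 0.05, 0.08, 0.03])
active_layers = sample_layers(probs, k=2)
```

In this picture, only the sampled layers are unfrozen for the next interval of training steps, and their gradients are additionally projected into a low-rank subspace (GaLore-style) to reduce optimizer memory.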
Our repository is built on top of LMFlow. You can configure the environment with the following commands:

```bash
conda create -n owlore python=3.9 -y
conda activate owlore
conda install mpi4py
bash install.sh
pip install peft
```
You can download our processed datasets from Hugging Face here.
We provide a quick overview of the arguments:
- `--model_name_or_path`: The identifier for the model on the Hugging Face model hub.
- `--lisa_activated_layers`: Specifies the number of layers to activate at each step during training.
- `--lisa_interval_steps`: Indicates the number of steps after which resampling occurs.
- `--lisa_prob_mode`: Defines the method used to determine the sampling probability, which can include options such as `uniform`, `owl`, `decrease`, `increase`, etc. (see the sketch below).
- `--galore`: Indicates whether to use GaLore as the optimizer.
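As a rough illustration of how the `--lisa_prob_mode` options differ, the sketch below maps each mode to per-layer sampling probabilities. This is a hypothetical reading of the option names; the exact formulas used by the scripts may differ.

```python
import torch

def layer_sampling_probs(prob_mode: str, num_layers: int, outlier_ratios=None):
    """Hypothetical mapping from --lisa_prob_mode to layer sampling probabilities.

    Intended only to illustrate the intent of each option; the repository's
    actual formulas may differ.
    """
    if prob_mode == "uniform":
        scores = torch.ones(num_layers)                               # every layer equally likely
    elif prob_mode == "owl":
        scores = torch.tensor(outlier_ratios, dtype=torch.float)      # weight by layerwise outlier ratio
    elif prob_mode == "increase":
        scores = torch.arange(1, num_layers + 1, dtype=torch.float)   # probability grows with layer index
    elif prob_mode == "decrease":
        scores = torch.arange(num_layers, 0, -1, dtype=torch.float)   # probability shrinks with layer index
    else:
        raise ValueError(f"Unknown prob_mode: {prob_mode}")
    return scores / scores.sum()
```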
The script will run LISA on the Commonsense Reasoning dataset.

```bash
bash owlore_scripts/run_lisa.sh merge # LISA
```

The script will run OwLore on the Commonsense Reasoning dataset.

```bash
bash owlore_scripts/run_owlore_low_rank.sh merge # OwLore
```

The script will run LISA on the MMLU dataset.

```bash
bash owlore_scripts/run_lisa.sh mmlu # LISA
```

The script will run OwLore on the MMLU dataset.

```bash
bash owlore_scripts/run_owlore_low_rank.sh mmlu # OwLore
```

The script will run LISA on the GSM8k dataset.

```bash
bash owlore_scripts/run_lisa.sh gsm # LISA
```

The script will run OwLore on the GSM8k dataset.

```bash
bash owlore_scripts/run_owlore_low_rank.sh gsm # OwLore
```
We use the Language Model Evaluation Harness to obtain evaluation results. Please refer to its installation instructions to configure `lm_eval`. The steps are as follows:
```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```
After setting up the environment, use the following command to run the 5-shot MMLU evaluation:
```bash
accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=meta-llama/Meta-Llama-3-8B \
    --tasks mmlu \
    --output_path mmlu_results \
    --num_fewshot 5 \
    --batch_size auto \
    --cache_requests true
```
To evaluate on the Commonsense Reasoning benchmarks:

```bash
accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=meta-llama/Meta-Llama-3-8B \
    --tasks boolq,piqa,social_iqa,hellaswag,winogrande,arc_easy,arc_challenge,openbookqa \
    --output_path qa_results \
    --num_fewshot 5 \
    --batch_size auto \
    --cache_requests true
```
To evaluate on GSM8k:

```bash
accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=meta-llama/Meta-Llama-3-8B \
    --tasks gsm8k \
    --output_path math_results \
    --batch_size auto \
    --cache_requests true
```
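If you prefer to drive the harness from Python rather than the CLI, something like the following should also work. This is a sketch assuming the `lm_eval.simple_evaluate` entry point of recent (v0.4-style) harness versions; arguments mirror the MMLU command above and argument handling may differ across versions.

```python
# Programmatic alternative to the MMLU CLI command above (assumes a recent
# lm-evaluation-harness release that exposes lm_eval.simple_evaluate).
import json
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size="auto",
)

# Per-task metrics live under results["results"].
print(json.dumps(results["results"], indent=2, default=str))
```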
This repository is built upon the LMFlow and OWL repositories. Thanks for their great work!
If you find our work helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@misc{li2024owlore,
      title={OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning},
      author={Pengxiang Li and Lu Yin and Xiaowei Gao and Shiwei Liu},
      year={2024},
      eprint={2405.18380},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```