A comprehensive list of papers accompanying the survey 'Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities' (arXiv, 2024).
Important
Contributions welcome:
- If you have a relevant paper that is not yet included in this list, or a correction to an existing entry, please contact us or submit a 'Pull request' directly. Thank you!
- If you think your paper fits better under another category, please contact us or submit a 'Pull request'. Once your paper is accepted, feel free to update its entry accordingly. Thank you!
- 🔥🔥🔥 We mark papers whose experiments use models of size $\geq$ 7B.
Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. To address this gap, this survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions.
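For readers new to the area, below is a minimal, illustrative sketch of the most basic merging operations discussed in the survey: simple weight averaging and task arithmetic, applied directly to PyTorch state dicts. The checkpoint file names and the `merge_state_dicts` helper are placeholders for illustration, not the released code of any particular paper; the sketch assumes all checkpoints share the same architecture and parameter names.

```python
import torch

def merge_state_dicts(base, finetuned, alpha=0.5):
    """Task-arithmetic-style merge: add the averaged task vectors
    (fine-tuned minus base) back onto the base weights, scaled by alpha.
    With alpha=1.0 this reduces to simple weight averaging of the
    fine-tuned checkpoints."""
    merged = {}
    for name, base_param in base.items():
        task_vectors = [sd[name] - base_param for sd in finetuned]
        merged[name] = base_param + alpha * sum(task_vectors) / len(task_vectors)
    return merged

# Hypothetical checkpoint paths; any fine-tuned models that share the base
# architecture and parameter names can be merged this way.
base_sd = torch.load("base_model.pt", map_location="cpu")
finetuned_sds = [
    torch.load("finetuned_task_a.pt", map_location="cpu"),
    torch.load("finetuned_task_b.pt", map_location="cpu"),
]
merged_sd = merge_state_dicts(base_sd, finetuned_sds, alpha=0.5)
torch.save(merged_sd, "merged_model.pt")
```

Note that no training data or gradient computation is involved; the scaling coefficient `alpha` is the only hyperparameter in this simple variant, and most of the advanced methods listed below refine how the task vectors are selected, sparsified, or weighted before they are combined.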
If you find our paper or this resource helpful, please consider citing:
@article{Survey_ModelMerging_2024,
title={Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities},
author={Yang, Enneng and Shen, Li and Guo, Guibing and Wang, Xingwei and Cao, Xiaochun and Zhang, Jie and Tao, Dacheng},
journal={arXiv preprint arXiv:2408.07666},
year={2024}
}
Thanks!
- Awesome-Model-Merging-Methods-Theories-Applications
- Survey
- Benchmark/Evaluation
- Advanced Methods
- Application of Model Merging in Foundation Models
- Application of Model Merging in Different Machine Learning Subfields
- Other Applications
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models | 2024 | Arxiv | LLaMA3-8B-Instruct, Qwen2-7B-Instruct, Mistral-7B-Instruct-v0.3 |
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | 2024 | NeurIPS Track on Datasets and Benchmarks | Synthia-7B-v1.2, Llama-2-7b-evolcodealpaca, OpenHermes-7B, pygmalion-2-7b, Llama-2-7b-chat-hf, BeingWell_llama2_7b, MetaMath-7B-V1.0, vicuna-7b-v1.5, Platypus2-7B, GOAT-7B-Community, Llama-2-7b-WikiChat-fused, dolphin-llama2-7b, MetaMath-Llemma-7B, CodeLlama-7b-Instruct-hf, Magicoder-S-CL-7B, CrystalChat |
What Matters for Model Merging at Scale? | 2024 | Arxiv | PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B) |
Realistic Evaluation of Model Merging for Compositional Generalization | 2024 | Arxiv | |
Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities | 2024 | Arxiv | Llama-3.1-8B, Mistral-7B-v0.3 |
FusionBench: A Comprehensive Benchmark of Deep Model Fusion | 2024 | Arxiv | |
Arcee's MergeKit: A Toolkit for Merging Large Language Models | 2024 | Arxiv | Llama2-7B-Chat, Meditron-7B |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | 2024 | Arxiv | |
Tangent Transformers for Composition, Privacy and Removal | 2024 | ICLR | |
Parameter Efficient Multi-task Model Fusion with Partial Linearization | 2024 | ICLR | |
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models | 2023 | NeurIPS |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Efficient Model Editing with Task-Localized Sparse Fine-tuning | 2024 |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Knowledge fusion of large language models | 2024 | ICLR | Llama-2 7B, OpenLLaMA 7B, MPT 7B |
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | 2024 | Arxiv | NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B |
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks | 2023 | ICASSP | |
GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Composing parameter-efficient modules with arithmetic operation | 2023 | NeurIPS | |
Editing models with task arithmetic | 2023 | ICLR | |
Model fusion via optimal transport | 2020 | NeurIPS | |
Weight averaging for neural networks and local resampling schemes | 1996 | AAAI Workshop | |
Animating rotation with quaternion curves (Spherical Linear Interpolation (SLERP) Model Merging) | 1985 | SIGGRAPH Computer Graphics |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Rethink the Evaluation Protocol of Model Merging on Classification Task | 2024 | Arxiv | |
SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery | 2024 | Arxiv | |
Representation Surgery for Multi-Task Model Merging | 2024 | ICML |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
How to Merge Your Multimodal Models Over Time? | 2024 | Arxiv | |
Multi-Task Model Merging via Adaptive Weight Disentanglement | 2024 | Arxiv | |
Rethinking Weight-Averaged Model-merging | 2024 | Arxiv | |
ATM: Improving Model Merging by Alternating Tuning and Merging | 2024 | Arxiv | |
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | 2024 | Arxiv | Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B |
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging | 2024 | Arxiv | |
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | 2024 | Arxiv | Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B |
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | 2023 | Arxiv | SOLAR 10.7B, SOLAR 10.7B-Instruct |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach | 2024 | Arxiv | |
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | 2024 | AAAI | LLaMA-7B |
Mitigating Social Biases in Language Models through Unlearning | 2024 | Arxiv | LLaMA-2 7B |
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models | 2024 | Arxiv | Llama-2-7B, Llama-2-chat-7B, Vicuna-7B, Llama-2-13B |
Composing Parameter-Efficient Modules with Arithmetic Operation | 2023 | NeurIPS | |
Editing models with task arithmetic | 2023 | ICLR | |
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation | 2023 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
NegMerge: Consensual Weight Negation for Strong Machine Unlearning | 2024 | Arxiv | |
Towards Safer Large Language Models through Machine Unlearning | 2024 | ACL | LLAMA2-7B, LLAMA2-13B |
Editing models with task arithmetic | 2023 | ICLR | |
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Model | 2023 | Arxiv | LLAMA2-7B, LLAMA-7B, BLOOM-7B |
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion | 2023 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv | OpenLLaMA 7B and 13B |
Checkpoint Merging via Bayesian Optimization in LLM Pretraining | 2024 | Arxiv | Baichuan2-220B, Baichuan2-440B, Baichuan2-660B, Baichuan2-1540B, Baichuan2-1760B, Baichuan2-1980B, Baichuan2-2200B, Baichuan2-2420B, DeepSeek-1400B, DeepSeek-1600B, DeepSeek-1800B, DeepSeek-2000B |
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning | 2023 | ACL | |
Early Weight Averaging meets High Learning Rates for LLM Pre-training | 2023 | NeurIPS Workshop | |
Stop wasting my time! saving days of imagenet and bert training with latest weight averaging | 2022 | NeurIPS Workshop | |
Fusing finetuned models for better pretraining | 2022 | Arxiv |
Note: The following papers are from the LLM Merging Competition at NeurIPS 2024.
Paper Title | Year | Conference/Journal | Models |
---|---|---|---|
LLM Merging: Building LLMs Efficiently through Merging | 2024 | LLM Merging Competition at NeurIPS | - |
Towards an approach combining Knowledge Graphs and Prompt Engineering for Merging Large Language Models | 2024 | LLM Merging Competition at NeurIPS | meta-llama/Llama-2-7b; microsoft_phi1/2/3 |
Model Merging using Geometric Median of Task Vectors | 2024 | LLM Merging Competition at NeurIPS | flan_t5_xl |
Interpolated Layer-Wise Merging for NeurIPS 2024 LLM Merging Competition | 2024 | LLM Merging Competition at NeurIPS | suzume-llama-3-8B-multilingual-orpo-borda-top75, Barcenas-Llama3-8bORPO, Llama-3-8B-Ultra-Instruct-SaltSprinkle, MAmmoTH2-8B-Plus, Daredevil-8B |
A Model Merging Method | 2024 | LLM Merging Competition at NeurIPS | - |
Differentiable DARE-TIES for NeurIPS 2024 LLM Merging Competition | 2024 | LLM Merging Competition at NeurIPS | suzume-llama-3-8B-multilingual-orpo-borda-top75, MAmmoTH2-8B-Plus, and Llama-3-Refueled |
LLM Merging Competition Technical Report: Efficient Model Merging with Strategic Model Selection, Merging, and Hyperparameter Optimization | 2024 | LLM Merging Competition at NeurIPS | MaziyarPanahi/Llama3-8B-Instruct-v0.8, MaziyarPanahi/Llama-3-8B-Instruct-v0.9, shenzhiwang/Llama3-8B-Chinese-Chat, lightblue/suzume-llama-3-8B-multilingual |
Simple Llama Merge: What Kind of LLM Do We Need? | 2024 | LLM Merging Competition at NeurIPS | Hermes-2-Pro-Llama-3-8B, and Daredevil-8B |
LLM Merging Competition Technical Report for NeurIPS 2024: Efficiently Building Large Language Models through Merging | 2024 | LLM Merging Competition at NeurIPS | Mistral-7B-Instruct-v2, Llama3-8B-Instruct, Flan-T5-large, Gemma-7B-Instruct, and WizardLM-2-7B |
MoD: A Distribution-Based Approach for Merging Large Language Models | 2024 | LLM Merging Competition at NeurIPS | Qwen2.5-1.5B and Qwen2.5-7B |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Jointly training large autoregressive multimodal models | 2024 | ICLR | |
Model Composition for Multimodal Large Language Models | 2024 | ACL | Vicuna-7B-v1.5 |
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation | 2023 | ICML | |
An Empirical Study of Multimodal Model Merging | 2023 | EMNLP | |
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks | 2023 | TMLR |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification | 2024 | ICASSP Workshop |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | 2024 | Arxiv | LLaVA-Critic 7b |
IterIS: Iterative Inference-Solving Alignment for LoRA Merging | 2024 | Arxiv | |
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | 2024 | ECCV | |
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models | 2024 | Arxiv | |
MoLE: Mixture of LoRA Experts | 2024 | ICLR | |
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | 2024 | Arxiv | |
Multi-LoRA Composition for Image Generation | 2024 | Arxiv | |
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models | 2023 | NeurIPS | |
Merging LoRAs | 2023 | (GitHub) | |
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs | 2023 | Arxiv | |
GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better | 2024 | Arxiv | |
A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA | 2024 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Decouple-Then-Merge: Towards Better Training for Diffusion Models | 2024 | Arxiv | |
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data | 2024 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging | 2024 | Arxiv | |
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | 2024 | Arxiv | |
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation | 2024 | Arxiv | Llama3-8B |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv | OpenLLaMA-7B, OpenLLaMA-13B |
Merging Vision Transformers from Different Tasks and Domains | 2023 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning | 2023 | NeurIPS |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Realistic Evaluation of Model Merging for Compositional Generalization | 2024 | Arxiv | |
Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks | 2024 | Arxiv | |
Training-Free Model Merging for Multi-target Domain Adaptation | 2024 | Arxiv | |
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation | 2024 | Arxiv | Llama3-70B |
Ensemble of averages: Improving model selection and boosting performance in domain generalization | 2022 | NeurIPS | |
Swad: Domain generalization by seeking flat minima | 2021 | NeurIPS |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks | 2024 | ACL | Llama-2-7B |
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | 2024 | COLM | Llama-2-7B, Llama-2-13B |
LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild | 2024 | ACL | |
Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? | 2024 | Arxiv | |
MerA: Merging pretrained adapters for few-shot learning | 2023 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
LoBAM: LoRA-Based Backdoor Attack on Model Merging | 2024 | Arxiv | |
BadMerging: Backdoor Attacks Against Model Merging | 2024 | CCS | |
LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario | 2024 | ACL | Llama-2-7B |
Star History
We welcome all researchers to contribute to this repository on model merging in foundation models and machine learning.
If you have a related paper that has not been added to the list, please contact us.
Email: [email protected] / [email protected]