Evolutionary Optimization of Model Merging Recipes

llm
research paper
Author

Santosh Sawant

Published

March 21, 2024

Model merging offers a novel approach to leverage the strengths of multiple pre-trained models. It allows us to combine task-specific models, each potentially fine-tuned for a particular downstream task, into a single unified model. Model merging works surprisingly well and produced many state-of-the-art models on the Open LLM Leaderboard (https://lnkd.in/g3b7eZpm).

There are multiple widely used model merge algorithm such as Spherical Linear Interpolation (SLERP), TIES-Merging and DARE each one has its own advantage over one another. Recently, SakanaAI has released a new model merge method called Evolutionary Model Merge, a general method that uses evolutionary techniques to efficiently discover the best ways to combine different models from the vast ocean of different open-source models with diverse capabilities. Evolutionary Model Merge approach encompasses (1) evolving the weights for mixing parameters at each layer in the parameter space (PS); (2) evolving layer permutations in the data flow space (DFS); and (3) an integrated strategy that combines both methods for merging in both PS and DFS. Notice that merging in the PS is not simply copying and stitching of the layers parameters, but also mixes the weights.

During experimentation, the Evolutionary Model merge method was used to automatically evolve for a Japanese Large Language Model (LLM) capable of Math reasoning, and a Japanese Vision-Language Model (VLM). Surprisingly, both models achieve state-of-the-art results on several LLM and Vision benchmarks, while not being explicitly optimized to be good at these benchmarks! EvoLLM-JP-A-v1-7B achieved 52.0%, outperforming individual models Shisa Gamma 7B v1 (9.6%) Abel 7B 002 (30.0%) where as EvoVLM-JP-v1-7B achieved 51.25, outperforming LLaVA-1.6-Mistral-7B (41.10).

Paper : https://arxiv.org/pdf/2403.13187.pdf

Model : https://huggingface.co/SakanaAI

Demo : https://huggingface.co/spaces/SakanaAI/EvoVLM-JP