model-merging
Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.
# Model Merging: Combining Pre-trained Models
## When to Use This Skill
Use Model Merging when you need to:
- **Combine capabilities** from multiple fine-tuned models without retraining
- **Create specialized models** by blending domain-specific expertise (math + coding + chat)
- **Improve performance** beyond single models (gains of 5-10% on benchmarks are often cited, but results depend on the models merged and the evaluation used)
- **Reduce training costs** - no GPUs needed, merges run on CPU
- **Experiment rapidly** - create new model variants in minutes, not days
- **Preserve multiple skills** - merge without catastrophic forgetting
**Success Stories**: Marcoro14-7B-slerp (reported as the best-performing model on the Open LLM Leaderboard in early January 2024); many top-ranked Hugging Face models are merges
**Tools**: mergekit (Arcee AI), LazyMergekit, Model Soup
## Installation
```bash
# Install mergekit
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
# Or via pip
pip install mergekit
# Optional: Transformers library (for loading and testing the merged model)
pip install transformers torch
```
## Quick Start
### Simple Linear Merge
```yaml
# config.yml - Merge two models with equal weights
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
dtype: bfloat16
```
```bash
# Run merge
mergekit-yaml config.yml ./merged-model --cuda
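# Load the merged model to check that it works (requires transformers and torch)
python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('./merged-model')"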
```
### SLERP Merge (Best for 2 Models)
```yaml
# config.yml - Spherical interpolation
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1  # slerp requires one of the two models as base_model
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5  # Interpolation factor (0 = base model, 1 = other model)
dtype: bfloat16
```
## Core Concepts
### 1. Merge Methods
**Linear (Model Soup)**
- Simple weighted average of parameters
- Fast, works well for similar models
- Can merge 2+ models
```python
merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
# where w1 + w2 + w3 = 1
```
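The weighted average above can be sketched in plain Python. This is a minimal illustration on flat lists of floats, not mergekit's implementation; `linear_merge` and its `normalize` flag are names chosen here for the example.

```python
def linear_merge(tensors, weights, normalize=True):
    """Weighted average of same-length parameter vectors (flat lists of floats)."""
    if len(tensors) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        # Rescale the weights so they sum to 1, as the formula above assumes.
        total = sum(weights)
        weights = [w / total for w in weights]
    size = len(tensors[0])
    return [sum(w * t[i] for w, t in zip(weights, tensors)) for i in range(size)]


model1_weights = [1.0, 2.0]
model2_weights = [3.0, 4.0]
merged_weights = linear_merge([model1_weights, model2_weights], [0.5, 0.5])
print(merged_weights)
```

In a real merge the same arithmetic is applied tensor by tensor across the whole checkpoint.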
**SLERP (Spherical Linear Interpolation)**
- Interpolates along sphere in weight space
- Preserves magnitude of weight vectors
- Best for merging 2 models
- Smoother than linear
```python
# SLERP formula
merged = (sin((1-t)*θ) / sin(θ)) * model1 + (sin(t*θ) / sin(θ)) * model2
# where θ = arccos(dot(model1, model2) / (‖model1‖ * ‖model2‖)), the angle between the two weight vectors
# t ∈ [0, 1]
```
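A runnable sketch of the SLERP formula follows, assuming flat, non-zero weight vectors stored as Python lists. The function name `slerp` and the `eps` threshold are choices made for this example; the fallback to linear interpolation for nearly parallel vectors is a common safeguard, not something the formula itself prescribes.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two non-zero weight vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel vectors: sin(θ) is close to 0, so use linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint)  # both components are sqrt(0.5), so the result still has unit length
```

A linear midpoint of the same two vectors would be `[0.5, 0.5]`, which has a smaller norm; keeping the norm is the property SLERP is chosen for.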
**Task Arithmetic**
- Extract "task vectors" (fine-tuned - base)
- Combine task vectors, add to base
- Good for merging multiple specialized models
```python
# Task vector
task_vector = finetuned_model - base_model
# Merge multiple task vectors
merged = base_model + α₁*task_vector₁ + α₂*task_vector₂
```
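The two steps above can be made concrete with a small sketch on flat lists of floats. The helper names `task_vector` and `task_arithmetic` are illustrative, and the toy values stand in for real model parameters.

```python
def task_vector(finetuned, base):
    """Element-wise difference between fine-tuned and base parameters."""
    return [f - b for f, b in zip(finetuned, base)]


def task_arithmetic(base, finetuned_models, alphas):
    """Add the scaled task vector of each fine-tuned model to the base."""
    merged = list(base)
    for model, alpha in zip(finetuned_models, alphas):
        delta = task_vector(model, base)
        merged = [m + alpha * d for m, d in zip(merged, delta)]
    return merged


base_model = [1.0, 1.0, 1.0]
math_model = [2.0, 1.0, 1.0]  # toy model: changed only the first parameter
code_model = [1.0, 1.0, 3.0]  # toy model: changed only the last parameter
merged = task_arithmetic(base_model, [math_model, code_model], [0.5, 0.5])
print(merged)
```

Because the two toy models changed different parameters, their task vectors do not interfere; when they overlap, the scaling factors decide how much of each change survives.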
**TIES-Merging**
- Task arithmetic + sparsification
- Resolves sign conflicts in parameters
- Best for merging many task-specific models
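As a rough sketch of the idea, TIES can be read as three steps: trim each task vector to its largest-magnitude entries, elect one sign per parameter, and average only the values that agree with that sign. The code below is a simplified illustration on flat lists, with hypothetical helper names; mergekit's implementation differs in details such as weighting and normalization.

```python
def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    k = max(1, round(len(delta) * density))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, finetuned_models, density=0.5, scale=1.0):
    """Trim task vectors, elect a sign per parameter, average agreeing values."""
    deltas = [
        trim([f - b for f, b in zip(model, base)], density)
        for model in finetuned_models
    ]
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        # Elected sign: the direction with the larger total magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v * sign > 0]
        mean = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * mean)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [0.9, -0.1, 0.4, 0.0]
model_b = [-0.7, 0.05, 0.6, 0.1]
merged = ties_merge(base, [model_a, model_b], density=0.5)
print(merged)
```

In this toy run the first parameter has a sign conflict (0.9 versus -0.7); the positive direction wins, so the conflicting -0.7 is dropped instead of being averaged in.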
**DARE (Drop And REscale)**
- Randomly drops a fraction of the delta parameters (fine-tuned minus base)
- Rescales the remaining deltas so the expected update stays the same
- Reduces redundancy, maintains performance
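The drop-and-rescale step can be sketched as follows, assuming `density` means the fraction of deltas that is kept (as in the `density` parameter of the configs in this guide). The function name `dare` and the seeded generator are choices made for this example.

```python
import random


def dare(delta, density, rng):
    """Keep each delta with probability `density`; rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


rng = random.Random(0)  # fixed seed so the example is reproducible
delta = [0.2] * 10000   # toy task vector
sparse = dare(delta, density=0.5, rng=rng)

kept = sum(1 for d in sparse if d != 0.0)
mean_update = sum(sparse) / len(sparse)
print(kept, mean_update)
```

Roughly half the entries are zeroed, while the mean update stays close to the original 0.2, which is the point of the rescaling.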
### 2. Configuration Structure
```yaml
# Basic structure
merge_method: <method>  # linear, slerp, ties, dare_ties, task_arithmetic
base_model: <path>      # Needed for slerp, task_arithmetic, ties, dare_ties; not used by linear
models:
  - model: <path/to/model1>
    parameters:
      weight: <float>   # Merge weight
      density: <float>  # For TIES/DARE
  - model: <path/to/model2>
    parameters:
      weight: <float>
parameters:
  # Method-specific parameters
dtype: <dtype>  # bfloat16, float16, float32
# Optional
slices:     # Layer-wise merging
tokenizer:  # Tokenizer configuration
```
## Merge Methods Guide
### Linear Merge
**Best for**: Simple model combinations, equal weighting
```yaml
merge_method: linear
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 0.4
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.3
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      weight: 0.3
dtype: bfloat16
```
### SLERP Merge
**Best for**: Two models, smooth interpolation
```yaml
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1  # slerp requires one of the two models as base_model
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5  # 0.0 = base model, 1.0 = other model
dtype: bfloat16
```
**Layer-specific SLERP:**
```yaml
merge_method: slerp
base_model: model_a  # slerp requires one of the two models as base_model
slices:
  - sources:
      - model: model_a
        layer_range: [0, 32]
      - model: model_b
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn  # Attention layers
      value: 0.3
    - filter: mlp        # MLP layers
      value: 0.7
    - value: 0.5         # Default for other layers
dtype: bfloat16
```
### Task Arithmetic
**Best for**: Combining specialized skills
```yaml
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1  # Math
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B  # Chat
    parameters:
      weight: 0.3
  - model: ajibawa-2023/Code-Mistral-7B  # Code
    parameters:
      weight: 0.2
dtype: bfloat16
```
### TIES-Merging
**Best for**: Many models, resolving conflicts
```yaml
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5  # Keep the top 50% of delta parameters by magnitude
      weight: 1.0
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 1.0
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      density: 0.5
      weight: 1.0
parameters:
  normalize: true
dtype: bfloat16
```
### DARE Merge
**Best for**: Reducing redundancy
```yaml
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5  # Keep 50% of deltas (drop the other 50%)
      weight: 0.6
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.4
parameters:
  int8_mask: true  # Use int8 for masks (saves memory)
dtype: bfloat16
```
## Advanced Patterns
### Layer-wise Merging
```yaml
# Different models for different layers
merge_method: passthrough
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 16]  # First half
  - sources:
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [16, 32]  # Second half
dtype: bfloat16
```
### MoE from Merged Models
```yaml
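# Create a Mixture of Experts
# Note: MoE configs are run with the separate mergekit-moe command, not mergekit-yaml,
# and do not take a merge_method field.
base_model: mistralai/Mistral-7B-v0.1
gate_mode: hidden  # Initialize router gates from hidden states of the positive prompts
experts:
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "math"
      - "calculate"
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "chat"
      - "conversation"
  - source_model: ajibawa-2023/Code-Mistral-7B
    positive_prompts:
      - "code"
      - "python"
dtype: bfloat16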
```
### Tokenizer Merging
```yaml
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: custom/specialized-model
tokenizer:
  source: "union"  # Combine vocabularies from both models
  tokens:
    <|special_token|>:
      source: "custom/specialized-model"
```
## Best Practices
### 1. Model Compatibility
```python
# ✅ Good: Same architecture
models = [
    "mistralai/Mistral-7B-v0.1",
    "teknium/OpenHermes-2.5-Mistral-7B",  # Both Mistral 7B
]
# ❌ Bad: Different architectures
models = [
    "meta-llama/Llama-2-7b-hf",   # Llama
    "mistralai/Mistral-7B-v0.1",  # Mistral (incompatible!)
]
```
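One way to catch an incompatible pair before merging is to compare a few fields from each model's `config.json`. The sketch below is an assumption about which fields are worth checking, not an exhaustive or official list; `config_mismatches`, `load_config`, and `KEYS` are hypothetical names, and the example dictionaries hold illustrative values, not values copied from real checkpoints.

```python
import json
from pathlib import Path

# Fields assumed to matter for parameter-wise merging (illustrative, not exhaustive).
KEYS = ("architectures", "hidden_size", "num_hidden_layers", "vocab_size")


def load_config(model_dir):
    """Read config.json from a locally downloaded model directory."""
    return json.loads((Path(model_dir) / "config.json").read_text())


def config_mismatches(config_a, config_b, keys=KEYS):
    """Return {field: (value_a, value_b)} for every field that differs."""
    return {
        k: (config_a.get(k), config_b.get(k))
        for k in keys
        if config_a.get(k) != config_b.get(k)
    }


# Illustrative configs only; load real ones with load_config("<model dir>").
mistral_like = {"architectures": ["MistralForCausalLM"], "hidden_size": 4096,
                "num_hidden_layers": 32, "vocab_size": 32000}
llama_like = {"architectures": ["LlamaForCausalLM"], "hidden_size": 4096,
              "num_hidden_layers": 32, "vocab_size": 32000}

print(config_mismatches(mistral_like, mistral_like))  # no differences
print(config_mismatches(mistral_like, llama_like))    # architectures differ
```

An empty result does not guarantee a good merge, but a non-empty one is a cheap early warning. A `vocab_size` difference in particular suggests the tokenizers need to be reconciled first.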
### 2. Weight Selection
```yaml
# ✅ Good: Weights sum to 1.0
models:
  - model: model_a
    parameters:
      weight: 0.6
  - model: model_b
    parameters:
      weight: 0.4  # 0.6 + 0.4 = 1.0
# ⚠️ Acceptable: Weights don't sum to 1 (for task arithmetic)
models:
  - model: model_a
    parameters:
      weight: 0.8
  - model: model_b
    parameters:
      weight: 0.8  # May boost performance
```
### 3. Method Selection
```python
# Choose merge method based on
```