
kaganhitit11/mergeval

mergeval 🧩

mergeval is a unified tool that lets you merge and evaluate large language models in one step.
It combines the power of mergekit for model merging and lm-eval-harness for standardized benchmarking — all through a single command or API call.

Features

  • 🔄 Merge multiple fine-tuned models into one using any merge method that mergekit supports
  • 🧪 Evaluate merged models on any benchmark that lm-eval-harness supports (MMLU, ARC, HellaSwag, etc.)
  • ⚙️ Single CLI command to run both merge + eval

📦 Installation

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended for model merging and evaluation)

Install Dependencies

pip install -r requirements.txt

This installs:

  • mergekit — Model merging toolkit
  • lm-evaluation-harness — Evaluation framework

along with any other helper libraries listed in requirements.txt.

🚀 Quick Start

Basic Usage

Run merge and evaluation in one command:

python mergeval.py examples/example.yaml

This will:

  1. Merge models according to the mergekit configuration
  2. Evaluate the merged model on specified benchmarks
  3. Clean up the temporary merged model directory
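
The three steps above can be sketched as a small driver function. This is a minimal illustration, not mergeval's actual implementation: `run_pipeline`, `merge_fn`, `eval_fn`, and `keep_dir` are hypothetical names, and the sketch assumes that only a temporary directory is removed while an explicitly chosen output directory is kept.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_pipeline(
    merge_fn: Callable[[Path], None],
    eval_fn: Callable[[Path], dict],
    keep_dir: Optional[Path] = None,
) -> dict:
    """Merge into a directory, evaluate it, then remove it if it was temporary."""
    temporary = keep_dir is None
    model_dir = Path(tempfile.mkdtemp(prefix="merged_")) if temporary else keep_dir
    try:
        merge_fn(model_dir)        # step 1: write the merged model into model_dir
        return eval_fn(model_dir)  # step 2: benchmark the merged model
    finally:
        if temporary:
            # step 3: clean up, even if merging or evaluation raised
            shutil.rmtree(model_dir, ignore_errors=True)
```

Passing the merge and evaluation steps in as callables keeps the clean-up logic independent of how either tool is invoked.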

Configuration File

The YAML configuration file supports two main sections:

  • merge — mergekit configuration for model merging
  • evaluate — lm-evaluation-harness configuration for benchmarking

You can include both sections, or just one if you only want to merge or evaluate.

📝 Configuration Format

Merge Section

The merge section configures model merging:

merge:
  # Option 1: Reference an external mergekit config file
  config_path: path/to/mergekit_config.yaml
  
  # Option 2: Use inline mergekit configuration
  config:
    models:
      - model: model1/model-name
        parameters:
          density: 0.3
      - model: model2/model-name
        parameters:
          density: 0.3
    merge_method: ties  # or task_arithmetic, slerp, etc.
    base_model: base/model-name
    parameters:
      normalize: true
      int8_mask: true
    dtype: float16
  
  # Output directory (optional, defaults to merged_models/merged_TIMESTAMP)
  output_model_dir: /path/to/output
  
  # Extra mergekit CLI arguments (optional)
  extra_args:
    - out_shard_size: 2B
    - cuda
    - allow-crimes

Merge Configuration Options:

  • config_path: Path to an external mergekit YAML config file
  • config: Inline mergekit configuration dictionary
  • output_model_dir: Where to save the merged model (optional)
  • extra_args: Additional CLI arguments for mergekit (optional)
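
The `extra_args` list mixes bare names (`cuda`) with single-key mappings (`out_shard_size: 2B`). How mergeval turns these into command-line flags is not documented here, so the following is only one plausible reading: bare names become switches, mappings become a flag followed by its value, and underscores are rewritten as hyphens. The function name `extra_args_to_flags` is hypothetical.

```python
from typing import Any, Dict, List, Union

ExtraArg = Union[str, Dict[str, Any]]


def extra_args_to_flags(extra_args: List[ExtraArg]) -> List[str]:
    """Convert a YAML-style extra_args list into CLI tokens (assumed mapping)."""
    flags: List[str] = []
    for item in extra_args:
        if isinstance(item, str):
            # bare entry such as "cuda" -> "--cuda"
            flags.append("--" + item.replace("_", "-"))
        else:
            # mapping entry such as {"out_shard_size": "2B"} -> "--out-shard-size", "2B"
            for key, value in item.items():
                flags += ["--" + key.replace("_", "-"), str(value)]
    return flags
```

Under these assumptions, the example list above would become `--out-shard-size 2B --cuda --allow-crimes`.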

Note: Provide either config_path or config, not both.
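
A check for this rule could look like the sketch below. The function name `merge_config_source` and the error message are illustrative and not taken from mergeval.

```python
from typing import Any, Dict


def merge_config_source(merge_section: Dict[str, Any]) -> str:
    """Return which key supplies the merge config, enforcing exactly one."""
    has_path = "config_path" in merge_section
    has_inline = "config" in merge_section
    if has_path == has_inline:
        # both present or both missing
        raise ValueError("provide exactly one of 'config_path' or 'config'")
    return "config_path" if has_path else "config"
```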

Evaluate Section

The evaluate section configures benchmarking:

evaluate:
  config:
    # Model configuration
    model: hf  # Model type (hf, vllm, etc.)
    model_args:
      - pretrained: /path/to/model
      - dtype: bfloat16
      - trust_remote_code: true
      - device_map: auto
      - load_in_8bit: false
    
    # Task configuration
    tasks:  # List of tasks or comma-separated string
      - arc_easy
      - arc_challenge
      - hellaswag
      - mmlu_abstract_algebra
    num_fewshot: 3  # Number of few-shot examples (0 for zero-shot)
    
    # Generation configuration
    gen_kwargs:
      - temperature: 0.7
      - top_p: 0.9
      - top_k: 40
      - max_new_tokens: 100
      - do_sample: true
    
    # Hardware settings
    device: cuda:0  # cuda, cuda:0, cpu, mps
    batch_size: auto  # Integer or 'auto'
    max_batch_size: 64
    
    # Output configuration
    output_path: /path/to/results.json
    log_samples: true  # Save per-document outputs
    limit: 0.5  # Evaluate only 50% of documents per task
    
    # Caching
    use_cache: /path/to/sqlite_cache
    cache_requests: true  # true, refresh, or delete
    
    # Debug options
    check_integrity: true
    write_out: true
    show_config: true
    
    # Custom tasks
    include_path: /path/to/custom/tasks
    
    # Chat and prompts
    system_instruction: "You are a helpful AI assistant."
    apply_chat_template: claude-v1  # Template name or true
    fewshot_as_multiturn: true
    predict_only: false
    
    # Random seed
    seed:
      - random: 0
      - numpy: 1234
      - torch: 1234
    
    # Logging
    wandb_args:
      - project: my-project
      - name: my-run
    hf_hub_log_args:
      - hub_results_org: MyOrg
      - push_results_to_hub: true
    
    # Custom metadata
    metadata:
      custom_key: custom_value
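
The lm-evaluation-harness command line takes `--model_args` as a comma-separated string of `key=value` pairs, while the YAML above writes them as a list of single-key mappings. A minimal sketch of bridging the two follows; `flatten_args` is a hypothetical helper, and whether mergeval performs the conversion this way is an assumption.

```python
from typing import Any, Dict, List


def flatten_args(pairs: List[Dict[str, Any]]) -> str:
    """Join a list of single-key mappings into 'k1=v1,k2=v2' form."""
    merged: Dict[str, Any] = {}
    for entry in pairs:
        merged.update(entry)  # later entries override earlier ones
    return ",".join(f"{key}={value}" for key, value in merged.items())
```

The same shape applies to the other list-of-mappings fields in the example, such as `gen_kwargs` and `wandb_args`.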

Required Fields:

  • model: Model type identifier
  • tasks: List of evaluation tasks
  • model_args: Model loading arguments (at minimum, pretrained path)
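
The required fields can be checked before any model is loaded. The sketch below also normalizes `tasks`, which may be a list or a comma-separated string. `check_evaluate_config` is a hypothetical name, and the input is assumed to be the parsed `evaluate.config` mapping.

```python
from typing import Any, Dict, List

REQUIRED = ("model", "tasks", "model_args")


def check_evaluate_config(cfg: Dict[str, Any]) -> List[str]:
    """Validate required fields and return the task names as a list."""
    missing = [key for key in REQUIRED if not cfg.get(key)]
    if missing:
        raise ValueError(f"evaluate.config is missing: {', '.join(missing)}")
    # model_args is a list of single-key mappings; one must set 'pretrained'
    if not any("pretrained" in entry for entry in cfg["model_args"]):
        raise ValueError("model_args must include a 'pretrained' path")
    tasks = cfg["tasks"]
    if isinstance(tasks, str):
        tasks = [name.strip() for name in tasks.split(",") if name.strip()]
    return list(tasks)
```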

Common Tasks:

  • arc_easy, arc_challenge — AI2 Reasoning Challenge
  • hellaswag — Commonsense reasoning
  • mmlu_* — Massive Multitask Language Understanding (subject-specific)
  • winogrande — Commonsense reasoning
  • gsm8k — Grade school math
  • See the lm-evaluation-harness task list for the full set of available tasks

💻 Usage Examples

Example 1: Merge and Evaluate

merge:
  config:
    models:
      - model: psmathur/orca_mini_v3_13b
        parameters:
          density: 0.3
      - model: garage-bAInd/Platypus2-13B
        parameters:
          density: 0.3
    merge_method: ties
    base_model: TheBloke/Llama-2-13B-fp16
    parameters:
      normalize: true
      int8_mask: true
    dtype: float16

evaluate:
  config:
    model: hf
    model_args:
      - pretrained: merged_models/merged_20240101_120000
      - dtype: bfloat16
      - device_map: auto
    tasks:
      - arc_easy
      - hellaswag
      - mmlu
    num_fewshot: 0
    output_path: results.json

Example 2: Merge Only

merge:
  config_path: my_merge_config.yaml
  output_model_dir: /path/to/save/merged_model

Example 3: Evaluate Only

evaluate:
  config:
    model: hf
    model_args:
      - pretrained: /path/to/existing/model
      - device_map: auto
    tasks: mmlu
    output_path: eval_results.json

Example 4: Using External Config File

merge:
  config_path: configs/ties_merge.yaml
  subspace_boosting: true

evaluate:
  config:
    model: hf
    model_args:
      - pretrained: auto  # Will use merged model path
    tasks: arc_challenge,hellaswag
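
To illustrate how `pretrained: auto` might be resolved, the sketch below substitutes the merged model directory and builds a default directory name in the `merged_models/merged_TIMESTAMP` form. The timestamp format is inferred from the `merged_20240101_120000` path in Example 1; both function names are hypothetical, and mergeval's real logic may differ.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def default_output_dir(now: Optional[datetime] = None) -> str:
    """Build a default output path like merged_models/merged_20240101_120000."""
    now = now or datetime.now()
    return "merged_models/merged_" + now.strftime("%Y%m%d_%H%M%S")


def resolve_pretrained(model_args: List[Dict[str, Any]], merged_dir: str) -> List[Dict[str, Any]]:
    """Replace 'pretrained: auto' with the merged model directory."""
    return [
        {**entry, "pretrained": merged_dir} if entry.get("pretrained") == "auto" else dict(entry)
        for entry in model_args
    ]
```

Entries other than `pretrained: auto` are copied unchanged, so an explicit model path is never overwritten.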

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

See LICENSE file for details.

🙏 Acknowledgments

About

mergeval is a unified tool that lets you merge and evaluate large language models in one step. It combines the power of mergekit (https://github.com/arcee-ai/mergekit) for model merging and lm-eval-harness (https://github.com/EleutherAI/lm-evaluation-harness) for standardized benchmarking — all through a single command.
