AidaAfshar/LLM_ModelSelection


LLM Model Selection for GRPO + LoRA

License: MIT · Python 3.10 · PyTorch 2.9 · Hugging Face · Unsloth

This repository studies online model selection methods for post-training LLMs with GRPO + LoRA adapters. The pipeline trains multiple base models, each with a unique configuration, uses a meta-selection strategy to choose which base model to train at each episode, and logs training and reward behavior for comparison across selection methods. For memory efficiency, each base model is identified with a unique LoRA adapter on top of a shared pretrained model, in this case "meta-llama/Llama-3.2-3B-Instruct".
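The episode-level selection loop can be sketched as a bandit over adapter indices: each episode, the selector picks one adapter, one GRPO step is run with that adapter active, and the resulting mean reward is fed back. The sketch below uses plain UCB1 with simulated rewards; the class name `UCBModelSelector` and the reward model are illustrative assumptions, not the repository's actual implementation (see `model_selection_algorithms/` for that).

```python
import math
import random


class UCBModelSelector:
    """UCB1 over candidate configurations (e.g., LoRA adapters).

    Illustrative sketch only; the repo's own bandit/model-selection
    algorithms live in model_selection_algorithms/ and may differ.
    """

    def __init__(self, num_models):
        self.num_models = num_models
        self.counts = [0] * num_models   # pulls per adapter
        self.means = [0.0] * num_models  # running mean reward per adapter
        self.t = 0                       # total episodes so far

    def select(self):
        self.t += 1
        # Play each arm once before applying the UCB rule.
        for i in range(self.num_models):
            if self.counts[i] == 0:
                return i
        # Pick the arm maximizing mean + exploration bonus.
        return max(
            range(self.num_models),
            key=lambda i: self.means[i]
            + math.sqrt(2 * math.log(self.t) / self.counts[i]),
        )

    def update(self, i, reward):
        # Incremental mean update for arm i.
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]


# Simulated stand-in for "run one GRPO step with this adapter and
# return the mean episode reward" — real training would go here.
random.seed(0)
selector = UCBModelSelector(num_models=3)
true_means = [0.2, 0.5, 0.8]  # hypothetical per-adapter reward means
for episode in range(200):
    arm = selector.select()
    reward = random.gauss(true_means[arm], 0.1)
    selector.update(arm, reward)

best = max(range(3), key=lambda i: selector.counts[i])
```

With a clear reward gap, the selector concentrates most of its 200 episodes on the strongest adapter, which is the behavior the meta-selection strategies in this repository are compared on.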

Folder Structure



```
├── modsel_GRPO_LoRA/                      # Main training pipeline
│   ├── dataset.py
│   ├── main.py
│   ├── reward_funcs.py
│   └── utils.py
└── model_selection_algorithms/            # Model selection + bandit implementations
    ├── bandit_algs.py
    └── modsel_algs.py
```

Acknowledgment

This codebase partially reuses the GRPO implementation from Hugging Face TRL and the reasoning RL codebase from Unsloth AI.

About

Online Model Selection for Post-Training Language Models
