Skip to content

ZZP21/PepThink-R1

 
 

Repository files navigation

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

📖 Installation

Important

  • This repository relies on two major frameworks: LLaMA-Factory (for supervised fine-tuning) and verl (for reinforcement learning–based fine-tuning).
  • We used Anaconda3-2023.09-0-Linux-x86_64.sh to create the base conda environment.
  • To ensure reproducibility and avoid dependency conflicts, please first attempt installation via the provided environment files (llama.yml and verl.yml under folder envs).
  • If you encounter issues with the YAML-based setup, you may instead follow the manual installation procedure detailed below and check yml files for reference.

Environment Setup

We strongly recommend creating two separate conda environments, one for each framework, in order to avoid dependency conflicts and to guarantee consistent experimental results.

Option A. Install from YAML configuration

# LLaMA-Factory environment
conda env create -f llama.yml
conda activate llama_factory


# verl environment
conda env create -f verl.yml
conda activate verl

Option B. Installation from source

If the YAML-based setup fails, please follow the instructions below to configure the environments.

(a) LLaMA-Factory

# create and enter environment (Python ≥3.10 recommended)
conda create -n llama_factory python=3.10 -y
conda activate llama_factory

# clone LLaMA-Factory repository 
git clone https://github.com/hiyouga/LLaMA-Factory
# enter the directory where LLaMA-Factory is downloaded
cd /path/to/LLaMA-Factory
# install dependencies
pip install -e ".[torch,metrics]"

(b) verl

# create and enter environment (Python ≥3.10 recommended)
conda create -n verl python==3.10
conda activate verl

# install CUDA toolkit and dependencies
conda install nvidia/label/cuda-12.4.1::cuda-toolkit
conda install nvidia/label/cudnn-9.8.0::cudnn
conda install conda-forge::nvidia-apex

# clone verl repository
git clone https://github.com/volcengine/verl.git
# enter the directory where verl is downloaded
cd /path/to/verl

# install vLLM, SGLang, and Megatron Core components
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh

# install Python dependencies
pip install pyarrow==15.0.0 datasets==4.0.0
pip install accelerate==1.9.0 codetiming==1.4.0 hydra-core==1.3.2 peft==0.16.0 pybind11==3.0.0 pylatexenc==2.10 ray==2.47.1 tensordict==0.7.2 torchdata==0.11.0 transformers==4.53.2 wandb==0.21.0 pyzmq==27.0.0
pip install sentencepiece==0.2.0
pip install xformers==0.0.29.post2
pip install vllm==0.8.5.post1
pip install numpy==1.26.4

For further environment issues, please check the repos of the two frameworks for solutions.


🚀 Model Training Details and Instructions

Reinforcement Learning (RL) data preparation

For RL training, the input dataset needs to be transformed into .parquet format. Here we listed an example about how we did this process and users can find our customized process .py file (prep_mol_data_consider_cot.py) in this repo:

python verl-main/scripts/prep_mol_data_consider_cot.py \
    --train_jsonl /path/to/RL_train.jsonl \
    --test_jsonl  /path/to/RL_test.jsonl \
    --out_dir     /path/to/output_dir

Training Details

We summarize the training hyperparameters below for reproducibility.

SFT Training

Cutoff length: 2048 tokens
Epochs: 3
Learning rate: 5 × 10⁻⁵
Batch size: 4
Gradient accumulation steps: 8
Learning rate schedule: cosine decay with 20 warmup steps
Maximum training samples: 26,000 (10% reserved for validation)
Checkpointing: every 50 steps
Logging: every 5 steps
LoRA configuration: rank=8, scaling=16, dropout=0.05
Training framework: LLaMA-Factory

RL Training

Training batch size: 128
Rollouts per sample: 8
Sampling temperature: 1.0
Initial learning rate: 1 × 10⁻⁶
KL divergence coefficient in GRPO loss: 1 × 10⁻³
Training framework: verl

Hardware

All experiments were conducted on 8×NVIDIA H100 80GB GPUs.


📊 Dataset Construction

Our project leverages the publicly available dataset CycPeptMPDB as the training data for supervised fine-tuning (SFT) and reinforcement learning (RL) experiments. Implementation details are available in cycpepMPDB.ipynb.

🔬 Data Augmentation with Cyclops

To enrich the diversity of peptide samples, we utilized the CycloPs toolkit to construct our custom dataset:

  1. Monomer database construction: We first build a monomer library by including both natural and non-natural amino acids.
  2. Random substitution: For each peptide in the raw CycPeptMPDB dataset, we perform 100 rounds of random monomer replacement to generate structurally diverse variants.
  3. Cyclization strategy: The modified sequences are formatted in head-to-tail cyclization mode within CycloPs to produce valid cyclic peptide structures.

This process yields a significantly expanded training corpus that better supports peptide property prediction tasks.


We make two datasets generated from our data preparation process and used for SFT training and evaluation available in folder Dataset:

  • Training set: CycPeptMPDB_SFT_train_LogD_MRT_SIF.jsonl
  • Test set: CycPeptMPDB_SFT_test_LogD_MRT_SIF_1880.jsonl

⚗️ Molecular Property Prediction

For property prediction, we employe the Chemprop package, a state-of-the-art message passing neural network (MPNN)–based toolkit.
Chemprop is used to calculate key properties of the generated cyclic peptides, which are then incorporated as part of the supervised training signal.


📑 Evaluation and Visualization

To run evaluation on the trained models:

  1. Generate model predictions via the Evaluate & Predict module of LLaMA-Factory.
  2. Export the predictions as a .csv file.
  3. Process the output using the evaluation methods in cycpepMPDB.ipynb.

Acknowledgement

This implementation is built upon verl and LLaMA-Factory. We sincerely appreciate the efforts of these teams for their contributions to open-source research.


📖 Reference

If you find this work useful in your research, please consider citing our paper:

BibTeX

@article{wang2025pepthinkr1,
  title     = {PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning},
  author    = {Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang},
  journal   = {arXiv preprint arXiv:2508.14765},
  year      = {2025}
}

Note that Ruheng Wang, Hang Zhang, and Trieu Nguyen did this work as summer interns at Merck & Co., Inc., Rahway, NJ, USA in 2025.

About

LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Jupyter Notebook 98.0%
  • Python 2.0%