PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

📖 Installation

Important

This repository relies on two major frameworks: LLaMA-Factory (for supervised fine-tuning) and verl (for reinforcement learning–based fine-tuning).

We used Anaconda3-2023.09-0-Linux-x86_64.sh to create the base conda environment.

To ensure reproducibility and avoid dependency conflicts, please first attempt installation via the provided environment files (llama.yml and verl.yml under folder envs).

If you encounter issues with the YAML-based setup, you may instead follow the manual installation procedure detailed below and check yml files for reference.

Environment Setup

We strongly recommend creating two separate conda environments, one for each framework, in order to avoid dependency conflicts and to guarantee consistent experimental results.

Option A. Install from YAML configuration

# LLaMA-Factory environment
conda env create -f llama.yml
conda activate llama_factory


# verl environment
conda env create -f verl.yml
conda activate verl

Option B. Installation from source

If the YAML-based setup fails, please follow the instructions below to configure the environments.

(a) LLaMA-Factory

# create and enter environment (Python ≥3.10 recommended)
conda create -n llama_factory python=3.10 -y
conda activate llama_factory

# clone LLaMA-Factory repository 
git clone https://github.com/hiyouga/LLaMA-Factory
# enter the directory where LLaMA-Factory is downloaded
cd /path/to/LLaMA-Factory
# install dependencies
pip install -e ".[torch,metrics]"

(b) verl

# create and enter environment (Python ≥3.10 recommended)
conda create -n verl python==3.10
conda activate verl

# install CUDA toolkit and dependencies
conda install nvidia/label/cuda-12.4.1::cuda-toolkit
conda install nvidia/label/cudnn-9.8.0::cudnn
conda install conda-forge::nvidia-apex

# clone verl repository
git clone https://github.com/volcengine/verl.git
# enter the directory where verl is downloaded
cd /path/to/verl

# install vLLM, SGLang, and Megatron Core components
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh

# install Python dependencies
pip install pyarrow==15.0.0 datasets==4.0.0
pip install accelerate==1.9.0 codetiming==1.4.0 hydra-core==1.3.2 peft==0.16.0 pybind11==3.0.0 pylatexenc==2.10 ray==2.47.1 tensordict==0.7.2 torchdata==0.11.0 transformers==4.53.2 wandb==0.21.0 pyzmq==27.0.0
pip install sentencepiece==0.2.0
pip install xformers==0.0.29.post2
pip install vllm==0.8.5.post1
pip install numpy==1.26.4

For further environment issues, please check the repos of the two frameworks for solutions.

🚀 Model Training Details and Instructions

Reinforcement Learning (RL) data preparation

For RL training, the input dataset needs to be transformed into .parquet format. Here we listed an example about how we did this process and users can find our customized process .py file (prep_mol_data_consider_cot.py) in this repo:

python verl-main/scripts/prep_mol_data_consider_cot.py \
    --train_jsonl /path/to/RL_train.jsonl \
    --test_jsonl  /path/to/RL_test.jsonl \
    --out_dir     /path/to/output_dir

Training Details

We summarize the training hyperparameters below for reproducibility.

SFT Training

Cutoff length: 2048 tokens
Epochs: 3
Learning rate: 5 × 10⁻⁵
Batch size: 4
Gradient accumulation steps: 8
Learning rate schedule: cosine decay with 20 warmup steps
Maximum training samples: 26,000 (10% reserved for validation)
Checkpointing: every 50 steps
Logging: every 5 steps
LoRA configuration: rank=8, scaling=16, dropout=0.05
Training framework: LLaMA-Factory

RL Training

Training batch size: 128
Rollouts per sample: 8
Sampling temperature: 1.0
Initial learning rate: 1 × 10⁻⁶
KL divergence coefficient in GRPO loss: 1 × 10⁻³
Training framework: verl

Hardware

All experiments were conducted on 8×NVIDIA H100 80GB GPUs.

📊 Dataset Construction

Our project leverages the publicly available dataset CycPeptMPDB as the training data for supervised fine-tuning (SFT) and reinforcement learning (RL) experiments. Implementation details are available in cycpepMPDB.ipynb.

🔬 Data Augmentation with Cyclops

To enrich the diversity of peptide samples, we utilized the CycloPs toolkit to construct our custom dataset:

Monomer database construction: We first build a monomer library by including both natural and non-natural amino acids.
Random substitution: For each peptide in the raw CycPeptMPDB dataset, we perform 100 rounds of random monomer replacement to generate structurally diverse variants.
Cyclization strategy: The modified sequences are formatted in head-to-tail cyclization mode within CycloPs to produce valid cyclic peptide structures.

This process yields a significantly expanded training corpus that better supports peptide property prediction tasks.

We make two datasets generated from our data preparation process and used for SFT training and evaluation available in folder Dataset:

Training set: CycPeptMPDB_SFT_train_LogD_MRT_SIF.jsonl
Test set: CycPeptMPDB_SFT_test_LogD_MRT_SIF_1880.jsonl

⚗️ Molecular Property Prediction

For property prediction, we employe the Chemprop package, a state-of-the-art message passing neural network (MPNN)–based toolkit.
Chemprop is used to calculate key properties of the generated cyclic peptides, which are then incorporated as part of the supervised training signal.

📑 Evaluation and Visualization

To run evaluation on the trained models:

Generate model predictions via the Evaluate & Predict module of LLaMA-Factory.
Export the predictions as a .csv file.
Process the output using the evaluation methods in cycpepMPDB.ipynb.

Acknowledgement

This implementation is built upon verl and LLaMA-Factory. We sincerely appreciate the efforts of these teams for their contributions to open-source research.

📖 Reference

If you find this work useful in your research, please consider citing our paper:

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

BibTeX

@article{wang2025pepthinkr1,
  title     = {PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning},
  author    = {Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang},
  journal   = {arXiv preprint arXiv:2508.14765},
  year      = {2025}
}

Note that Ruheng Wang, Hang Zhang, and Trieu Nguyen did this work as summer interns at Merck & Co., Inc., Rahway, NJ, USA in 2025.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
Dataset		Dataset
envs		envs
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
LICENSES_THIRD_PARTY.txt		LICENSES_THIRD_PARTY.txt
README.md		README.md
cycpepMPDB.ipynb		cycpepMPDB.ipynb
prep_mol_data_consider_cot.py		prep_mol_data_consider_cot.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

📖 Installation

Environment Setup

Option A. Install from YAML configuration

Option B. Installation from source

🚀 Model Training Details and Instructions

Reinforcement Learning (RL) data preparation

Training Details

📊 Dataset Construction

🔬 Data Augmentation with Cyclops

⚗️ Molecular Property Prediction

📑 Evaluation and Visualization

Acknowledgement

📖 Reference

BibTeX

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

📖 Installation

Environment Setup

Option A. Install from YAML configuration

Option B. Installation from source

🚀 Model Training Details and Instructions

Reinforcement Learning (RL) data preparation

Training Details

📊 Dataset Construction

🔬 Data Augmentation with Cyclops

⚗️ Molecular Property Prediction

📑 Evaluation and Visualization

Acknowledgement

📖 Reference

BibTeX

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages