1 change: 1 addition & 0 deletions docs.json
@@ -848,6 +848,7 @@
"weave/guides/integrations/autogen",
"weave/guides/integrations/verdict",
"weave/guides/integrations/verifiers",
"weave/guides/integrations/trl",
"weave/guides/integrations/js"
]
},
Binary file added images/weave/trl.gif
2 changes: 2 additions & 0 deletions weave/guides/integrations.mdx
@@ -90,7 +90,9 @@ Frameworks help orchestrate the actual execution pipelines in AI applications. T
- **[Koog](/weave/guides/integrations/koog)**

## RL Frameworks

- **[Verifiers](/weave/guides/integrations/verifiers)**
- **[trl](/weave/guides/integrations/trl)**

## Protocols

90 changes: 90 additions & 0 deletions weave/guides/integrations/trl.mdx
@@ -0,0 +1,90 @@
---
title: Transformer Reinforcement Learning (TRL)
description: "Train Large Language Models (LLMs) and Large Reasoning Models (LRMs) in TRL using the Weave callback together with W&B to gain visibility into training progress. Log completions or traces and track changes in the quality of responses generated by the model."
---

![WeaveCallback](/images/weave/trl.gif)

[Transformer Reinforcement Learning (TRL)](https://huggingface.co/docs/trl/en/index) is a full-stack library that provides a set of tools for training transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), reward modeling, and more. The library is integrated with 🤗 Transformers. Along with using W&B to record your training metrics, you can integrate Weave with your TRL workflows to gain observability into how your model performs during training. Weave records inputs, outputs, and timestamps for each evaluation step so you can inspect the quality of responses generated by the model.

This guide shows you how to use TRL with Weave and W&B.

### Getting started

Install TRL with `uv` or `pip`. Choose one of the options below based on your setup.

```bash
# Using uv
uv pip install trl

# Using pip
pip install trl
```

Then install Weave and W&B using either of the options below, depending on your setup:

```bash
# Using uv
uv pip install weave wandb

# Using pip
pip install weave wandb
```
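
Optionally, you can verify the installation by initializing a Weave project. This is a minimal sketch; `trl-demo` is a placeholder project name, and the training example below does not require this step:

```python
import weave

# Sanity check: confirm the Weave client and your W&B credentials work.
# "trl-demo" is a placeholder; use your own W&B entity/project name.
weave.init("trl-demo")
```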


### Training models with TRL and logging traces/completions using Weave

Once you have installed the necessary libraries, you can use the built-in [WeaveCallback](https://huggingface.co/docs/trl/main/en/callbacks#trl.WeaveCallback) in
TRL to log traces and completions at each evaluation step. The callback logs data during evaluation phases, so you need to pass an evaluation set to the
trainer object. The following example script demonstrates how to run an evaluation with TRL and log the results to Weave when training a model with
`GRPOTrainer`.


Run the example and inspect the results in Weave:

```python lines
import os
os.environ["WANDB_API_KEY"] = "<YOUR-WANDB-API-KEY>"
os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"

import wandb
import weave
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer, WeaveCallback

# Log in to W&B
wandb.login()

# Load the train and evaluation splits (5% of each to keep the demo fast)
train_dataset, eval_dataset = load_dataset(
    "trl-lib/ultrafeedback-prompt", split=["train[:5%]", "test[:5%]"]
)

# Dummy reward function for demonstration purposes
def reward_num_unique_letters(completions, **kwargs):
    """Reward function that rewards completions with more unique letters."""
    completion_contents = [completion[0]["content"] for completion in completions]
    return [float(len(set(content))) for content in completion_contents]

training_args = GRPOConfig(output_dir="Qwen2-0.5B-GRPO")
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_num_unique_letters,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Attach the Weave callback so traces and completions are logged at each
# evaluation step (see the WeaveCallback docs for additional options)
weave_callback = WeaveCallback(trainer=trainer)
trainer.add_callback(weave_callback)

trainer.train()
```
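
Because the callback only logs when evaluation actually runs, make sure evaluation is scheduled during training. The sketch below assumes `GRPOConfig` inherits the standard evaluation arguments from `transformers.TrainingArguments`; adjust `eval_steps` to your run length:

```python
from trl import GRPOConfig

# Assumption: GRPOConfig inherits transformers' TrainingArguments, so the
# standard evaluation scheduling flags apply.
training_args = GRPOConfig(
    output_dir="Qwen2-0.5B-GRPO",
    eval_strategy="steps",  # run evaluation periodically during training
    eval_steps=100,         # evaluate (and log to Weave) every 100 steps
)
```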


### Resources

Here are some resources you can use to learn more about integrating Weave with TRL workflows:

1. [WeaveCallback documentation](https://huggingface.co/docs/trl/main/en/callbacks#trl.WeaveCallback)
2. Curated [examples](https://github.com/wandb/rl_examples) that show how to use Weave with TRL for different algorithms.
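
As an optional extension, you can also trace individual functions, such as the reward function above, with Weave's `weave.op` decorator so each call shows up in your traces. This is a sketch, not part of the TRL integration itself; whether `GRPOTrainer` preserves the decorated behavior is an assumption to verify for your setup:

```python
import weave

weave.init("trl-demo")  # placeholder project name

# Sketch: decorate the reward function so each call is recorded as a Weave op
@weave.op()
def reward_num_unique_letters(completions, **kwargs):
    """Reward function that rewards completions with more unique letters."""
    contents = [completion[0]["content"] for completion in completions]
    return [float(len(set(content))) for content in contents]
```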