1 change: 1 addition & 0 deletions docs.json
@@ -848,6 +848,7 @@
"weave/guides/integrations/autogen",
"weave/guides/integrations/verdict",
"weave/guides/integrations/verifiers",
"weave/guides/integrations/trl",
"weave/guides/integrations/js"
]
},
Binary file added images/weave/trl.gif
2 changes: 2 additions & 0 deletions weave/guides/integrations.mdx
@@ -90,7 +90,9 @@ Frameworks help orchestrate the actual execution pipelines in AI applications. T
- **[Koog](/weave/guides/integrations/koog)**

## RL Frameworks

- **[Verifiers](/weave/guides/integrations/verifiers)**
- **[trl](/weave/guides/integrations/trl)**

## Protocols

90 changes: 90 additions & 0 deletions weave/guides/integrations/trl.mdx
@@ -0,0 +1,90 @@
---
title: Transformer Reinforcement Learning (TRL)
description: "Train Large Language Models (LLMs) and Large Reasoning Models (LRMs) in TRL using the Weave callback together with W&B to gain visibility into training progress. Log completions or traces and track changes in the quality of responses generated by the model."
---

![WeaveCallback](/images/weave/trl.gif)

[Transformer Reinforcement Learning (TRL)](https://huggingface.co/docs/trl/en/index) is a full-stack library that provides a set of tools for training transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), reward modeling, and more. The library is integrated with 🤗 Transformers. Along with using W&B to record your training metrics, you can integrate Weave with your TRL workflows to gain observability into how your model performs during training. Weave records inputs, outputs, and timestamps for each evaluation step so you can inspect the quality of responses generated by the model.

This guide shows you how to use TRL with Weave and W&B.

### Getting started

Install TRL with `uv` or `pip`. Choose one of the options below based on your setup.

```bash
# Using uv
uv pip install trl

# Using pip
pip install trl
```

Then install Weave and W&B using either of the options below, depending on your setup:

```bash
# Using uv
uv pip install weave wandb

# Using pip
pip install weave wandb
```
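
Optionally, you can verify the installation by initializing a Weave project. This is a minimal sketch; `trl-demo` is a placeholder project name, and the training example below does not require this step:

```python
import weave

# Sanity check: confirm the Weave client and your W&B credentials work.
# "trl-demo" is a placeholder; use your own W&B entity/project name.
weave.init("trl-demo")
```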


### Training models with TRL and logging traces/completions using Weave

Once you have installed the necessary libraries, you can use the built-in [WeaveCallback](https://huggingface.co/docs/trl/main/en/callbacks#trl.WeaveCallback) in
TRL to log traces and completions at each evaluation step. The callback logs data during evaluation phases, so you need to pass an evaluation set to the
trainer object. The following example script demonstrates how to run an evaluation with TRL and log the results to Weave when training a model with
`GRPOTrainer`.


Run the example and inspect the results in Weave:

```python lines
import os
os.environ["WANDB_API_KEY"] = "<YOUR-WANDB-API-KEY>"
os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"

import wandb
import weave
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer, WeaveCallback

# Log in to W&B
wandb.login()

# Load the train and evaluation splits (5% of each to keep the demo fast)
train_dataset, eval_dataset = load_dataset(
    "trl-lib/ultrafeedback-prompt", split=["train[:5%]", "test[:5%]"]
)

# Dummy reward function for demonstration purposes
def reward_num_unique_letters(completions, **kwargs):
    """Reward function that rewards completions with more unique letters."""
    completion_contents = [completion[0]["content"] for completion in completions]
    return [float(len(set(content))) for content in completion_contents]

training_args = GRPOConfig(output_dir="Qwen2-0.5B-GRPO")
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_num_unique_letters,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Attach the Weave callback so traces and completions are logged at each
# evaluation step (see the WeaveCallback docs for additional options)
weave_callback = WeaveCallback(trainer=trainer)
trainer.add_callback(weave_callback)

trainer.train()
```
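
Because the callback only logs when evaluation actually runs, make sure evaluation is scheduled during training. The sketch below assumes `GRPOConfig` inherits the standard evaluation arguments from `transformers.TrainingArguments`; adjust `eval_steps` to your run length:

```python
from trl import GRPOConfig

# Assumption: GRPOConfig inherits transformers' TrainingArguments, so the
# standard evaluation scheduling flags apply.
training_args = GRPOConfig(
    output_dir="Qwen2-0.5B-GRPO",
    eval_strategy="steps",  # run evaluation periodically during training
    eval_steps=100,         # evaluate (and log to Weave) every 100 steps
)
```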


### Resources

Here are some resources you can use to learn more about integrating Weave with TRL workflows:

1. [WeaveCallback documentation](https://huggingface.co/docs/trl/main/en/callbacks#trl.WeaveCallback)
2. Curated [examples](https://github.com/wandb/rl_examples) that show how to use Weave with TRL for different algorithms.
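
As an optional extension, you can also trace individual functions, such as the reward function above, with Weave's `weave.op` decorator so each call shows up in your traces. This is a sketch, not part of the TRL integration itself; whether `GRPOTrainer` preserves the decorated behavior is an assumption to verify for your setup:

```python
import weave

weave.init("trl-demo")  # placeholder project name

# Sketch: decorate the reward function so each call is recorded as a Weave op
@weave.op()
def reward_num_unique_letters(completions, **kwargs):
    """Reward function that rewards completions with more unique letters."""
    contents = [completion[0]["content"] for completion in completions]
    return [float(len(set(content))) for content in contents]
```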