A simple implementation of Nash Learning from Human Feedback
The training code runs on a small preference dataset from Stanford, so it is quick to try out: run the training code if you are interested.
Credit: @BojanFaletic, @Hong, Claude3 and GPT4