    Repositories list

    • terminal-bench-science

      Public
      Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal
      Python
      Apache License 2.0
      3563116 Updated Apr 19, 2026
    • harbor

      Public
      Harbor is a framework for running agent evaluations and creating and using RL environments.
      Python
      Apache License 2.0
      9211.5k82179 Updated Apr 18, 2026
    • skills

      Public
      Public agent skills catalog for Harbor
      Apache License 2.0
      1602 Updated Apr 17, 2026
    • t-bench-docs

      Public
      TypeScript
      13620 Updated Apr 15, 2026
    • benchmark-template

      Public template
      Harbor Benchmark Template
      Python
      7778 Updated Apr 15, 2026
    • 🚧 Accepting Task Submissions 🚧
      Python
      132126096 Updated Apr 15, 2026
    • harbor-cookbook

      Public
      Realistic examples of building evals and optimizing agents with Harbor
      Python
      Apache License 2.0
      46001 Updated Apr 11, 2026
    • awesome-harbor

      Public
      A curated list of awesome Harbor ecosystem projects
      23201 Updated Apr 5, 2026
    • terminal-bench-2

      Public
      Shell
      Apache License 2.0
      641851117 Updated Apr 1, 2026
    • harbor-docs

      Public
      MDX
      9206 Updated Mar 31, 2026
    • Shell
      3004 Updated Mar 26, 2026
    • A benchmark for LLMs on complicated tasks in the terminal
      Python
      Apache License 2.0
      5002k106186 Updated Jan 22, 2026