skilpel

skilpel is a Go CLI for evaluating Codex-style skills. It runs eval prompts with and without a skill, asks a judge model to score the outputs, and turns the result into local artifacts, terminal feedback, and CI-friendly gates.

[Figure: skilpel CLI output showing a with-skill and without-skill comparison]

Skills are prompts or instructions. skilpel runs the same eval with the skill enabled and disabled, then compares the judged results. The point is to make the claim concrete: this skill improves model output by this much, against these assertions, under these gates.

Why this exists

Skill repositories need a repeatable way to answer a narrow question: did this skill change model output in the intended direction? skilpel keeps that check local, explicit, and suitable for CI.

Quick start

go test ./...
go run ./cmd/skilpel run --root ./skills --skill my-skill --eval-id basic --baseline

Model-backed runs use the provider selected by the --provider flag or by the provider key in a config file. Run skilpel run --help for the current provider list, default API key variables, endpoint override rules, and available flags.

OPENAI_API_KEY=... go run ./cmd/skilpel run \
  --root ./skills \
  --skill my-skill \
  --eval-id basic \
  --workspace ./.skilpel \
  --baseline \
  --provider openai \
  --target gpt-4o-mini \
  --judge gpt-4o-mini \
  --min-pass 0.90 \
  --min-delta 0.20
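
One way to read the two gate flags in the command above is sketched below in Go. This is an assumption about the semantics, not skilpel's actual implementation: it takes --min-pass as a floor on the with-skill pass rate and --min-delta as a floor on the improvement over the without-skill baseline. The Result type and GatesPass function are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Result is a hypothetical summary of judged assertions for one run mode.
// It is not a type taken from skilpel.
type Result struct {
	Passed int // assertions the judge marked as passing
	Total  int // assertions evaluated
}

// PassRate returns the fraction of passing assertions,
// or 0 when nothing was evaluated.
func (r Result) PassRate() float64 {
	if r.Total == 0 {
		return 0
	}
	return float64(r.Passed) / float64(r.Total)
}

// GatesPass reports whether the with-skill pass rate reaches minPass and
// whether its improvement over the baseline reaches minDelta.
// Assumed semantics: both gates must hold for the run to succeed.
func GatesPass(withSkill, withoutSkill Result, minPass, minDelta float64) bool {
	rate := withSkill.PassRate()
	delta := rate - withoutSkill.PassRate()
	return rate >= minPass && delta >= minDelta
}

func main() {
	with := Result{Passed: 19, Total: 20}    // pass rate 0.95
	without := Result{Passed: 14, Total: 20} // pass rate 0.70
	fmt.Println(GatesPass(with, without, 0.90, 0.20))
}
```

Under these assumptions, a skill that scores well but barely beats the baseline still fails the run, which is the case the delta gate exists to catch.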

For scripts and downstream tooling, keep the final summary on stdout as JSON:

go run ./cmd/skilpel run --config skilpel.yaml --output=json

See CLI output for stdout, stderr, log-file, and exit-code behavior.

Current scope

skilpel run supports:

provider plugins for OpenAI, xAI, Qwen, Anthropic/Claude, and Gemini
per-skill eval files in YAML or JSON
skill and eval filtering with --skill and --eval-id
optional without_skill baseline comparison
pass-rate and baseline-delta gates
JSON artifacts in a workspace directory
text, JSON, and Markdown final summaries
structured or pretty progress logs on stderr

Architecture at a glance

cmd/skilpel owns the CLI entrypoint.
internal/skilpel owns skill discovery, prompt construction, provider calls, judging, gates, progress logs, reports, and artifacts.
Eval files live with the skill they test, usually as <skill>/evals/evals.yaml.
Run artifacts are written to the configured workspace, usually ./.skilpel.

Installation

For downstream CI, install a tagged version rather than tracking a moving branch:

go install github.com/pasunboneleve/skilpel/cmd/skilpel@$SKILPEL_VERSION

Tagged releases also publish prebuilt archives for Linux amd64 and macOS arm64.

Validation

go test ./...

Repository map

cmd/skilpel/: CLI binary.
internal/skilpel/: evaluator implementation and tests.
docs/: focused user documentation.
CHANGELOG.md: unreleased changes and release notes.

Documentation

go run ./cmd/skilpel --help
go run ./cmd/skilpel run --help
go run ./cmd/skilpel version
CLI output
Eval files
Changelog

For a complete repository model with skills and evals kept close together, see pasunboneleve/oiticica-style.

The name

A scalpel cuts away the excess; a stencil preserves the pattern.

The name points at the work skilpel is meant to do: cut away vague confidence and preserve the repeatable pattern that makes a skill useful.

Prior art

skilpel is inspired by agent-skills-eval and agentskills.io-style skill layouts. It focuses on the subset needed for fast local iteration and CI: skill discovery, eval-case filtering, provider-backed model calls, baseline comparison, and explicit pass/fail thresholds.

License

skilpel is released under the MIT License.
