CHANGELOG.md: 25 changes (23 additions, 2 deletions)
@@ -7,12 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [1.1.0] - 2026-04-28

An external audit against v1.0.1 surfaced eight repo defects, ranging from documentation drift to an exit-code conflation that confused CI. v1.1.0 fixes all of them except the oversized `cli.py`, whose refactor was attempted and reverted; it also reverses one v1.0.1 behavior change that turned out wrong and tightens the description scorer's vague-word rubric. The minor bump is driven by the exit-code semantics change (exit codes now distinguish warning-only runs from input errors) and the new `--warnings-as-errors` flag.

### Behavior change

- Warning-only CLI reports now return exit code 0 by default, reversing v1.0.1's "warnings exit 2" decision. Exit code 2 is now reserved for tool-misuse / input errors (missing path, conflicting flags, empty directory) so CI consumers can distinguish them. Pass `--warnings-as-errors` to escalate warning-only runs to exit code 1 for stricter gates. Errors remain 1; semantic drift remains 3.

### Added

- `--warnings-as-errors` flag: escalate warning-only runs to exit 1 for CI configurations that want warnings to block.
- `scripts/summarize_batch.py` and `tests/test_batch15_summarize.py`: a maintainer-facing tool (and its tests) that consumes a directory of skillcheck batch-run artifacts (one directory per repo, one subdirectory per skill, paired `*.json` / `*.txt` reports per phase) and writes `summary.csv` plus `findings.md`. Invoke it as `python scripts/summarize_batch.py <batch_dir>`. It is not exposed as a console script and not wired into the GitHub Action: the action runs skillcheck against one path, while this script consumes outputs across many. Documented under Maintainer Notes in the README.
- `tests/test_readme_test_count_claim.py`: parses the README's "N tests cover ..." sentence and asserts it matches `pytest --collect-only`. The next time the suite grows without bumping the README number, CI fails. Closes the recurring drift pattern that v1.0.1 had to correct twice.
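A minimal sketch of the check the drift-guard test performs, assuming the README sentence keeps its "N tests cover ..." form at the start of a line and that `pytest --collect-only -q` ends with an "N tests collected" summary. The helper names (`readme_claimed_count`, `collected_count`, `collect_only_output`, `assert_claim_matches`) are hypothetical, not the contents of `tests/test_readme_test_count_claim.py`.

```python
import re
import subprocess
import sys

# Hypothetical helpers; the real test file may be structured differently.
# Assumed README sentence shape: "667 tests cover all rule modules, ..."
_README_CLAIM = re.compile(r"^(\d+) tests cover\b", re.MULTILINE)
# `pytest --collect-only -q` ends with a summary such as "667 tests collected in 0.42s".
_COLLECT_SUMMARY = re.compile(r"\b(\d+) tests? collected\b")


def readme_claimed_count(readme_text: str) -> int:
    """Return N from the README's 'N tests cover ...' sentence."""
    match = _README_CLAIM.search(readme_text)
    if match is None:
        raise ValueError("README has no 'N tests cover ...' sentence")
    return int(match.group(1))


def collected_count(collect_output: str) -> int:
    """Return the test count from pytest's --collect-only summary line."""
    match = _COLLECT_SUMMARY.search(collect_output)
    if match is None:
        raise ValueError("no 'N tests collected' summary in pytest output")
    return int(match.group(1))


def collect_only_output() -> str:
    """Run pytest in collect-only mode (no tests execute) and return stdout."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "--collect-only", "-q", "tests/"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def assert_claim_matches(readme_text: str, collect_output: str) -> None:
    """Fail when the README number and the collected count disagree."""
    claimed = readme_claimed_count(readme_text)
    actual = collected_count(collect_output)
    assert claimed == actual, f"README claims {claimed} tests; pytest collected {actual}"
```

Inside a real test, the whole assertion would be `assert_claim_matches(readme_text, collect_only_output())`, so bumping the suite without bumping the README fails CI.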

### Changed
-- README test count bumped from 663 to 664 to include the new drift-guard test.

- `action.yml` install step pins `skillcheck>=1.0.1`, so the install fails loudly when no v1 release is available on PyPI instead of silently running v0.2.0 without the v1 features.
- Description scorer rubric documented and tightened: dropped `comprehensive`, `robust`, and `flexible` from `_VAGUE_WORDS` because each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "robust against malformed input"). The inclusion rubric is now documented inline. Verified against `anthropics/skills` (17 SKILL.md files): zero score changes, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the new regression tests are forward-looking guards against scoring drift if the list is ever re-expanded.
- Description scorer verb matching: collapsed `_ACTION_VERBS` from 86 entries (base + 3rd-person duplicates) to 42 base forms. Added `_is_action_verb()` to handle stem normalization across `-s`, `-es`, and `-ies` endings. Adding a new verb now only requires the base form.
- README test count bumped from 663 to 667 to include the drift-guard test, two description-scorer regression tests, and the `--warnings-as-errors` test.
- README field-test citations: replaced seven gitignored `runs/...` path references with the exact `skillcheck` commands needed to reproduce each finding. Readers can now verify the claims without access to private artifacts.
- README exit-code table reflects the new semantics; flag table documents `--warnings-as-errors`.

### Removed

- The `git-commit-crafter` SKILL.md at the repo root. It was unrelated to skillcheck and confused first-time readers; the canonical example lives at `skills/skillcheck/SKILL.md`.
- False `@v0` tag claim from the README. Only `@v0.2.0` was ever pushed; the action-install snippet no longer suggests a tag that does not exist. CHANGELOG entries that referenced `@v0` corrected to `@v0.2.0`.

## [1.0.1] - 2026-04-28

@@ -71,7 +92,7 @@ End-to-end verification against `anthropics/skills` surfaced documentation drift
## [0.2.0] - 2026-03-11

### Added
-- **GitHub Action**: composite action (`moonrunnerkc/skillcheck@v0`) with PR annotations, job summary table, and JSON output. All CLI flags exposed as action inputs. Three lines of YAML to add to any CI pipeline.
+- **GitHub Action**: composite action (`moonrunnerkc/skillcheck@v0.2.0`) with PR annotations, job summary table, and JSON output. All CLI flags exposed as action inputs. Three lines of YAML to add to any CI pipeline.
- **`__main__.py` entry point**: `python -m skillcheck` now works as an alternative to the console script.
- **File reference validation**: parses markdown body for `[text](path)`, `![alt](path)`, and `source:`/`file:`/`include:` directives; verifies referenced files exist on disk; warns when references exceed one directory level from SKILL.md.
- **Progressive disclosure budget**: three-tier token budgeting: metadata/frontmatter at ~100 tokens, body at <5,000 tokens, resources loaded on demand. Flags oversized code blocks (>50 lines), large tables (>20 rows), and embedded base64.
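A minimal sketch of the file reference extraction described above, assuming `source:`/`file:`/`include:` directives start a line and reading "more than one directory level" as a relative path with more than one directory component or any `..`. The function names are hypothetical; skillcheck's actual rule module may parse differently (for example, this sketch does not handle link titles).

```python
import re
from pathlib import Path

# Inline links and images: [text](path) and ![alt](path).
_LINK = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)\)")
# Directive lines such as "include: references/api.md" (assumed to start a line).
_DIRECTIVE = re.compile(r"^\s*(?:source|file|include):\s*(\S+)", re.MULTILINE)
_EXTERNAL = ("http://", "https://", "mailto:", "#")


def extract_references(body: str) -> list[str]:
    """Return local paths referenced by links, images, and directives."""
    refs = _LINK.findall(body) + _DIRECTIVE.findall(body)
    return [ref for ref in refs if not ref.startswith(_EXTERNAL)]


def check_references(skill_md: Path, body: str) -> list[str]:
    """Report missing files and references nested deeper than one directory."""
    base = skill_md.parent
    findings = []
    for ref in extract_references(body):
        rel = Path(ref)
        if not (base / rel).exists():
            findings.append(f"missing: {ref}")
        # "references/a.md" has 2 parts (one directory); "a/b/c.md" has 3.
        if ".." in rel.parts or len(rel.parts) > 2:
            findings.append(f"too deep: {ref}")
    return findings
```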
README.md: 25 changes (13 additions, 12 deletions)
@@ -69,7 +69,7 @@ skillcheck skills/ # recursive scan; finds every file named SKILL.md
skillcheck SKILL.md --format json
```

-From the field test on Anthropic's official skills repository (18 skills, `runs/anthropics-corpus/01-symbolic-all.txt`, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection.
+From the field test on Anthropic's official skills repository (18 skills, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection. Reproduce: clone `anthropics/skills` and run `skillcheck skills/ --format text`.

### Heuristic Graph

@@ -83,7 +83,7 @@ skillcheck SKILL.md --emit-graph --format json

Graph nodes: `Capability` (section headings), `Input` (backtick references required by capabilities), `Output` (backtick references produced by capabilities). Analyzers fire on orphaned capabilities with no declared I/O, unused inputs, unproduced outputs, capabilities with no description body, and `allowed-tools` entries not backtick-referenced in the body.
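A simplified sketch of the orphaned-capability analyzer, assuming capabilities are `##`-or-deeper headings and treating a capability as orphaned when its section contains no backtick references at all. The real analyzer separates required inputs from produced outputs, which this sketch does not attempt, and this sketch does not skip fenced code blocks (a `## comment` line inside a fence would be misread as a heading). All names are hypothetical.

```python
import re

# Capability headings: "## Name" through "###### Name" (the H1 title is skipped).
_HEADING = re.compile(r"^#{2,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# Inline backtick references such as `server.py`.
_BACKTICK = re.compile(r"`([^`\n]+)`")


def capability_sections(body: str) -> dict[str, str]:
    """Map each capability heading to the text before the next heading."""
    matches = list(_HEADING.finditer(body))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        sections[match.group(1)] = body[match.end():end].strip()
    return sections


def orphaned_capabilities(body: str) -> list[str]:
    """Capabilities whose sections reference no backtick-quoted I/O."""
    return [
        name
        for name, text in capability_sections(body).items()
        if not _BACKTICK.search(text)
    ]
```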

-From the field test on `mcp-builder/SKILL.md` (`runs/anthropics-mcp-builder/02-graph-analyze.txt`):
+From the field test on `mcp-builder/SKILL.md` (reproduce: `skillcheck skills/mcp-builder/SKILL.md --analyze-graph`):

```
line 18 ⚠ warning graph.capability.orphaned Capability 'Understand Modern MCP Design'
@@ -109,7 +109,7 @@ skillcheck SKILL.md --agent-reason --format agent # critique + graph pro

`--critique-agent` selects a framing variant tuned for each platform (claude, codex, cursor). The schema and exit codes are identical across all variants.

-From the field test (`runs/anthropics-mcp-builder/04-critique-report.txt`): the symbolic run on `mcp-builder/SKILL.md` passed (exit 0), but the ingested critique returned exit 3 with three `semantic.contradiction.detected` errors. One:
+From the field test on `mcp-builder/SKILL.md`: the symbolic run passed (exit 0), but the ingested critique returned exit 3 with three `semantic.contradiction.detected` errors. One:

```
✗ error semantic.contradiction.detected Contradiction between 'Frontmatter
@@ -149,7 +149,7 @@ skillcheck SKILL.md --show-history --format json

When `--history` is active and the current run fails on content that matched a prior passing run, skillcheck emits `history.skill.regressed` (WARNING). This surfaces rule tightening or new agent findings without requiring manual output comparison.
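A minimal sketch of the regression check, assuming the ledger stores a SHA-256 hash of each run's SKILL.md content alongside a pass/fail outcome. `HistoryEntry`, `content_hash`, and `is_regression` are hypothetical names; the on-disk ledger format is not described here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryEntry:
    """One ledger record (hypothetical shape): content hash plus outcome."""
    content_sha256: str
    passed: bool


def content_hash(text: str) -> str:
    """SHA-256 of the SKILL.md text, used to recognize identical content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_regression(ledger: list[HistoryEntry], content: str, passed_now: bool) -> bool:
    """True when this run fails on content that an earlier run passed."""
    if passed_now:
        return False
    digest = content_hash(content)
    return any(entry.content_sha256 == digest and entry.passed for entry in ledger)
```

Because the content is unchanged, a `True` result points at the checker (tightened rules or new agent findings), which is the situation `history.skill.regressed` is meant to surface.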

-From the field test (`runs/anthropics-mcp-builder/08-history.txt`):
+From the field test (reproduce: `skillcheck skills/mcp-builder/SKILL.md --history && skillcheck skills/mcp-builder/SKILL.md --show-history`):

```
History ledger: SKILL.md
@@ -172,7 +172,7 @@ Three lines to add skillcheck to any CI pipeline:
path: skills/
```

-Pin to `@v1` for the latest patch within the v1.0 major-version line, or `@v1.0.0` for an immutable release. The `@v0` tag remains in place for existing CI configurations.
+Pin to `@v1` for the latest release within the v1 major-version line, or to an exact tag such as `@v1.0.0` for an immutable release.

Failures block the PR. Errors and warnings appear as inline diff annotations on the changed files. The workflow run page gets a Markdown summary table. For the complete list of action inputs and outputs, see [`action.yml`](action.yml).

@@ -188,7 +188,7 @@ The v1.0 graph and critique modes are available as action inputs. Example with s

## Output

-Text output (default), excerpt from `runs/anthropics-corpus/01-symbolic-all.txt`:
+Text output (default), excerpt from a run against the Anthropic skills corpus:

```
✗ FAIL skills/claude-api/SKILL.md
@@ -245,6 +245,7 @@ The JSON schema is stable. It will not change in a backward-incompatible way wit
| `--min-desc-score N` | | Minimum description quality score (0-100); below this triggers a warning |
| `--target-agent {claude,vscode,all}` | `all` | Scope compatibility checks to a specific agent |
| `--strict-vscode` | `false` | Promote VS Code compatibility issues to errors |
| `--warnings-as-errors` | `false` | Escalate warning-only runs to exit code 1 (default for warning-only is 0) |
| `--semantic` | `false` | Enable semantic-adjacent validation; standalone mode runs heuristic graph analysis |
| `--agent-reason` | `false` | Emit a combined critique + graph prompt packet for the calling agent |
| `--emit-critique-prompt` | `false` | Print agent self-critique prompt to stdout and exit 0 |
@@ -264,12 +265,12 @@ The JSON schema is stable. It will not change in a backward-incompatible way wit

| Code | Meaning | Example invocation |
|---|---|---|
-| `0` | No errors and no warnings | `skillcheck skills/skillcheck/SKILL.md` |
-| `1` | One or more errors found | `skillcheck SKILL.md` when the name is invalid |
-| `2` | Warning-only report or input error | `skillcheck SKILL.md --max-lines 1` |
+| `0` | No errors (warning-only counts as a clean pass by default) | `skillcheck skills/skillcheck/SKILL.md` |
+| `1` | One or more errors found, or warnings with `--warnings-as-errors` | `skillcheck SKILL.md` when the name is invalid |
+| `2` | Input error: missing path, empty directory, conflicting flags, malformed argument | `skillcheck nonexistent.md` |
| `3` | Symbolic passed but ingested critique found semantic errors | `skillcheck SKILL.md --ingest-critique response.json` when the agent reported contradictions |

-Exit code 1 takes priority over 3 when symbolic errors also exist.
+Pass `--warnings-as-errors` to escalate warning-only runs to exit 1 for stricter CI gates. Exit code 1 takes priority over 3 when symbolic errors also exist; code 2 is reserved for tool-misuse cases so CI can distinguish them from skill-content findings.
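A minimal sketch of the exit-code precedence in the table above. The function and its parameters are hypothetical, not skillcheck's internals. One assumption goes beyond the table: a run with semantic (critique) errors is not "warning-only", so `--warnings-as-errors` does not turn a 3 into a 1.

```python
def exit_code(
    *,
    input_error: bool,
    errors: int,
    warnings: int,
    semantic_errors: int = 0,
    warnings_as_errors: bool = False,
) -> int:
    """Map a run's outcome to a skillcheck-style exit code (sketch)."""
    if input_error:
        return 2  # tool misuse: missing path, empty directory, conflicting flags
    if errors:
        return 1  # symbolic errors take priority over semantic drift
    if semantic_errors:
        return 3  # symbolic passed, ingested critique found semantic errors
    if warnings and warnings_as_errors:
        return 1  # opt-in escalation of a warning-only run
    return 0  # clean pass, including warning-only runs by default
```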

## Rules

@@ -320,7 +321,7 @@ Source tags: `spec` rules derive from the agentskills.io specification or agent-

## Case Study

-We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. Full run artifacts: `runs/anthropics-corpus/`, `runs/anthropics-mcp-builder/`, `runs/uxuiprinciples-corpus/`.
+We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. To reproduce, clone each upstream repo and run `skillcheck <path>`; the full case study linked at the end of this section records the exact invocations.

The symbolic run of the Anthropic corpus returned four failures from eighteen files (exit 1). All four files look correct on review: two had second-person voice in the description, one used "claude" as part of the name (reserved word per spec), and the template skill had a name/directory mismatch. The deeper finding came from running `mcp-builder` through the critique pipeline: the symbolic run passed (exit 0), but the ingested agent critique returned exit 3 with three `semantic.contradiction.detected` errors. The skill's frontmatter offers Python and TypeScript as equal options; its body unconditionally recommends TypeScript in Phase 1.3. That inconsistency means any agent following the Python path hits an unresolved decision point. No static linter catches it. See [docs/case-study-v1-real-world-runs.md](docs/case-study-v1-real-world-runs.md) for the full breakdown.

@@ -347,7 +348,7 @@ pip install -e ".[dev]"
python3 -m pytest tests/ -q
```

-664 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.
+667 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.

## Maintainer Notes

RELEASE_NOTES_v1.1.0.md: 37 changes (37 additions, 0 deletions)
@@ -0,0 +1,37 @@
# skillcheck 1.1.0

An external audit against v1.0.1 surfaced eight repo defects, including an unpinned GitHub Action install, gitignored evidence paths cited in the README, a top-level SKILL.md describing an unrelated skill, a missing `@v0` tag the README claimed existed, exit code 2 conflating tool misuse with warning-only reports, an oversized `cli.py`, and a vague-word list that flagged context-dependent terms like "comprehensive". v1.1.0 fixes all of them except the items listed under "Audit items not closed" below, and reverses one v1.0.1 behavior change that turned out wrong.

## Behavior change

Warning-only runs now return exit code **0** by default. v1.0.1 made them return 2; that conflated valid runs that produced warnings with tool-misuse cases (missing path, conflicting flags, empty directory). CI consumers couldn't tell the difference. v1.1.0 splits them: warnings exit 0, input errors exit 2, errors stay at 1, semantic drift stays at 3. The new `--warnings-as-errors` flag escalates warning-only runs to exit 1 for pipelines that want warnings to block.

If your CI relied on v1.0.1's "warnings exit 2" behavior, add `--warnings-as-errors` to your skillcheck invocation, or pin to `@v1.0.1` until you can update.

## Added

- `--warnings-as-errors` flag.
- Two regression tests guarding the description-scorer rubric.

## Changed

- `action.yml` install step pins `skillcheck>=1.0.1`. Until v1.1.0 is uploaded to PyPI, the install step fails loudly instead of silently resolving to v0.2.0, which lacks the v1 features.
- Description scorer no longer penalizes `comprehensive`, `robust`, or `flexible` in descriptions. Each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "robust against malformed input"). The inclusion rubric is now documented inline. Verified against `anthropics/skills`: zero score changes across 17 files, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the two new regression tests are forward-looking guards, not regression evidence.
- Description scorer verb matching collapsed from 86 entries (base + 3rd-person duplicates) to 42 base forms with stem normalization. Adding a new verb now only requires the base form.
- README field-test citations replaced gitignored `runs/...` paths with reproducible commands.
- README exit-code table documents the new semantics; flag table documents `--warnings-as-errors`.
- README test count: 663 → 667.

## Removed

- Top-level `git-commit-crafter` SKILL.md from the repo root.
- False `@v0` tag claim from the README and CHANGELOG.

## Why this is a minor and not a patch

Adding `--warnings-as-errors` is a backward-compatible public-surface addition, which alone warrants a minor bump under semver. The exit-code semantics change is observable in CI and not opt-in, which rules out a patch.

## Audit items not closed

- **PyPI publish**: the v1.1.0 sdist and wheel are built and pass `twine check`, but PyPI upload requires authenticated credentials and happens out-of-band. Until that upload happens, `pip install skillcheck` continues to install v0.2.0, and the action's default install step (`skillcheck>=1.0.1`) fails instead of running.
- **`cli.py` line count**: the audit asked for a refactor toward `main()` under 100 lines and `cli.py` under 700. An attempted helper extraction met the `main()` target but pushed the total file size from 1127 to 1172 lines. The refactor was reverted; the file remains at its pre-audit size, with the audit's "deliberate choice" path left open for a follow-up.
SKILL.md: 65 changes (0 additions, 65 deletions)

This file was deleted.

action.yml: 2 changes (1 addition, 1 deletion)
@@ -108,7 +108,7 @@ runs:
if [ -n "$INPUT_VERSION" ]; then
python -m pip install --quiet "skillcheck==$INPUT_VERSION"
else
-python -m pip install --quiet skillcheck
+python -m pip install --quiet "skillcheck>=1.0.1"
fi

- name: Run skillcheck