Releases: moonrunnerkc/skillcheck
v1.3.0
Full Changelog: v1...v1.3.0
v1.2.3
Added
- --format github: outputs diagnostics as GitHub Actions workflow commands (::error, ::warning, ::notice) with proper escaping for file, line, and message properties. The GitHub Action now defaults to this format so PR annotations render automatically without a Python entrypoint.
- .pre-commit-hooks.yaml: adds a skillcheck hook for pre-commit, matching SKILL.md files and passing filenames to the CLI.
- CONTRIBUTING.md: documents the release convention (immutable patch tags plus a force-updated v1 moving major tag).
- tests/__init__.py: makes the test package importable, fixing from tests.conftest in environments where another tests package shadows the path.
- nargs="+" on the path argument: the CLI now accepts multiple paths (required by pre-commit's pass_filenames mode). Single-path usage is unchanged.
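The multi-path change can be pictured with a small argparse sketch. The parser below is illustrative only: the program name and help text are assumptions, not skillcheck's actual CLI code. It shows how nargs="+" keeps single-path usage working while also accepting the file list that pre-commit passes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real CLI parser; only the path argument is modeled.
    parser = argparse.ArgumentParser(prog="skillcheck-sketch")
    # nargs="+" requires at least one path and collects every path into a list.
    parser.add_argument("path", nargs="+", help="one or more SKILL.md files or directories")
    return parser


single = build_parser().parse_args(["skills/demo/SKILL.md"])
many = build_parser().parse_args(["a/SKILL.md", "b/SKILL.md"])
print(single.path)  # a one-element list
print(many.path)    # both paths, in the order given
```

With nargs="+", the parsed value is always a list, so code that previously handled a single string needs to iterate over it.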
Changed
- action.yml simplified to a two-step composite action that installs skillcheck via pip and runs it directly. The Python entrypoint (action/entrypoint.py) is no longer invoked; --format github handles PR annotations natively. The format input defaults to github (was json, which was ignored at runtime).
- README GitHub Action section updated to reflect automatic PR annotations via --format github.
- README pre-commit section added with a .pre-commit-config.yaml snippet.
- README test count updated to 730.
Fixed
- Path separators in --format github output normalized to forward slashes for Windows compatibility.
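As a rough illustration of what --format github has to produce, the sketch below builds a GitHub Actions workflow command with the escaping GitHub documents for message data and properties, and normalizes backslashes in the file path. The function names are hypothetical and this is not skillcheck's own formatter.

```python
from typing import Optional


def escape_data(value: str) -> str:
    # Message text: escape %, carriage return, and newline.
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value: str) -> str:
    # Property values additionally escape ':' and ',' because those delimit properties.
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def format_annotation(level: str, message: str,
                      file: Optional[str] = None, line: Optional[int] = None) -> str:
    """Return one workflow command line, e.g. '::error file=...,line=...::message'."""
    props = []
    if file is not None:
        # Forward slashes so annotations resolve on Windows runners too.
        props.append("file=" + escape_property(file.replace("\\", "/")))
    if line is not None:
        props.append("line=" + escape_property(str(line)))
    head = "::" + level + (" " + ",".join(props) if props else "")
    return head + "::" + escape_data(message)


print(format_annotation("error", "bad\nvalue", file="skills\\demo\\SKILL.md", line=3))
```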
Removed
- The Python entrypoint (action/entrypoint.py) for annotation parsing and step summary generation is no longer used by the action. The action runs skillcheck directly.
v1.2.2
[1.2.2] - 2026-05-03
Added
- compat.cursor-description-block-scalar rule (INFO by default). Flags description: >, description: >+, description: |, and description: |+ because Cursor's skills UI renders these as empty. The Cursor-safe form is description: >- (folded strip). Closes #1.
- --strict-cursor flag promotes the new rule to ERROR and fails the run. Mirrors --strict-vscode.
- cursor is now a valid --target-agent choice; promotes the rule to WARNING when set without --strict-cursor.
- strict-cursor action input (action.yml) and INPUT_STRICT_CURSOR wiring (action/entrypoint.py).
- TOML config: strict-cursor = true is now accepted in skillcheck.toml.
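The block-scalar check and its severity promotion can be sketched as follows. This is a minimal approximation, assuming the frontmatter is available as raw text. The regex, function names, and return values are hypothetical, and only the four indicators listed above are flagged.

```python
import re
from typing import Optional

# Matches a top-level "description:" key whose value is only a block-scalar indicator.
_DESCRIPTION_INDICATOR = re.compile(r"^description:[ \t]*([>|][+-]?)[ \t]*$", re.MULTILINE)

# The four forms the release notes say Cursor renders as empty; ">-" is the safe form.
FLAGGED_INDICATORS = {">", ">+", "|", "|+"}


def find_flagged_indicator(frontmatter_text: str) -> Optional[str]:
    """Return the flagged indicator used by description, or None if it is safe."""
    match = _DESCRIPTION_INDICATOR.search(frontmatter_text)
    if match and match.group(1) in FLAGGED_INDICATORS:
        return match.group(1)
    return None


def severity(strict_cursor: bool = False, target_agent: Optional[str] = None) -> str:
    # --strict-cursor wins; --target-agent cursor alone promotes to WARNING.
    if strict_cursor:
        return "ERROR"
    if target_agent == "cursor":
        return "WARNING"
    return "INFO"


print(find_flagged_indicator("name: demo\ndescription: >\n  Folded text\n"))
print(find_flagged_indicator("name: demo\ndescription: >-\n  Folded text\n"))
```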
Changed
- frontmatter.name.required and frontmatter.description.required now append a hint when the missing field appears as a ## name: or ## description: markdown heading inside the frontmatter block. Frontmatter keys are YAML, not markdown; the hint nudges authors to drop the ## prefix. Closes #1.
v1.2.1
[1.2.1] - 2026-05-03
Fixed
- description.quality-score no longer flags verb-led descriptions starting with investigate, diagnose, triage, troubleshoot, examine, audit, inspect, compare, capture, normalize, or refactor. Expanded _ACTION_VERBS from 43 to 170 entries to cover investigation, inspection, search, code-work, output, comparison, logging, and normalization clusters. Closes #2.
v1.2.0
[1.2.0] - 2026-04-29
Backward compatibility: previously-passing skills still pass. Some previously-failing skills now warn instead of error and produce exit code 0 instead of 1.
Added
- template.detected info-level rule and src/skillcheck/template_detection.py module.
- ECOSYSTEM_FIELDS classification for license, repository, homepage, and template.
- Config support for [frontmatter] extension_fields in skillcheck.toml.
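One way to picture the field classification is a three-way lookup: spec fields, ecosystem fields, and user-declared extensions. The sketch below is an assumption-laden illustration. SPEC_FIELDS here is a placeholder subset, classify_field is a hypothetical name, and the rule IDs are the ones named in this release's notes.

```python
from typing import FrozenSet, Optional

# Placeholder subset for illustration; the real config.SPEC_FIELDS is not reproduced here.
SPEC_FIELDS = frozenset({"name", "description"})
# The four fields the release notes classify as ecosystem fields.
ECOSYSTEM_FIELDS = frozenset({"license", "repository", "homepage", "template"})


def classify_field(name: str, extension_fields: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return the rule ID a frontmatter key would trigger, or None if it is silent."""
    if name in SPEC_FIELDS:
        return None
    if name in ECOSYSTEM_FIELDS:
        return "frontmatter.field.ecosystem"  # info-level
    if name in extension_fields:
        return None  # declared under [frontmatter] extension_fields, so silent
    return "frontmatter.field.unknown"


print(classify_field("license"))
print(classify_field("x-team", extension_fields=frozenset({"x-team"})))
print(classify_field("colour"))
```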
Changed
- frontmatter.name.reserved-word demoted from ERROR to WARNING; source tag changed from spec to advisory; message rewritten.
- frontmatter.description.person-voice demoted from ERROR to WARNING; messages rewritten to acknowledge the heuristic.
- Budget-message phrasing aligned with the spec's "recommended" language across sizing.* and disclosure.* rules.
Fixed
- frontmatter.field.unknown no longer fires on license, repository, homepage, or template; these now produce info-level frontmatter.field.ecosystem diagnostics or are silent for user extensions.
- Templates (placeholder content, template: true flag, or files under template/ or templates/ directories) no longer trigger deployment-blocking checks (frontmatter.name.directory-mismatch, compat.vscode-dirname, description.quality-score).
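The template exemption might look roughly like this. The sketch covers only two of the three signals (the template: true flag and the directory name). Placeholder-content detection is left out because its heuristics are not described here. All function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

TEMPLATE_DIRS = {"template", "templates"}
# The three checks the release notes call deployment-blocking.
DEPLOYMENT_BLOCKING = {
    "frontmatter.name.directory-mismatch",
    "compat.vscode-dirname",
    "description.quality-score",
}


def looks_like_template(path: str, frontmatter: dict) -> bool:
    # Signal 1: explicit flag in the frontmatter.
    if frontmatter.get("template") is True:
        return True
    # Signal 2: any parent directory named template/ or templates/.
    parents = PurePosixPath(path.replace("\\", "/")).parts[:-1]
    return any(part in TEMPLATE_DIRS for part in parents)


def rules_to_run(all_rules: Iterable[str], is_template: bool) -> List[str]:
    # Templates skip the deployment-blocking checks; every other rule still runs.
    return [r for r in all_rules if not (is_template and r in DEPLOYMENT_BLOCKING)]


rules = ["frontmatter.name.required", "compat.vscode-dirname", "description.quality-score"]
print(rules_to_run(rules, looks_like_template("templates/basic/SKILL.md", {})))
```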
Internal
- Renamed config.KNOWN_FRONTMATTER_FIELDS to config.SPEC_FIELDS.
- New template.detected rule wired into rules/__init__.py.
- Frontmatter rule implementation split into smaller modules while preserving skillcheck.rules.frontmatter imports.
- Root SKILL.md restored so skillcheck SKILL.md self-validation works from the repository root.
- New fixture set under tests/fixtures/ covering ecosystem fields, user extensions, template detection, and demoted severities.
skillcheck 1.1.0
An external audit against v1.0.1 surfaced eight repo defects: an unpinned GitHub Action install, gitignored evidence paths cited in the README, a top-level SKILL.md describing an unrelated skill, a missing @v0 tag the README claimed existed, exit-code 2 conflating tool-misuse with warning-only reports, an oversized cli.py, and a vague-word list that flagged context-dependent terms like "comprehensive". v1.1.0 fixes all of them and reverses one v1.0.1 behavior change that turned out wrong.
Behavior change
Warning-only runs now return exit code 0 by default. v1.0.1 made them return 2; that conflated valid runs that produced warnings with tool-misuse cases (missing path, conflicting flags, empty directory). CI consumers couldn't tell the difference. v1.1.0 splits them: warnings exit 0, input errors exit 2, errors stay at 1, semantic drift stays at 3. The new --warnings-as-errors flag escalates warning-only runs to exit 1 for pipelines that want warnings to block.
If your CI relied on v1.0.1's "warnings exit 2" behavior, add --warnings-as-errors to your skillcheck invocation, or pin to @v1.0.1 until you can update.
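The v1.1.0 exit-code split can be summarized in a few lines. This is a sketch of the semantics described above, not the tool's implementation. The function name and parameters are hypothetical, and the ordering of the semantic-drift and warnings-as-errors checks is an assumption.

```python
def resolve_exit_code(*, input_error: bool = False, errors: int = 0, warnings: int = 0,
                      semantic_errors: int = 0, warnings_as_errors: bool = False) -> int:
    if input_error:
        return 2  # tool misuse: missing path, conflicting flags, empty directory
    if errors > 0:
        return 1  # validation errors take priority over semantic drift
    if semantic_errors > 0:
        return 3  # symbolic checks passed, ingested critique found semantic errors
    if warnings > 0 and warnings_as_errors:
        return 1  # --warnings-as-errors escalates warning-only runs
    return 0      # clean run, or warnings only (the v1.1.0 default)


print(resolve_exit_code(warnings=2))
print(resolve_exit_code(warnings=2, warnings_as_errors=True))
```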
Added
- --warnings-as-errors flag.
- Two regression tests guarding the description-scorer rubric.
Changed
- action.yml install step pins skillcheck>=1.0.1. Until v1.1.0 is uploaded to PyPI, this fails loudly on unpublished v1 features rather than silently resolving to v0.2.0.
- Description scorer no longer penalizes comprehensive, robust, or flexible in descriptions. Each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "robust against malformed input"). The inclusion rubric is now documented inline. Verified against anthropics/skills: zero score changes across 17 files, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the two new regression tests are forward-looking guards, not regression evidence.
- Description scorer verb matching collapsed from 86 entries (base + 3rd-person duplicates) to 42 base forms with stem normalization. Adding a new verb now only requires the base form.
- README field-test citations replaced gitignored runs/... paths with reproducible commands.
- README exit-code table documents the new semantics; flag table documents --warnings-as-errors.
- README test count: 663 → 667.
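Stem normalization of the kind mentioned in the verb-matching item could work along these lines. The verb set and the suffix rules below are assumptions chosen for illustration; the scorer's real list and normalizer are not shown in these notes.

```python
from typing import List

# Hypothetical handful of base forms; the real scorer keeps its own list.
BASE_VERBS = frozenset({"validate", "parse", "fix", "apply", "generate"})


def base_form_candidates(word: str) -> List[str]:
    """Return the word plus plausible base forms of a third-person verb."""
    w = word.lower()
    candidates = [w]
    if w.endswith("ies") and len(w) > 3:
        candidates.append(w[:-3] + "y")  # applies -> apply
    if w.endswith("es") and len(w) > 2:
        candidates.append(w[:-2])        # fixes -> fix
    if w.endswith("s") and len(w) > 1:
        candidates.append(w[:-1])        # validates -> validate
    return candidates


def starts_with_action_verb(description: str) -> bool:
    words = description.split()
    if not words:
        return False
    first = words[0].strip(".,:;")
    # Only base forms are stored, so each candidate is checked against the same set.
    return any(c in BASE_VERBS for c in base_form_candidates(first))


print(starts_with_action_verb("Validates SKILL.md files against the spec"))
print(starts_with_action_verb("Comprehensive toolkit for skills"))
```

Checking candidates against the base set, rather than blindly stripping a suffix, avoids turning "parses" into "pars".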
Removed
- Top-level git-commit-crafter SKILL.md from the repo root.
- False @v0 tag claim from the README and CHANGELOG.
Why this is a minor and not a patch
The exit-code semantics change is observable in CI and not opt-in. Adding --warnings-as-errors is also a public-surface addition. Either alone would be a minor bump under semver; together they aren't a patch.
Audit items not closed
- PyPI publish: the v1.1.0 sdist and wheel are built and pass twine check, but PyPI upload requires authenticated credentials and happens out-of-band. Until that runs, pip install skillcheck continues to ship v0.2.0. The pinned action install will refuse to run.
- cli.py line count: the audit asked for a refactor toward main() under 100 lines and cli.py under 700. An attempted helper extraction met the main() target but pushed total file size from 1127 to 1172. The refactor was reverted; the file remains at its pre-audit size, with the audit's "deliberate choice" path left open for a follow-up.
skillcheck 1.0.1
skillcheck v1.0.1 commits a batch of post-v1.0.0 implementation work that had been sitting uncommitted, ships the docs corrections an end-to-end verification surfaced, and aligns the README, CHANGELOG, and CLI surface so they describe the same release.
There is one behavior change relative to v1.0.0: warning-only runs now return exit code 2. Errors return 1; semantic drift returns 3. CI consumers that previously relied on warning-only exiting 0 must update.
Changed
- Warning-only CLI reports now return exit code 2. Exit code 1 remains errors; exit code 3 remains semantic drift.
- README Exit Codes table row 0 now reads "no errors and no warnings".
- README test count corrected from 653 to 663.
- README JSON-stability promise updated from "0.x series" to "v1.x series".
- README field-test numbers reframed as April 2026 snapshots against anthropics/skills, with a note that they will drift as upstream evolves.
- action.yml format input description clarified: accepted but ignored at runtime; the action always invokes skillcheck with --format json so it can parse diagnostics for PR annotations and the step summary.
- Development extras now include ruff>=0.6, mypy>=1.10, and types-PyYAML>=6.0.
Added
- --semantic: guide-compatible shortcut that enables semantic-adjacent validation. In standalone mode it runs heuristic graph analysis; with ingested agent responses it merges those diagnostics.
- --agent-reason: guide-compatible agent-workflow shortcut. Emits a combined critique and graph prompt packet so the calling agent can run both reasoning steps and feed JSON back through --ingest-critique and --ingest-graph.
- --format md and --format agent: Markdown report output and agent-oriented next-action output.
- skillcheck.toml config loading: top-level defaults for format, thresholds, target agent, strict VS Code mode, skip flags, ignored rule prefixes, graph analysis, semantic mode, history, and agent variants. CLI flags always win; the loader fills unset values.
- Experimental --activation-hypotheses: generates likely natural-language routing triggers plus a discoverability entropy score. Routing caveat included in every report.
- Machine-readable diagnostic metadata: JSON diagnostics now include source and confidence fields.
- GitHub Action inputs for the v1.0 modes: semantic, analyze-graph, ingest-critique, critique-agent, ingest-graph, graph-agent, history, activation-hypotheses. The action still always emits JSON internally for PR annotations.
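The "CLI flags always win; the loader fills unset values" rule from the config-loading item amounts to a one-directional merge. In the sketch below, both inputs are plain dicts, None marks a flag the user did not pass, and the key names are illustrative rather than skillcheck's actual option names.

```python
def apply_config_defaults(cli_args: dict, config: dict) -> dict:
    """Fill unset CLI values from the parsed skillcheck.toml, never overriding a flag."""
    merged = dict(cli_args)
    for key, value in config.items():
        # Only fill values the user left unset on the command line.
        if merged.get(key) is None:
            merged[key] = value
    return merged


cli = {"format": "json", "min_desc_score": None}
toml_defaults = {"format": "md", "min_desc_score": 60}
print(apply_config_defaults(cli, toml_defaults))
```

A real loader also has to distinguish "flag not passed" from "flag explicitly set to its default", which is why the sketch assumes unset options arrive as None.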
Why this is a patch and not a minor
Every addition above either documents existing behavior, refines a flag, or is gated behind a new opt-in flag. There is one breaking-ish change: warning-only runs now exit 2 instead of 0. Strict semver would call that a minor bump. The judgment call here: v1.0.0 shipped with documentation that already implied the v2-style exit codes (and the v1.0.1 README makes it explicit), the prior "warnings exit 0" behavior was undocumented in the released README, and the change matches what users running this in CI would expect. If your CI pipeline depended on the old behavior, pin to @v1.0.0 rather than @v1 until you can update.
Verification
After installing skillcheck==1.0.1:
skillcheck --version
# skillcheck 1.0.1
skillcheck skills/skillcheck/SKILL.md --analyze-graph
# exit 0 with no errors and no warnings (only INFO diagnostics)
End-to-end verification was run against anthropics/skills at commit 5128e186 (18 SKILL.md files). All 26 documented flags exercised; all four exit codes (0, 1, 2, 3) reproduced; the action entrypoint produced byte-identical JSON to the CLI. Full report: see the v1.0.1 verification artifacts.
Links
- PyPI: https://pypi.org/project/skillcheck/1.0.1/ (available after publish)
- GitHub Release: https://github.com/moonrunnerkc/skillcheck/releases/tag/v1.0.1
- agentskills.io specification: https://agentskills.io/specification
- README: https://github.com/moonrunnerkc/skillcheck/blob/main/README.md
skillcheck 1.0.0
skillcheck v1.0.0 is the first major release. It adds agent-native semantic self-critique, heuristic capability graph extraction with five structural analyzers, and a per-skill validation history ledger on top of the v0.2.0 symbolic foundation. The tool is designed for two modes: when a calling agent is present it uses that agent for semantic analysis; when no agent is present it runs symbolic checks only. No LLM API keys required. Suitable for CI pipelines, local pre-commit hooks, or agent-loop integration.
Changed
- Rewrote README end-to-end for v1.0 launch audience. New sections: "Why This Exists", "Modes" (five subsections: Symbolic, Heuristic Graph, Agent Critique, Agent Graph, History), "Maintainer Notes". Removed v0.2.0-era feature bullet list and duplicated section prose. Restructured Quick Start to lead with the agent-native workflow. Rebuilt Options table from live argparse audit; every flag matches its actual help text and default. Rebuilt Rules table from live rule module audit; added source-tag legend paragraph. Added inline v1.0 case study paragraph (full detail at docs/case-study-v1-real-world-runs.md). All cited diagnostics and output excerpts trace verbatim to field-test artifacts in runs/.
- Added docs/case-study-v1-real-world-runs.md: full breakdown of the pre-3B field test covering 18 Anthropic skills (symbolic), mcp-builder through the full v1.0 pipeline (symbolic + heuristic graph + agent critique + agent graph), and 5 uxuiprinciples skills (strict VS Code mode). Documents three semantic.contradiction.detected errors on a skill that passed all symbolic checks, five graph.capability.orphaned patterns, and the recurring unknown-field pattern (license, homepage, env) across official catalogs.
Added
- skills/skillcheck/SKILL.md: skillcheck's own SKILL.md, validating the tool against itself. Passes symbolic, graph, critique, and history validation with zero errors and zero warnings. Serves as the worked example for the Rules table in the README.
- Self-host integration test suite (tests/test_self_host.py): confirms the bundled SKILL.md passes symbolic validation, all five graph analyzers, critique ingestion, agent graph ingestion with divergence analysis, full CLI pipeline, history round-trip, and description scoring threshold.
- scripts/regen_self_host_fixtures.py: regenerates tests/fixtures/self_host/graph_clean.json from the live heuristic graph after skill edits.
- Makefile with regen-self-host-fixtures target: runs the regen script against skills/skillcheck/SKILL.md.
- --history flag: appends a validation record to the per-skill .skillcheck-history.json ledger next to the SKILL.md file. Off by default; existing invocations see no behavior change. Incompatible with emit modes.
- --show-history flag: reads the per-skill ledger and prints it (text or JSON via --format), then exits 0. Skips all validation. Incompatible with emit modes and --history.
- history.skill.regressed WARNING rule: fires when --history is active, the skill content hash matches a prior passing run, and the current run fails. Indicates a rule tightened or an agent surfaced a new finding.
- history.write.failed WARNING rule: fires when --history is active but the ledger file cannot be written. Validation exit code is unaffected.
- history.read.failed WARNING rule: fires when --history is active but the existing ledger cannot be read. Validation continues without regression check.
- --emit-graph: emit mode. Prints the extracted capability graph (text or JSON) to stdout and exits 0. Identifies Capability, Input, and Output nodes plus requires/produces edges heuristically from heading structure and backtick references. Mutually exclusive with --analyze-graph, --emit-critique-prompt, and --ingest-critique.
- --analyze-graph: augment mode. Extracts the capability graph from each file, runs all five graph analyzers, and merges diagnostics into the validation report. Compatible with --ingest-critique (both run; results merged per file). Graph WARNINGs do not fail validation or change the exit code.
- Five graph rule checkers (all WARNING severity): graph.capability.orphaned, graph.input.unused, graph.output.unproduced, graph.capability.empty_description, graph.tool.unreferenced. No double-firing: body inputs and frontmatter tools are handled by separate analyzers.
- graph_render module: render_graph_text and render_graph_json pure rendering functions. JSON output is deterministic (field order follows dataclass declaration).
- merge_diagnostics public function in core.semantic and core.__init__. merge_critique_diagnostics is now a thin wrapper; existing callers unchanged.
- --critique-agent {claude,codex,cursor}: select the prompt template variant for agent self-critique. Prompt framing is tuned per vendor; the schema, parser, and exit codes are identical across all agents. Requires --emit-critique-prompt or --ingest-critique. Records the agent name as critique_source in JSON output and as a header line in text output. Default: claude.
- --emit-critique-prompt: print the agent self-critique prompt to stdout and exit 0. Use --format json to wrap in {"prompt": "..."}. In directory mode, prompts are separated by a delimiter line so downstream tools can split per-skill.
- --ingest-critique PATH: read an agent self-critique JSON response from PATH (use - for stdin), convert to diagnostics, merge with symbolic results, and emit a unified report.
- Exit code 3: symbolic validation passed but the ingested critique contains semantic errors (contradictions or findings with ERROR severity). Exit code 1 takes priority when symbolic errors exist.
- --emit-graph-prompt: print the capability graph extraction prompt to stdout and exit 0. Use --graph-agent to select the vendor variant. In directory mode, prompts are separated by the same per-skill delimiter used by --emit-critique-prompt.
- --ingest-graph PATH: read an agent graph extraction JSON response from PATH (use - for stdin), parse it into a CapabilityGraph with source="agent", run standard graph analyzers, run divergence analyzers against the heuristic baseline, and merge all diagnostics into the validation report.
- --graph-agent {claude,codex,cursor}: select the prompt template variant for graph extraction. Framing is tuned per vendor; the schema, parser, and exit codes are identical across all agents. Requires --emit-graph-prompt or --ingest-graph. Default: claude. Records the agent name as graph_source in JSON output and as a header line in text output.
- graph.contradiction.heuristic_disagreement (ERROR, source: agent): fires when an ingested agent graph claims an edge between two nodes that both appear in the heuristic graph but that edge is absent heuristically. Indicates a possible over-claimed capability. Only active when --ingest-graph is used.
- Graph extraction prompt module (agents.graph_base, agents.graph_claude, agents.graph_codex, agents.graph_cursor): parallel to the critique prompt module. Claude variant uses XML tags and a full worked example; Codex uses markdown headers and a full worked example; Cursor uses a compact type signature only.
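The history.skill.regressed condition in the list above (same content hash as a prior passing run, but the current run fails) can be sketched as follows. The ledger record fields and the choice of SHA-256 are assumptions for illustration; the actual .skillcheck-history.json schema is not documented here.

```python
import hashlib
from typing import Iterable, Mapping


def content_hash(skill_text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 text; the real ledger may hash differently.
    return hashlib.sha256(skill_text.encode("utf-8")).hexdigest()


def is_regression(skill_text: str, current_passed: bool,
                  ledger: Iterable[Mapping[str, object]]) -> bool:
    """True when unchanged content passed before but fails now."""
    if current_passed:
        return False
    digest = content_hash(skill_text)
    # Hypothetical record shape: {"content_hash": str, "passed": bool}.
    return any(r.get("content_hash") == digest and r.get("passed") is True for r in ledger)


text = "---\nname: demo\n---\nBody\n"
ledger = [{"content_hash": content_hash(text), "passed": True}]
print(is_regression(text, current_passed=False, ledger=ledger))
```

Because the content is byte-identical, a regression under this check points at the validator (a tightened rule or a new agent finding) rather than at the skill author.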
Verification
After installing skillcheck==1.0.0:
skillcheck --version
# skillcheck 1.0.0
skillcheck skills/skillcheck/SKILL.md --analyze-graph
# Should exit 0 with no errors and no warnings (only INFO diagnostics)
Links
- PyPI: https://pypi.org/project/skillcheck/1.0.0/ (available after publish)
- GitHub Release: https://github.com/moonrunnerkc/skillcheck/releases/tag/v1.0.0 (available after maintainer creates the release)
- agentskills.io specification: https://agentskills.io/specification
- README: https://github.com/moonrunnerkc/skillcheck/blob/main/README.md
v0.2.0
What's new in 0.2.0
First feature release. Adds cross-agent compatibility checks, file reference validation, progressive disclosure budgeting, description quality scoring, and a drop-in GitHub Action.
Highlights
GitHub Action -- three lines of YAML to add skillcheck to any CI pipeline. PR annotations, job summary table, and JSON output included. See the README for setup.
Cross-agent compatibility warnings -- flags Claude Code-only fields, VS Code directory-name requirements, and fields with unverified behavior in Codex and Cursor. Full compatibility matrix across four agents.
File reference validation -- parses markdown links and frontmatter directives, verifies referenced files exist on disk, catches symlink escapes (CWE-59), and warns when references go deeper than one directory level.
Progressive disclosure budget -- three-tier token budgeting (metadata at ~100 tokens, body at <5,000, resources on demand). Flags oversized code blocks, large tables, and embedded base64.
Description quality scoring -- scores 0-100 for agent discoverability. Checks action verbs, trigger phrases, keyword density, specificity, and length. Enforce a minimum with --min-desc-score N.
YAML type coercion detection -- catches when yaml.safe_load silently converts bare values like true, 123, or null into non-string types. Clear fix advice included.
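To make the type-coercion highlight concrete: once a YAML loader has turned bare true, 123, or null into non-string values, the check reduces to inspecting the parsed types. The sketch below operates on an already-parsed mapping (written as a dict literal so it runs without PyYAML). The function name and output shape are hypothetical.

```python
from typing import Dict

# Scalar types a YAML loader produces from bare values like true, 123, 1.5, or null.
_COERCED_TYPES = (bool, int, float, type(None))


def find_coerced_fields(frontmatter: dict) -> Dict[str, str]:
    """Map each field whose scalar value is not a string to its parsed type name."""
    return {
        key: type(value).__name__
        for key, value in frontmatter.items()
        if isinstance(value, _COERCED_TYPES)
    }


# What a loader would typically return for: name: true / version: 123 / description: null
parsed = {"name": True, "version": 123, "description": None, "license": "MIT"}
print(find_coerced_fields(parsed))
```

The usual fix is to quote the value in the frontmatter (for example name: "true") so it stays a string.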
New CLI flags
- --strict-vscode promotes VS Code compatibility issues from INFO to ERROR
- --target-agent {claude,vscode,all} scopes checks to a specific agent
- --skip-dirname-check and --skip-ref-check for CI without filesystem context
- -q / --quiet suppresses all output (exit code only)
- --min-desc-score N enforces a minimum description quality threshold
Bug fixes
- Fixed duplicate diagnostics for ../../ reference paths (both depth-exceeded and traverses-above fired; now only the most specific one does)
- Corrected sizing rule descriptions in the README
Install
pip install skillcheck==0.2.0
Full changelog: CHANGELOG.md