Add skill quality evaluator based on best practices #119

@luongnv89

Type

feature (high confidence)

Description

Add a new CLI command (e.g., asm eval <skill-path>) that evaluates a skill against established best practices and produces a structured quality report with actionable feedback. This tool helps skill creators improve their skills before publishing or sharing.

The evaluator should check a skill's SKILL.md (and supporting files) against criteria derived from best practice sources (see #117), including:

  • Structure & completeness — has required frontmatter fields (name, description, type), proper markdown structure, modes/steps documented
  • Description quality — trigger description is specific and non-overlapping, concise but descriptive, uses action verbs
  • Prompt engineering — uses progressive disclosure, sets clear degrees of freedom, avoids ambiguity, includes examples
  • Context efficiency — avoids bloating the context window, uses references/templates instead of inline content, respects token budgets
  • Safety & guardrails — includes error handling instructions, has confirmation steps for destructive actions, validates prerequisites
  • Testability — acceptance criteria are testable, examples cover edge cases, outputs are verifiable
  • Naming & conventions — follows naming conventions, uses imperative mood, labels are consistent
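A minimal sketch of how the first category (structure & completeness) could be checked, assuming SKILL.md uses a `---`-fenced frontmatter block of simple `key: value` pairs — the exact frontmatter format and field names are assumptions, not a published spec:

```python
# Sketch of a structure/completeness check. The frontmatter format
# ("---"-fenced key: value block) is an assumption.
import re

REQUIRED_FIELDS = ("name", "description", "type")  # per the criteria above

def parse_frontmatter(text: str) -> dict:
    """Extract simple `key: value` pairs from a leading ----fenced block."""
    m = re.match(r"\A---\n(.*?)\n---\n", text, re.DOTALL)
    if not m:
        return {}
    fields = {}
    for line in m.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def check_structure(skill_md: str) -> list[str]:
    """Return a list of structure issues (empty list = pass)."""
    fm = parse_frontmatter(skill_md)
    issues = [f"missing frontmatter field: {f}"
              for f in REQUIRED_FIELDS if f not in fm]
    if "# " not in skill_md:
        issues.append("no top-level markdown heading")
    return issues
```

Content-level categories (description quality, prompt engineering, safety) need richer heuristics or model assistance; only mechanical checks are this simple.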

Output

A scored report with per-category ratings and specific improvement suggestions:

◆ Skill Evaluation: my-skill
┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
  Category              │ Score │ Notes
  ──────────────────────┼───────┼──────────────────────────
  Structure             │  9/10 │ ✓ All sections present
  Description quality   │  6/10 │ Trigger too broad
  Prompt engineering    │  7/10 │ Missing examples section
  Context efficiency    │  8/10 │ Good use of references
  Safety & guardrails   │  5/10 │ No confirmation steps
  Testability           │  7/10 │ Criteria could be sharper
  Naming & conventions  │  9/10 │ ✓ Follows conventions
  ──────────────────────┼───────┼──────────────────────────
  Overall               │ 73/100│

⚡ Top 3 improvements:
  1. Add confirmation prompts before destructive actions
  2. Narrow trigger description — overlaps with "code-review"
  3. Add 2-3 usage examples in the skill body
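The overall score in the report above is consistent with a plain rescaling of the per-category sum (51/70 → 73/100). A minimal sketch, assuming equal category weights as in the example:

```python
def overall_score(category_scores: dict[str, int], max_per_category: int = 10) -> int:
    """Rescale per-category scores (each out of max_per_category) to 0-100."""
    total = sum(category_scores.values())
    maximum = max_per_category * len(category_scores)
    return round(100 * total / maximum)
```

Weighted categories (e.g., safety counting double) would be a natural extension but are not specified here.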

Auto-Fix Mode (--fix)

In addition to reporting issues, the evaluator should provide a --fix flag that automatically corrects basic, deterministic problems in a skill's SKILL.md frontmatter and structure. This saves skill creators from manually fixing trivial issues that have clear, unambiguous solutions.

Usage:

asm eval <skill-path> --fix        # evaluate and auto-fix basic issues
asm eval <skill-path> --fix --dry-run  # show what would be fixed without modifying files
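The flag surface above can be sketched with stdlib argparse; how the subcommand wires into the wider `asm` CLI is an assumption:

```python
# Sketch of the `asm eval` argument surface using argparse.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="asm eval")
    parser.add_argument("skill_path", help="path to a local skill directory")
    parser.add_argument("--fix", action="store_true",
                        help="auto-fix basic, deterministic issues")
    parser.add_argument("--dry-run", action="store_true",
                        help="with --fix: preview changes without writing")
    parser.add_argument("--json", action="store_true",
                        help="emit the report as JSON")
    return parser
```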

Auto-fixable items:

  Problem                                  │ Auto-fix action
  ─────────────────────────────────────────┼─────────────────────────────────────────────────────────────
  Missing version in frontmatter           │ Add version: 0.1.0
  Missing author / creator info            │ Add author: field with value from git config (user.name) or prompt
  Missing effort field                     │ Infer effort (XS/S/M/L/XL) from skill line count and complexity, add to frontmatter
  Missing type field                       │ Infer type from skill content (e.g., code patterns → code, CLI commands → cli)
  Missing description field                │ Extract first meaningful sentence from the skill body as description
  Trailing whitespace / line endings       │ Normalize whitespace and line endings
  Missing blank line between sections      │ Insert blank lines per markdown best practices
  Frontmatter field ordering               │ Reorder to canonical order: name, description, version, author, type, effort
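Two of the fixes above as a sketch — adding a default version and inferring effort from line count. The line-count thresholds for XS/S/M/L/XL are illustrative assumptions, not values from this issue:

```python
# Sketch of two deterministic fixes. Effort thresholds are assumptions.
def infer_effort(line_count: int) -> str:
    for threshold, label in ((50, "XS"), (120, "S"), (250, "M"), (500, "L")):
        if line_count <= threshold:
            return label
    return "XL"

def fix_frontmatter(fields: dict, line_count: int) -> tuple[dict, list[str]]:
    """Fill missing low-risk fields; return (fixed_fields, fix_log)."""
    fixed, log = dict(fields), []
    if "version" not in fixed:
        fixed["version"] = "0.1.0"
        log.append("Added missing version: 0.1.0")
    if "effort" not in fixed:
        fixed["effort"] = infer_effort(line_count)
        log.append(f"Added missing effort: {fixed['effort']} "
                   f"(inferred from {line_count} lines)")
    return fixed, log
```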

Auto-fix output example:

◆ Skill Evaluation: my-skill (--fix mode)
┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄

  Auto-fixed 3 issues:
  ✓ Added missing version: 0.1.0
  ✓ Added missing author: "John Doe" (from git config)
  ✓ Added missing effort: M (inferred from 180 lines)

  Remaining issues (manual fix needed):
  ⚠ Description quality: trigger too broad — overlaps with "code-review"
  ⚠ Safety: no confirmation steps for destructive actions

  Overall: 73/100 → 82/100 (after auto-fix)

Constraints:

  • Auto-fix only applies to deterministic, low-risk corrections (frontmatter fields, formatting)
  • Content-level issues (description quality, prompt engineering, safety) are never auto-fixed — they require human judgment
  • --dry-run shows a diff preview of proposed changes without writing to disk
  • All fixes are applied to SKILL.md in-place; a backup is created as SKILL.md.bak before modification
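The --dry-run and backup constraints can be sketched with stdlib difflib — the diff preview comes from `unified_diff`, and the in-place write is guarded by the `.bak` copy. Function name and shape are illustrative:

```python
# Sketch: preview fixes as a unified diff; only write (with a .bak
# backup) when not in dry-run mode, per the constraints above.
import difflib
from pathlib import Path

def apply_fixes(path: Path, fixed_text: str, dry_run: bool) -> str:
    original = path.read_text()
    diff = "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed_text.splitlines(keepends=True),
        fromfile=str(path), tofile=f"{path} (fixed)"))
    if not dry_run and diff:
        backup = path.parent / (path.name + ".bak")  # SKILL.md.bak
        backup.write_text(original)
        path.write_text(fixed_text)
    return diff
```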

Related

Reporter Context

To support skill creators, add a tool that evaluates their skill against best practices.

This process should also provide an auto-fix option for basic problems such as a missing version number, missing creator info, or a missing effort value.

Acceptance Criteria

  • New asm eval <skill-path> command that accepts a local skill directory path (high confidence)
  • Evaluates SKILL.md against at least 6 best-practice categories with per-category scores (high confidence)
  • Produces a structured report with overall score, per-category breakdown, and top improvement suggestions (high confidence)
  • --fix flag auto-corrects basic frontmatter issues: missing version, author, effort, type, and description (high confidence)
  • --fix --dry-run shows proposed fixes as a diff without modifying files (high confidence)
  • Auto-fix creates a SKILL.md.bak backup before modifying the original file (high confidence)
  • Auto-fix only applies to deterministic, low-risk corrections — never modifies content-level quality issues (high confidence)
  • Supports --json output for machine consumption (medium confidence)
  • Can optionally be integrated into asm publish as a pre-publish quality gate (medium confidence)
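For the --json criterion, a possible report shape — the key names and schema are assumptions, since the issue does not specify a format:

```python
# Sketch of a --json report payload. Schema/key names are assumptions.
import json

def report_json(skill: str, categories: dict[str, dict]) -> str:
    overall = round(100 * sum(c["score"] for c in categories.values())
                    / (10 * len(categories)))
    return json.dumps({"skill": skill,
                       "categories": categories,
                       "overall": overall}, indent=2)
```

A stable, documented schema would also make the pre-publish quality gate in `asm publish` easy to script against.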

Metadata

  • Priority: medium
  • Effort: L
  • Suggested labels: feature, cli
