Prepare skilpel 0.3.0 release #6

Merged: pasunboneleve merged 9 commits into main from vzfd-log-file-progress on May 25, 2026

@pasunboneleve

Owner

Summary

  • prepare the 0.3.0 release notes and version
  • document text output, pretty progress, warnings, and TTY live status
  • include the output/reporting feature commits on the release branch

Validation

  • go test ./...
  • git diff --check
  • ./scripts/release-notes-from-changelog.sh 0.3.0 pasunboneleve/skilpel
  • go run ./cmd/skilpel version

Roborev

  • 1417 clean for release prep
  • 1414 clean for transient progress output
  • 1410 clean for TTY live status
  • 1408 clean for warnings/progress bars

Context: GitHub Actions shows both stdout and stderr, so JSON progress logs written to stderr clutter the visible validation step even though they are useful for later diagnostics.

Decision: Add --log-file and logFile config support that writes structured JSON progress records to a file while preserving the visible --log-format stream on stderr.

Alternatives considered: Redirecting stderr in CI would hide all progress, and making --log-format choose either pretty or JSON would force users to give up one output. A second sink keeps the visible report and diagnostic stream separate.

Tradeoffs: The log file is local to the run unless the caller stores or uploads it; CI consumers must decide whether to retain it as an artifact.

Architectural impact: Progress logging now has a small multi-handler boundary, keeping run execution unchanged while letting the CLI own output routing.
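The multi-handler boundary described above can be sketched with the standard `log/slog` package. This is a minimal illustration under stated assumptions, not the code in this PR: the `fanout` type and `logToBoth` helper are hypothetical names, in-memory buffers stand in for stderr and the `--log-file` target, and slog's text handler stands in for the visible `--log-format` stream.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"log/slog"
)

// fanout is a hypothetical slog.Handler that forwards each record to
// every wrapped handler, so one logger can feed two sinks.
type fanout struct {
	handlers []slog.Handler
}

// Enabled reports true if any wrapped handler accepts the level.
func (f fanout) Enabled(ctx context.Context, level slog.Level) bool {
	for _, h := range f.handlers {
		if h.Enabled(ctx, level) {
			return true
		}
	}
	return false
}

// Handle passes a clone of the record to each handler that is enabled
// for its level and joins any errors the handlers return.
func (f fanout) Handle(ctx context.Context, r slog.Record) error {
	var errs []error
	for _, h := range f.handlers {
		if h.Enabled(ctx, r.Level) {
			errs = append(errs, h.Handle(ctx, r.Clone()))
		}
	}
	return errors.Join(errs...)
}

// WithAttrs applies the attributes to every wrapped handler.
func (f fanout) WithAttrs(attrs []slog.Attr) slog.Handler {
	next := make([]slog.Handler, len(f.handlers))
	for i, h := range f.handlers {
		next[i] = h.WithAttrs(attrs)
	}
	return fanout{handlers: next}
}

// WithGroup opens the group on every wrapped handler.
func (f fanout) WithGroup(name string) slog.Handler {
	next := make([]slog.Handler, len(f.handlers))
	for i, h := range f.handlers {
		next[i] = h.WithGroup(name)
	}
	return fanout{handlers: next}
}

// logToBoth emits one progress event and returns what each sink received.
// The buffers are stand-ins for stderr and the log file.
func logToBoth() (visible, file string) {
	var stderr, logFile bytes.Buffer
	logger := slog.New(fanout{handlers: []slog.Handler{
		slog.NewTextHandler(&stderr, nil),  // visible stream
		slog.NewJSONHandler(&logFile, nil), // structured diagnostic stream
	}})
	logger.Info("eval_completed", "eval", "example")
	return stderr.String(), logFile.String()
}

func main() {
	visible, file := logToBoth()
	fmt.Print(visible)
	fmt.Print(file)
}
```

Because the fan-out sits at the handler level, code that emits progress events only ever sees a single `*slog.Logger`, which matches the stated goal of leaving run execution unchanged.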

Context: Users may run skilpel next to skill-validator, and the old default stdout JSON plus custom flag parsing made the tools feel unrelated.

Decision: Move command parsing to Cobra/PFlag, add skill-validator-style --output/-o and --emit-annotations flags, make text the default stdout report, and require --output=json for the prior machine-readable summary contract.

Alternatives considered: Keeping stdout JSON by default would preserve compatibility but keep GitHub logs noisy and unlike skill-validator. Reusing only the visual colors would not align the actual CLI API.

Tradeoffs: This intentionally breaks the previous default stdout contract. Scripts that parse stdout must now request --output=json, and consumers tracking the latest binary need to update those scripts before adopting this release.

Architectural impact: Human report rendering now lives behind a report boundary, while run execution still emits summaries and progress events. CLI parsing owns presentation flags and leaves provider/eval behavior unchanged.
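The new stdout contract (text by default, JSON only on request) can be sketched as follows. This is an illustration only: it uses the standard library `flag` package as a stand-in for Cobra/PFlag, the `options` type and `parseOptions` function are hypothetical names, the `false` default for `--emit-annotations` is an assumption, and only the two output values named in this description are modeled.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// options holds the presentation flags described in the commit message.
type options struct {
	Output          string
	EmitAnnotations bool
}

// parseOptions resolves the stdout report format from args.
// Text is the default; JSON must be requested explicitly.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("skilpel", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of the sketch's output

	var opts options
	// Register the long and short spellings against the same variable.
	fs.StringVar(&opts.Output, "output", "text", "stdout report format")
	fs.StringVar(&opts.Output, "o", "text", "shorthand for --output")
	fs.BoolVar(&opts.EmitAnnotations, "emit-annotations", false, "emit annotations")

	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	switch opts.Output {
	case "text", "json":
		return opts, nil
	default:
		return options{}, fmt.Errorf("unsupported output format %q", opts.Output)
	}
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("output=%s emit-annotations=%t\n", opts.Output, opts.EmitAnnotations)
}
```

A script that previously parsed the default stdout would fall into the `text` branch after upgrading, which is why the breaking change needs an explicit `--output=json`.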

Context: The skill-validator-style report was readable, but the with-skill, baseline, and delta rates were buried inline. The baseline label also hid that the comparison means running without the skill.

Decision: Render rates as indented entries, bold the percentages, and rename the visible baseline label to without while preserving structured JSON field names and evaluator behavior.

Alternatives considered: Renaming JSON fields would align terminology everywhere, but it would create unnecessary API churn. Keeping inline rates preserved compactness, but made the key comparison harder to scan in CI logs.

Tradeoffs: The text report uses more vertical space for each eval. That is acceptable for local and CI readability, but it is less compact for dense runs.

Architectural impact: This stays inside the report and progress formatting boundary. Gates, evaluator state, structured JSON output, and provider behavior are unchanged.
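A minimal sketch of the indented rate entries, assuming ANSI bold escapes and a fixed label column. The `formatRates` name, the exact labels other than `without`, and the spacing are illustrative assumptions, not the report's actual layout.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	bold  = "\x1b[1m" // ANSI: start bold
	reset = "\x1b[0m" // ANSI: reset attributes
)

// formatRates renders the with-skill rate, the without-skill rate, and
// their difference as indented entries with bold percentages.
// Rates are fractions in [0, 1]; the delta is shown with an explicit sign.
func formatRates(withSkill, without float64) string {
	var b strings.Builder
	fmt.Fprintf(&b, "  %-9s%s%.0f%%%s\n", "with:", bold, withSkill*100, reset)
	fmt.Fprintf(&b, "  %-9s%s%.0f%%%s\n", "without:", bold, without*100, reset)
	fmt.Fprintf(&b, "  %-9s%s%+.0f%%%s\n", "delta:", bold, (withSkill-without)*100, reset)
	return b.String()
}

func main() {
	fmt.Print(formatRates(0.9, 0.6))
}
```

Only the visible label changes here; a structured summary would keep its existing baseline field name, as the decision states.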

Context: Pretty progress writes live eval information to stderr while text output writes the final report to stdout. In a terminal or GitHub log, those streams are shown together, so the same eval rates appeared twice.

Decision: Treat pretty progress as the owner of live run and eval rows, and make the final text summary print only gates and result when visible progress was emitted.

Alternatives considered: Removing the final text report would hide gate details. Keeping pretty run_completed output would still duplicate the result line. Collapsing everything into the summary would lose live feedback during long runs.

Tradeoffs: Pretty progress with non-text final output no longer prints a human result line; the machine-readable final output still contains the result. Text output keeps a single human final result.

Architectural impact: The CLI now passes progress visibility into the reporting boundary. Progress logging, structured log files, and full text summaries remain separate paths.

Context: The final text result looked too similar to individual eval rows, and skill-validator uses a strong divider before its aggregate result block.

Decision: Add a heavy divider before skilpel's final gates/result block and prefix the final Result line with a pass or fail icon.

Alternatives considered: Keeping the final line plain would match skill-validator's exact text, but it left the final status visually weak beside per-eval rows. Adding another section heading would be noisier than the divider pattern already familiar from skill-validator.

Tradeoffs: The text output now differs slightly from skill-validator's exact Result wording, while keeping the same final-result structure and improving scanability.

Architectural impact: The change stays within the text reporting boundary; structured JSON, markdown output, evaluation logic, and gates are unchanged.
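The last two decisions (print only gates and result when pretty progress was visible, and set the final block apart with a heavy divider and a status icon) can be sketched together. The `summary` type, `renderFinal`, the divider width, the icons, and the `Result:` wording are all hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// heavyDivider separates per-eval rows from the final block.
// The glyph and width are assumptions.
const heavyDivider = "━━━━━━━━━━━━━━━━━━━━━━━━"

// summary is a stand-in for the run summary passed to the text report.
type summary struct {
	EvalRows []string // pre-rendered per-eval rows
	Gates    []string // pre-rendered gate lines
	Passed   bool
}

// renderFinal builds the final text report. When progressShown is true,
// the eval rows were already printed by pretty progress, so only the
// divider, gates, and result are emitted.
func renderFinal(s summary, progressShown bool) string {
	var b strings.Builder
	if !progressShown {
		for _, row := range s.EvalRows {
			b.WriteString(row + "\n")
		}
	}
	b.WriteString(heavyDivider + "\n")
	for _, gate := range s.Gates {
		b.WriteString("  " + gate + "\n")
	}
	if s.Passed {
		b.WriteString("✓ Result: passed\n")
	} else {
		b.WriteString("✗ Result: failed\n")
	}
	return b.String()
}

func main() {
	s := summary{
		EvalRows: []string{"eval-a"},
		Gates:    []string{"gate-x: ok"},
		Passed:   true,
	}
	fmt.Print(renderFinal(s, true))
}
```

Passing visibility in as a plain boolean keeps the report unaware of how progress was rendered, which is consistent with the separate paths the commit messages describe.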

Context: Pretty output made pass/fail state clearer, but skipped skills had no warning surface and long runs still only advanced when each eval completed.

Decision: Add run warnings to the summary model, log skipped skills without eval files as yellow warnings, render those warnings in text and pretty output, and add a per-eval progress bar to pretty eval rows.

Alternatives considered: An animated spinner would show liveness between eval completions, but it can pollute captured CI logs and needs terminal-specific lifecycle handling. A printed progress bar is deterministic, useful in GitHub logs, and still gives local progress feedback.

Tradeoffs: The progress bar updates only when an eval completes, so it is not a heartbeat during a single slow provider call. That avoids noisy output while leaving room for a future TTY-only spinner.

Architectural impact: Warnings are now part of the run summary and structured progress stream. Evaluation behavior, gates, provider calls, markdown output, and JSON artifact compatibility otherwise remain unchanged.

Context: Local pretty runs had no visible activity while waiting for the first provider response. Completed eval rows helped CI, but they did not show local liveness during a slow call.

Decision: Add a TTY-only live status line with a spinner, empty-to-filled progress bar, and completed eval count. Clear and redraw that line around durable progress logs so completed rows remain readable above it.

Alternatives considered: Charmbracelet Bubbles provides progress and spinner components, but it expects a Bubble Tea event loop. schollz/progressbar is a good standalone progress package, but routing slog output through it would make the logging boundary less direct than a small local adapter.

Tradeoffs: The live spinner is intentionally unavailable in captured logs, so CI still relies on completed eval rows. The progress bar advances on eval completion while the spinner handles liveness between completions.

Architectural impact: The pretty progress handler now owns a closeable live-line lifecycle. Structured logs, log files, reports, eval execution, and gate behavior are unchanged.
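The clear-and-redraw lifecycle can be sketched as a small adapter around a writer. This is an assumption-laden illustration rather than the handler in this PR: `liveLine`, `isTerminal`, and `demo` are hypothetical names, the spinner is advanced manually instead of by a timer, and the character-device check is a common standard-library heuristic, not necessarily the TTY detection skilpel uses.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
	"sync"
)

// clearLine returns the cursor to column 0 and erases the whole line.
const clearLine = "\r\x1b[2K"

var frames = []string{"|", "/", "-", "\\"}

// liveLine owns one transient status line below the durable rows.
type liveLine struct {
	mu          sync.Mutex
	w           io.Writer
	done, total int
	frame       int
	shown       bool
}

// draw repaints the status line: spinner, 10-cell bar, completed count.
// Callers must hold mu.
func (l *liveLine) draw() {
	filled := 0
	if l.total > 0 {
		filled = l.done * 10 / l.total
	}
	bar := strings.Repeat("#", filled) + strings.Repeat(".", 10-filled)
	fmt.Fprintf(l.w, "%s%s [%s] %d/%d", clearLine, frames[l.frame%len(frames)], bar, l.done, l.total)
	l.shown = true
}

// Tick advances the spinner to show liveness between completions.
func (l *liveLine) Tick() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.frame++
	l.draw()
}

// Completed clears the live line, prints a durable row, then redraws
// the live line beneath it with the updated count.
func (l *liveLine) Completed(row string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.shown {
		fmt.Fprint(l.w, clearLine)
	}
	fmt.Fprintln(l.w, row)
	l.done++
	l.draw()
}

// Close erases the live line so nothing transient is left behind.
func (l *liveLine) Close() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.shown {
		fmt.Fprint(l.w, clearLine)
		l.shown = false
	}
}

// isTerminal is a heuristic: it reports whether f is a character device.
func isTerminal(f *os.File) bool {
	info, err := f.Stat()
	return err == nil && info.Mode()&os.ModeCharDevice != 0
}

// demo captures the byte stream for one tick and one completed eval.
func demo() string {
	var out strings.Builder
	l := &liveLine{w: &out, total: 2}
	l.Tick()
	l.Completed("eval-a done")
	l.Close()
	return out.String()
}

func main() {
	if !isTerminal(os.Stderr) {
		fmt.Println("stderr is not a terminal; live line disabled")
		return
	}
	l := &liveLine{w: os.Stderr, total: 2}
	l.Tick()
	l.Completed("eval-a done")
	l.Close()
}
```

Gating the live line on the terminal check is what keeps escape sequences out of captured CI logs, where only the durable rows should appear.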

Context: The live progress bar is useful while a local run is waiting, but completed eval rows are durable facts and should not carry transient UI state.

Decision: Keep the spinner/progress bar only in the TTY live line, add spacing before that live line, and remove progress bars and counters from durable eval rows.

Alternatives considered: A separate durable progress row would make replayed logs show progress, but it still mixes transient UI with eval results. Keeping the progress prefix in eval rows made each eval's identity harder to scan.

Tradeoffs: Captured logs no longer show per-row progress bars, by design. They still show each completed eval row and final gates/results.

Architectural impact: This stays inside pretty progress formatting. Structured logs, reports, live TTY status, and evaluation behavior are unchanged.

Context: The CLI output work changed default stdout behavior, pretty progress, live TTY feedback, warnings, and release-facing documentation. The release files need to agree before tagging.

Decision: Release the current 0.x line as 0.3.0, move the unreleased notes into a dated changelog section, update VERSION, document CLI output in README/docs, and update run help for pretty progress semantics.

Alternatives considered: A 1.0.0 release would highlight the breaking default-output change, but the project is still explicitly in MVP/0.x shape. Leaving the release as 0.2.x would understate the new CLI capability and breaking behavior.

Tradeoffs: Downstream users tracking latest will receive the text-output default change and must use --output=json for scripts. The changelog calls that out explicitly.

Architectural impact: This changes release metadata and documentation only; evaluator behavior is already implemented in earlier commits.

@pasunboneleve merged commit de029d7 into main on May 25, 2026
3 of 4 checks passed
@pasunboneleve deleted the vzfd-log-file-progress branch on May 25, 2026 at 01:24