add orchestrated agent skills and workflow#195
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #195 +/- ##
=======================================
Coverage 90.81% 90.81%
=======================================
Files 5 5
Lines 98 98
Branches 22 22
=======================================
Hits 89 89
Misses 9 9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR introduces a multi-agent prompt framework under .github/prompts/ to orchestrate Ledger app development across implementation (C), QA (Ragger/Speculos), documentation, and review “gatekeeping”.
Changes:
- Added an Orchestrator prompt defining a 4-phase Dev → QA → Doc → Review pipeline.
- Added four “skill” prompt files specifying role responsibilities and checklists for C, QA, Doc, and Review agents.
- Added a
.github/prompts/README.mddescribing how to use the prompt suite.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| .github/prompts/ORCHESTRATOR.md | Defines the multi-phase orchestration workflow and logging template. |
| .github/prompts/SKILL_LEDGER_C.md | Dev-agent rules for embedded C implementation/build expectations. |
| .github/prompts/SKILL_LEDGER_QA.md | QA-agent rules for Ragger/Speculos testing and snapshot workflow. |
| .github/prompts/SKILL_LEDGER_DOC.md | Doc-agent rules for generating/maintaining APDU + design documentation. |
| .github/prompts/SKILL_LEDGER_REVIEW.md | Review-agent “gatekeeper” checklist and verdict format. |
| .github/prompts/README.md | Explains how to use the orchestrator + skills as a prompt workflow. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 4. **Provide Proof (MANDATORY):** | ||
| - After writing code, you **MUST execute** the Docker Build Command **yourself in the terminal**. Do NOT just display it to the user. | ||
| - **Start Clean:** Begin the command chain with `make clean`. | ||
| - **Filter Targets:** Read `[app].devices` from `ledger_app.toml` and generate build commands **ONLY** for the listed devices. Use the **Target → SDK Variable Mapping** table above to translate each device name to its `BOLOS_SDK` variable (e.g., `nanos+` → `$NANOSP_SDK`). | ||
| - **Check Compliance:** **MUST** end the command chain with the Rule Enforcer (`/opt/enforcer.sh`). | ||
| - **Enforcer Duration:** The enforcer runs `scan-build` for every target and can take **10-15 minutes**. Use a generous timeout (900s+). Consider running the compilation chain and enforcer as **two separate commands** to avoid timeout-induced SIGTERM. | ||
| - **Chain Commands:** Use `&&` to ensure the process stops on the first error. | ||
| - **Preserve Artifacts:** Stop after the last compilation step (Do not run a final clean). | ||
| - **Execution:** Use `run_in_terminal` (or equivalent tool) to launch the `docker run` command. Do NOT ask the user to run it manually. | ||
| 5. **Iterate (The Fix Loop):** |
There was a problem hiding this comment.
This prompt requires using a run_in_terminal tool, but this repository doesn't define such a tool and common chat integrations (e.g., GitHub Copilot Chat) can't execute Docker commands. Consider rephrasing to be tool-agnostic (e.g., "run the following commands locally") or documenting the specific integration that provides terminal execution.
| - **Description** — What the command does. | ||
| - **Command table** — CLA, INS, P1, P2, Lc, CData with value + description for each field. | ||
| - **Response table** — Field name, length, description. Always end with SW `0x9000`. | ||
| - **Error Responses table** — Condition → SW code. Cover: wrong P1/P2 (`0x6A86`), wrong data length (`0x6D02`), invalid data (`0x6A80`), user rejection (`0x6985`). |
There was a problem hiding this comment.
The example status word for "wrong data length" is incorrect (0x6D02 is not a standard length error and doesn't match what this codebase uses via StatusWords/SWO_WRONG_DATA_LENGTH). Prefer referencing the codebase constant (e.g., Errors.SWO_WRONG_DATA_LENGTH) or the standard 0x6700 to avoid generating inaccurate APDU docs.
| - **Error Responses table** — Condition → SW code. Cover: wrong P1/P2 (`0x6A86`), wrong data length (`0x6D02`), invalid data (`0x6A80`), user rejection (`0x6985`). | |
| - **Error Responses table** — Condition → SW code. Cover: wrong P1/P2 (`0x6A86`), wrong data length (`0x6700`), invalid data (`0x6A80`), user rejection (`0x6985`). |
| 1. **Analyze the Artifacts:** Read the changes in `src/`, the test directory (from `[pytest.standalone].directory` in `ledger_app.toml`), and `doc/`. | ||
| 2. **Evaluate Logic:** | ||
| - Ask: *"Does this C code solve the user request?"* | ||
| - Ask: *"Does the Python test prove the feature works?"* | ||
| - Ask: *"Is the Documentation accurate?"* | ||
| 3. **Produce Verdict:** | ||
| - **FAIL:** Return a list of specific "Blockers" if ANY principle is violated. Explicitly state what must be fixed. | ||
| - **PASS:** Output `✅ READY TO MERGE` ONLY if the feature is production-ready. |
There was a problem hiding this comment.
markdownlint (MD030) requires exactly one space after an ordered-list marker. Update this routine list (and the ordered list inside the blockquote examples below) to use 1. instead of 1. to avoid lint failures.
| - **Code blocker** (security, logic, build) → Loop to **PHASE 1** (Dev). | ||
| - **Test blocker** (missing coverage, wrong assertion) → Loop to **PHASE 2** (QA). | ||
| - **Doc blocker** (wrong INS code, missing diagram) → Loop to **PHASE 3** (Doc). | ||
| You MUST pass the list of "Blockers" found by the Reviewer as high-priority constraints for the targeted phase. |
There was a problem hiding this comment.
This line ends with trailing spaces (used for a Markdown hard line break). The trailing-whitespace pre-commit hook will strip these spaces, so the hard break won't be preserved. Prefer an explicit <br> (allowed by MD033) or reflow the paragraph to avoid relying on trailing spaces.
| You MUST pass the list of "Blockers" found by the Reviewer as high-priority constraints for the targeted phase. | |
| You MUST pass the list of "Blockers" found by the Reviewer as high-priority constraints for the targeted phase.<br> |
| 1. **Integration Tests (Ragger):** MANDATORY. | ||
| 2. **Unit Tests (Host):** OPTIONAL (Recommended for complex logic). | ||
| 3. **Fuzzing:** OPTIONAL (Recommended for security hardening). |
There was a problem hiding this comment.
markdownlint (MD030) requires exactly one space after an ordered-list marker. Update this ordered list (and the other ordered lists in this file) to use 1. instead of 1. to avoid lint failures.
| 1. **Integration Tests (Ragger):** MANDATORY. | |
| 2. **Unit Tests (Host):** OPTIONAL (Recommended for complex logic). | |
| 3. **Fuzzing:** OPTIONAL (Recommended for security hardening). | |
| 1. **Integration Tests (Ragger):** MANDATORY. | |
| 2. **Unit Tests (Host):** OPTIONAL (Recommended for complex logic). | |
| 3. **Fuzzing:** OPTIONAL (Recommended for security hardening). |
| 5. **Provide Proof (MANDATORY):** | ||
| - After writing tests, you **MUST execute** the Docker Test Command **yourself in the terminal**. Do NOT just display it to the user. | ||
| - **Step 1: Setup:** Activate venv and install requirements. | ||
| - **Step 2: Golden Run:** Run `pytest --golden_run` for **ALL** supported devices. | ||
| - **Step 3: Verification:** Run `pytest` (without golden_run flag) for **ALL** supported devices to confirm stability. | ||
| - **Chain Commands:** Use `&&` to ensure the process stops immediately if any step fails. | ||
| - **Execution:** Use `run_in_terminal` (or equivalent tool) to launch the `docker run` command. Do NOT ask the user to run it manually. | ||
| 6. **Iterate (The Fix Loop):** |
There was a problem hiding this comment.
This prompt requires using a run_in_terminal tool to execute Docker commands, but the repo doesn't define such a tool and many target chat environments cannot run commands. Consider rewriting to be tool-agnostic (provide commands to run) or explicitly documenting the required tooling/integration.
| 1. Open your AI Chat (or "Composer" / "Copilot Edits" mode). | ||
| 2. **Reference the `@ORCHESTRATOR.md` file** in your context (e.g. `#file:ORCHESTRATOR.md`). | ||
| 3. Type your high-level request. |
There was a problem hiding this comment.
markdownlint (MD030) requires exactly one space after an ordered-list marker. Update this ordered list (and other ordered lists in this file) to use 1. instead of 1. to avoid lint failures.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
No description provided.