
Release v0.7.8a0: Official HED docs in system prompt#139

Merged
neuromechanist merged 17 commits into main from develop on Apr 2, 2026

@neuromechanist (Member)

Summary

  • Replace hand-written HED annotation rules with official documentation from hed-standard GitHub repos
  • Add fetch script (scripts/fetch_hed_docs.py) that pulls and processes HedAnnotationSemantics.md and 02_Terminology.md
  • Add docs loader module (src/utils/hed_docs_loader.py) with caching, truncation, and graceful fallback
  • Add weekly CI workflow to auto-update bundled docs via PR
  • Delete legacy hed_rules.py (unused)
  • 498 unit tests passing, including 35 new tests (16 for MyST parser, 14 for loader, 5 for guide)
  • Zero mocks used (all tests use the real filesystem via pytest's tmp_path fixture)
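
The loader behaviour listed above (caching, truncation, graceful fallback, and only caching non-empty results) might look roughly like this. It is a minimal sketch, not the code in src/utils/hed_docs_loader.py: the function name `load_docs_sketch`, the `*.md` glob, the `max_chars` default, and the truncation marker are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Hypothetical module-level cache; only ever holds a non-empty string.
_cached_docs: Optional[str] = None

# Assumed marker appended when the combined docs exceed the character budget.
TRUNCATION_MARKER = "\n\n[... truncated ...]"


def load_docs_sketch(docs_dir: Path, max_chars: int = 50_000) -> str:
    """Concatenate bundled Markdown docs, truncate to max_chars, and cache.

    Returns "" (graceful fallback) when the directory is missing or holds
    no readable, non-empty .md files.
    """
    global _cached_docs
    if _cached_docs is not None:
        return _cached_docs

    if not docs_dir.is_dir():
        return ""  # fallback: no bundled docs available

    parts = []
    # Sorted file order keeps the output deterministic across runs.
    for path in sorted(docs_dir.glob("*.md")):
        try:
            text = path.read_text(encoding="utf-8").strip()
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable files instead of failing the prompt build
        if text:
            parts.append(text)

    result = "\n\n".join(parts)
    if len(result) > max_chars:
        result = result[:max_chars] + TRUNCATION_MARKER

    if not result:
        # Do NOT cache empty results; allow retry in long-running processes.
        return result

    _cached_docs = result
    return result
```

Note that this sketch's cache ignores `docs_dir` once a non-empty result is stored; a loader that accepts a `docs_dir` parameter for testability may prefer to key its cache on that path.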

Closes #137, closes #69, closes #100

Test plan

  • All 498 unit tests pass (0 failures; test count up from 479)
  • 16 direct unit tests for MyST directive stripping
  • Truncation, partial docs, empty fallback tested with real tmp_path dirs
  • Deterministic output verified for prompt caching
  • Fetch script idempotency verified
  • List-table multiline cell bug found by the new tests, then fixed
  • All PR review findings addressed (error handling, no mocks, docstrings)
  • Integration test with real annotation workflow

neuromechanist and others added 15 commits March 30, 2026 03:45
* Move semantic hints from system prompt to user prompt

System prompt is now static per schema version, enabling prompt caching
across requests. Semantic hints (which change per image/description)
are placed in the user prompt instead. The system prompt includes a
pointer instructing the LLM to check the user message for hints.

Fixes #129
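
The split described in this commit can be pictured as follows: everything that varies per request moves into the user message, so the system prompt is byte-identical for a given schema version and guide, which is what provider-side prompt caching needs. This is a minimal sketch; the function names, signatures, and prompt wording are hypothetical rather than HEDit's actual API.

```python
from typing import Optional


def build_system_prompt(schema_version: str, guide: str) -> str:
    """Static per schema version: no per-request content goes here."""
    return (
        f"You are a HED annotation assistant (schema {schema_version}).\n\n"
        f"{guide}\n\n"
        "The user message may include a SEMANTIC HINTS section; "
        "use it if present."
    )


def build_user_prompt(description: str, hints_block: Optional[str] = None) -> str:
    """Per-request content: the description plus any semantic hints."""
    parts = [f"Description: {description}"]
    if hints_block:  # hints are optional; omit the section entirely when absent
        parts.append(hints_block)
    return "\n\n".join(parts)
```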

* Address review findings for cache-friendly prompts

- Rename _format_semantic_hints to format_semantic_hints (public API,
  used cross-module)
- Align header: system prompt pointer and actual section both say
  "SEMANTIC HINTS"
- Soften system prompt wording to "may include" (hints are optional)
- Skip hints with empty tag keys
- Add debug logging when hints are included in user prompt
- Add 10 tests: user prompt with/without hints, confidence bucketing,
  system prompt caching invariant
- Evaluation: qwen/qwen3-235b-a22b-2507 -> qwen/qwen3.5-397b-a17b
  (most capable Qwen MoE, $0.39 per million prompt tokens)
- Vision: qwen/qwen3-vl-30b-a3b-instruct -> qwen/qwen3-vl-32b-instruct
  (newer VL model, $0.10 per million prompt tokens)
- Annotation: keep anthropic/claude-haiku-4.5 (unchanged)
- Replace all legacy gpt-oss-120b references in defaults and docs
- Provider: let OpenRouter auto-route for Qwen models
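
A hint formatter with the behaviour listed above (a "SEMANTIC HINTS" header, confidence bucketing, skipping empty tag keys, debug logging) could be sketched as below. The bucket names, the 0.8/0.5 thresholds, and the hint dictionary shape (`tag`, `confidence`) are assumptions; this is not the actual `format_semantic_hints`.

```python
import logging
from typing import Any, Mapping, Sequence

logger = logging.getLogger(__name__)

# Assumed bucket thresholds; the real cut-offs are not stated in the PR.
HIGH, MEDIUM = 0.8, 0.5


def confidence_bucket(score: float) -> str:
    """Map a raw confidence score to a coarse label."""
    if score >= HIGH:
        return "high"
    if score >= MEDIUM:
        return "medium"
    return "low"


def format_hints_sketch(hints: Sequence[Mapping[str, Any]]) -> str:
    """Render hints as a SEMANTIC HINTS block; return "" when none are usable."""
    lines = []
    for hint in hints:
        tag = str(hint.get("tag", "")).strip()
        if not tag:
            continue  # skip hints with empty tag keys
        score = float(hint.get("confidence", 0.0))
        lines.append(f"- {tag} (confidence: {confidence_bucket(score)})")
    if not lines:
        return ""
    logger.debug("Including %d semantic hints in user prompt", len(lines))
    return "SEMANTIC HINTS:\n" + "\n".join(lines)
```

Bucketing rather than printing raw scores also keeps small score fluctuations from changing the prompt text.
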

Pull HedAnnotationSemantics.md and 02_Terminology.md from
hed-standard GitHub repos into the annotation agent system
prompt. Add fetch script, docs loader, weekly CI update
workflow, and bundled docs in src/data/hed-docs/.
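
A rough sketch of a fetch loop with the per-document error handling, corrupt-manifest recovery, and idempotency mentioned in this PR. `load_manifest` borrows its name from the commit message, but its signature here is assumed; the manifest layout (`manifest.json` with a SHA-256 per file), the injectable `fetch` callable, and the other names are hypothetical. The MyST clean-up step is omitted, and the real source URLs are left to the caller rather than reproduced.

```python
import hashlib
import json
import urllib.request
from pathlib import Path
from typing import Callable, Dict


def load_manifest(path: Path) -> dict:
    """Read the manifest; treat a missing or corrupt file as empty."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}
    except (json.JSONDecodeError, UnicodeDecodeError):
        return {}  # corrupt manifest: rebuild it rather than crash
    return data if isinstance(data, dict) else {}


def http_fetch(url: str) -> str:
    """Default fetcher: plain HTTPS GET, decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def fetch_docs(
    sources: Dict[str, str],  # output filename -> source URL (caller-supplied)
    out_dir: Path,
    fetch: Callable[[str], str] = http_fetch,
) -> Dict[str, str]:
    """Fetch each document independently and report a status per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = out_dir / "manifest.json"
    manifest = load_manifest(manifest_path)
    status: Dict[str, str] = {}

    for name, url in sorted(sources.items()):
        try:
            text = fetch(url)
        except Exception as exc:  # one failed document must not abort the rest
            status[name] = f"error: {exc}"
            continue

        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        entry = manifest.get(name)
        target = out_dir / name
        if isinstance(entry, dict) and entry.get("sha256") == digest and target.exists():
            status[name] = "unchanged"  # idempotent: identical content is not rewritten
            continue

        target.write_text(text, encoding="utf-8")
        manifest[name] = {"url": url, "sha256": digest}
        status[name] = "updated"

    manifest_path.write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return status
```

Injecting `fetch` lets tests pass a plain local function instead of patching the network layer, in line with the PR's no-mocks policy.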

Removes 10 hand-written builder functions from
hed_comprehensive_guide.py and keeps HEDit-specific sections
(vocabulary check, correction workflow, error troubleshooting,
output format). Deletes legacy hed_rules.py.
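
Conceptually, the guide becomes "official docs first, then the HEDit-specific sections in a fixed order". The hypothetical sketch below shows that composition; only the four section topics come from the commit message, while the headings, placeholder bodies, and function name are assumptions. A fixed tuple order keeps the output identical for identical input, which the cache-friendly system prompt relies on.

```python
from typing import Sequence, Tuple

# Section topics follow the commit message; headings and bodies are placeholders.
HEDIT_SECTIONS: Tuple[Tuple[str, str], ...] = (
    ("VOCABULARY CHECK", "<how to verify tags against the schema vocabulary>"),
    ("CORRECTION WORKFLOW", "<how to revise an annotation after validation errors>"),
    ("ERROR TROUBLESHOOTING", "<common validator errors and fixes>"),
    ("OUTPUT FORMAT", "<required response format>"),
)


def build_guide_sketch(
    official_docs: str,
    hedit_sections: Sequence[Tuple[str, str]] = HEDIT_SECTIONS,
) -> str:
    """Official HED docs first (when available), then HEDit-specific sections."""
    parts = []
    if official_docs.strip():
        parts.append("OFFICIAL HED DOCUMENTATION\n\n" + official_docs.strip())
    # If the bundled docs are missing, fall back to the HEDit sections alone.
    for title, body in hedit_sections:
        parts.append(f"{title}\n\n{body.strip()}")
    return "\n\n".join(parts)
```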

Closes #137, closes #69, closes #100

- Fix empty docs being cached permanently in long-running processes
  (only cache non-empty results, allow retry)
- Add logger.debug() to _get_docs_dir() importlib.resources fallback
- Add per-document error handling in fetch loop (partial failures
  no longer kill entire fetch)
- Add corrupt manifest recovery in load_manifest()
- Add OSError handling in main() for filesystem errors
- Replace all unittest.mock usage with tmp_path (no mocks policy)
- Add docs_dir parameter to load_hed_docs/get_comprehensive_hed_guide
  for testability without mocks
- Add 16 direct unit tests for clean_myst_markdown() parser
- Fix list-table multiline cell handling (continuation lines were
  creating new cells instead of appending to current cell)
- Fix stale docstrings (format_semantic_hints source field,
  _build_official_docs_section history-focused description)
- Re-fetch bundled docs with fixed parser
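
To illustrate the list-table fix: in a MyST `list-table`, `* -` opens a row, `  -` opens a new cell in that row, and more deeply indented lines continue the current cell. The sketch below converts such a body into a Markdown pipe table. It is a hypothetical helper, not the actual `clean_myst_markdown()`; it assumes the directive body is not itself indented, skips option lines such as `:header-rows:` that precede the first row, and always treats the first row as the header.

```python
from typing import List


def parse_list_table(body: List[str]) -> List[List[str]]:
    """Parse MyST list-table body lines into rows of cell strings."""
    rows: List[List[str]] = []
    for raw in body:
        line = raw.rstrip()
        text = line.strip()
        if not text:
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0 and (text == "* -" or text.startswith("* - ")):
            rows.append([text[3:].strip()])  # new row, first cell
        elif indent == 2 and (text == "-" or text.startswith("- ")) and rows:
            rows[-1].append(text[1:].strip())  # new cell in the current row
        elif rows:
            # Continuation line: append to the CURRENT cell instead of
            # starting a new one (the bug the new tests exposed).
            rows[-1][-1] = (rows[-1][-1] + " " + text).strip()
    return rows


def to_markdown_table(rows: List[List[str]]) -> str:
    """Render parsed rows as a pipe table, escaping literal pipes."""
    if not rows:
        return ""

    def fmt(cells: List[str]) -> str:
        return "| " + " | ".join(c.replace("|", r"\|") for c in cells) + " |"

    header, *data = rows
    lines = [fmt(header), "|" + "---|" * len(header)]
    lines += [fmt(r) for r in data]
    return "\n".join(lines)
```
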

Replace hand-written HED rules with official docs (#137)
Closes #137, closes #69, closes #100

cloudflare-workers-and-pages Bot commented Apr 2, 2026

Deploying hedit with Cloudflare Pages

Latest commit: 258acc0
Status: ✅  Deploy successful!
Preview URL: https://31b92d11.hedit.pages.dev
Branch Preview URL: https://develop.hedit.pages.dev

```python
if not result:
    # Do NOT cache empty results; allow retry in long-running processes
    return result

_cached_docs = result
```

codecov Bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 77.94118% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/utils/hed_docs_loader.py | 70.00% | 13 missing and 2 partials ⚠️ |
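
Assuming Codecov's usual ratio, hits / (hits + misses + partials), the report's figures are consistent with 68 changed lines overall (53 hit, 15 missed or partial) and 50 tracked lines in src/utils/hed_docs_loader.py (35 hit). These totals are inferred, not stated in the report; the snippet only checks the arithmetic.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Codecov-style ratio: partially covered lines count against coverage."""
    return 100.0 * hits / (hits + misses + partials)


def hits_for(pct: float, uncovered: int) -> float:
    """Invert pct = 100 * h / (h + uncovered) to recover the hit count h."""
    frac = pct / 100.0
    return frac * uncovered / (1.0 - frac)


# Report: 15 uncovered patch lines (13 missing + 2 partials, all in the loader).
print(round(hits_for(77.94118, 15)))      # 53
print(round(hits_for(70.00, 15)))         # 35
print(round(coverage_pct(53, 13, 2), 5))  # 77.94118
```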

neuromechanist and others added 2 commits April 2, 2026 14:53
Corrects the accidental alpha bump; develop uses .dev suffix per
versioning rules. Alpha will be set when merging to main.
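
The rule in this commit (a .dev suffix on develop, an alpha such as 0.7.8a0 only when merging to main) could be enforced with a small check like the one below. This is a hypothetical helper, not part of the repository; it covers only the PEP 440 forms relevant here (X.Y.Z, optional aN/bN/rcN, optional .devN), and whether main may also carry final releases is an assumption.

```python
import re

# Matches X.Y.Z with an optional aN/bN/rcN pre-release and an optional .devN.
_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+(?:(?:a|b|rc)\d+)?(?:\.dev(?P<dev>\d+))?$")


def suffix_ok_for_branch(version: str, branch: str) -> bool:
    """develop must carry a .devN suffix; main must not."""
    match = _VERSION_RE.match(version)
    if match is None:
        return False  # not a version form this check understands
    is_dev = match.group("dev") is not None
    if branch == "develop":
        return is_dev
    if branch == "main":
        return not is_dev
    return True  # other branches are not constrained here
```
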
neuromechanist merged commit f317913 into main on Apr 2, 2026
44 checks passed

Development

Successfully merging this pull request may close these issues:

  • Pull official HED docs into system prompt and add auto-update mechanism
  • Preload core HED documentation into system prompt
  • Revisit Agent Prompts
