feat: EEGLAB Community Assistant (Epic #97)#124
Merged
Conversation
* Make langfuse optional to fix Python 3.14 compatibility - Move langfuse from dependencies to optional-dependencies[observability] - Add lazy import with try/except in get_langfuse_handler() - Provides helpful warning if langfuse not installed - Fixes Pydantic v1 compatibility issues on Python 3.14 - Relates to #108 * feat: add EEGLAB community assistant configuration - Create config.yaml with complete EEGLAB assistant setup - 25+ documentation sources from sccn.github.io - 6 GitHub repos (eeglab, ICLabel, clean_rawdata, EEG-BIDS, LSL) - 3 core paper DOIs and 6 citation queries - Custom system prompt with EEG workflow guidance - Auto-generated knowledge tools (docs, discussions, papers) - Add comprehensive unit tests (26 tests, all passing) Follows HED assistant pattern for Phase 1 implementation. * docs: add Phase 1 implementation summary * fix: address PR review findings - Fix DOI typo in summary (ffinf -> fninf) - Fix line counts (340 lines, not 504) - Remove duplicate doc entry (Installation/quickstart same URL) - Update doc counts (26 total: 2 preloaded + 24 on-demand) All review issues addressed, tests passing. * docs: add local testing and epic branch workflow guides - Add comprehensive local testing guide for backend/CLI - Add epic branch workflow for multi-phase features - Add quick test script for verification * docs: update epic workflow to use worktrees - Epic branch should be a worktree, not in main repo - Update all instructions to reflect worktree structure - Add worktree management section * Simplify EEGLAB tests with shared fixtures - Consolidate fixtures at module level (setup_registry, eeglab_config, eeglab_assistant) - Remove duplicate fixture definitions across test classes - Reduce test file from 449 to 317 lines (29% reduction) - Improve test readability and maintainability - All 26 tests still pass * fix: address all PR review findings **Critical Fixes:** - Fix 23/26 broken documentation URLs to match actual SCCN GitHub structure - Move test_eeglab_interactive.py to scripts/ directory - Fix documentation count inconsistencies (25 → 26 sources) **Documentation Improvements:** - Update config.yaml line count (340 → 334) - Clarify preload limit is a guideline (2-3 recommended, not required) - Mark knowledge base numbers as projections (~150+ issues, ~470+ PRs expected) - Update doc titles to match actual files (e.g., channel spectra, ERP images) **Test Enhancements (26 → 33 tests):** - Add database error handling test (graceful degradation) - Add SSRF validation test (security - rejects localhost/private IPs) - Add GitHub repo format validation test (rejects invalid formats) - Add system prompt substitution test (no unfilled placeholders) - Add tool input validation tests (empty queries, long queries) - Add preload handling test (creation without preload) **All tests passing:** 33/33 ✅ Addresses review findings from code-reviewer, pr-test-analyzer, and comment-analyzer agents.
* feat: add generic YAML-based testing framework Create unified test suite that automatically validates ALL communities: - Parametrized tests run against HED, EEGLAB, and future communities - No community-specific hardcoded values - Validates configuration structure, metadata, URLs, repos, DOIs - Auto-validates system prompt completeness and tool generation - Includes slow tests for URL/GitHub accessibility validation **Results:** - 30 generic tests (all passing for structure validation) - Works for both HED and EEGLAB without any community-specific code - URL validation caught 10 broken EEGLAB URLs and 1 broken HED URL - Eliminates need for ~30 hardcoded tests per community **Test markers:** - Fast tests (default): Structure and format validation - Slow tests (-m slow): HTTP requests for URL/GitHub validation - Security tests: SSRF protection (localhost/private IP detection) Implements Phase 1.5.1 of #111 * fix: update broken documentation URLs for HED and EEGLAB EEGLAB (10 URLs fixed): - Extensions: others/EEGLAB_Extensions.md - Re-referencing: rereferencing.md (lowercase) - Resampling: resampling.md (lowercase) - Channel rejection: Channel_rejection.md - ICLabel: plugins/ICLabel/index.md - clean_rawdata: plugins/clean_rawdata/index.md - Scrolling data: Scrolling_data.md - Selecting epochs: removed (doesn't exist) - BIDS: plugins/EEG-BIDS/index.md - LSL: README.rst HED (1 URL fixed): - HedAndEEGLAB: removed refs/heads, changed .md to .html * refactor: remove hardcoded YAML tests, keep minimal behavioral tests EEGLAB Phase 1 has no custom tools or unique behavioral logic. All YAML configuration validation is now handled by the generic test_community_yaml_generic.py parametrized tests. Changes: - Removed ~420 lines of hardcoded YAML value tests - Kept 1 behavioral test confirming standard CommunityAssistant usage - Added documentation explaining separation of concerns HED already covered by generic tests (no test_hed_config.py existed). Benefits: - Reduced test duplication (90 tests, same coverage) - Future communities get full coverage automatically - Maintainable: single generic suite vs N duplicated suites * feat: add --community flag to validate command Extends 'osa validate' with community mode for full test suite validation. Usage: - File mode: osa validate <config_path> - YAML syntax, schema validation, env vars - Community mode: osa validate --community <id> - Full pytest suite including URL accessibility, GitHub repos - Shows community info before running tests - Supports --verbose flag for detailed output Features: - Auto-discovers available communities - Lists available communities if ID not found - Prevents using both modes together - Returns exit code 0 on success, 1 on failure - Color-coded output for easy scanning Examples: osa validate --community eeglab osa validate --community hed --verbose osa validate src/assistants/eeglab/config.yaml Closes Phase 1.5.3 requirement from issue #111 * fix: address all PR review findings Documentation fixes: - Fix incorrect pytest marker usage instructions (use -m 'not slow' to skip) - Add --verbose flag documentation to validate() docstring - Enhance docstrings with implementation details (subprocess, registry clearing) - Document MagicMock exception in testing guidelines Code improvements: - Use public registry.list_all() instead of private _assistants attribute - Add behavioral tests for tool descriptions, system prompt content, tool callability - Add configuration summary output to validate --community command Testing enhancements: - Add test_retrieve_docs_tool_description_includes_doc_list - Add test_system_prompt_contains_actual_github_repos - Add test_knowledge_tools_are_callable - Document why registry clearing is needed at collection time All tests passing (36/36 for generic tests, 86/86 overall excluding slow HTTP tests)
* chore: bump version to 0.5.3.dev0 * fix: comprehensive error handling improvements for streaming Fixes all critical and important issues from PR review #107: **Critical Fixes:** 1. Fix message state corruption on streaming errors - Track message indices explicitly to avoid race conditions - Prevent wrong messages from being removed on error - Properly handle streaming vs non-streaming error paths 2. Improve saveHistory() error handling - Distinguish error types (QuotaExceededError, SecurityError) - Provide actionable user feedback for each error type - Re-throw errors so callers know save failed - Log errors with structured context 3. Fix backend exception masking - Add specific handlers for ValueError (input errors) - Preserve HTTPException for proper HTTP error codes - Add error IDs to all backend errors for support tracking - Include structured logging context **Important Fixes:** 4. Add fetch timeout to streaming requests - 120 second timeout for connection + streaming - Prevents indefinite hangs on connection failures 5. Fix worker error detail loss - Check response.ok before processing - Pass through backend HTTP error codes (400, 403, 429, etc.) - Extract and forward backend error messages - Only categorize actual network/proxy errors 6. Improve reader resource leak handling - Check if reader exists before releasing - Log cleanup failures (indicates serious issues) - Don't silently swallow releaseError **Impact:** - Prevents data loss from message corruption - Better user feedback on storage failures - Proper error categorization for debugging - Support can track errors with error IDs - No more indefinite hangs - Users see actual backend errors (not generic 500) All syntax checks pass. Ready for production. * fix: preserve prompt caching in tool-bound models (#113) * fix: preserve prompt caching when tools are bound to models - Update CachingLLMWrapper.bind_tools() to wrap tool-bound models - Add stream() and astream() methods with caching support - Allow wrapping Runnable types (e.g., RunnableBinding) not just BaseChatModel - Add comprehensive test suite for caching functionality Fixes #104 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * feat: add comprehensive error handling and validation to CachingLLMWrapper - Add logging to all wrapper methods for debugging - Add validation to __init__ to prevent double-wrapping - Add error handling to bind_tools with tool validation - Add error handling to _add_cache_control with message validation - Add error handling to stream/astream with input validation - Add unit tests for all error conditions - Add async tests for ainvoke and astream methods - Coverage improved from 41% to 60% for litellm_llm.py * fix: replace unsafe eval with safe AST-based calculator in tests - Replace eval() with ast.parse() and safe operator mapping - Only allow basic arithmetic operations (+, -, *, /, %, **) - Prevents code injection in test calculator tool - Maintains full test functionality * fix: eliminate silent failures and improve error handling Critical fixes addressing PR review: - Replace silent message skipping with hard errors (ValueError) - Replace silent None content fallback with fail-fast validation - Replace overly broad exception catching with specific exception types - Add type annotations to stream/astream methods (Iterator/AsyncIterator) - Add comprehensive error logging to invoke/ainvoke/_generate/_agenerate - Improve hasattr checks with callable() validation - Add actionable guidance to NotImplementedError messages - Change non-list input logging from DEBUG to WARNING (cost impact) All tests still pass. Coverage remains 36% (error paths require real LLM tests). * docs: improve docstrings for clarity and accuracy Addressing documentation issues from PR review: - Expand class docstring to explain nested wrapper chain - Clarify bind_tools() two-step process (delegate then wrap) - Update _add_cache_control() to document strict fail-fast validation - Clarify is_cacheable_model() uses permissive heuristic - Add clarifying comment to CACHEABLE_MODELS constant All docstrings now accurately reflect implementation behavior. * docs: document two-tier testing approach for NO MOCKS policy Add comprehensive module docstring explaining: - Why FakeListChatModel is used for unit tests (wrapper mechanics) - Why real API calls are used for integration tests (LLM behavior) - Clear separation between testing wrapper logic vs LLM responses This addresses the code reviewer's recommendation to document the exception to the NO MOCKS policy for wrapper mechanics testing. --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * feat: add generic docstring extraction tools Add MATLAB and Python docstring parsers, sync system, and search tools for indexing code documentation from any community repository. Key components: - MATLAB parser: regex-based extraction of function/script comments - Python parser: AST-based extraction of docstrings - Sync system: fetch files from GitHub and index docstrings - Search: FTS5-powered docstring search with GitHub links - CLI: 'osa sync docstrings' command with language filters - Tools: LangChain tool factory for community integration Tests: 28 new tests (parsers + integration) Verified: 720 docstrings synced from sccn/eeglab Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: address PR review findings Critical fixes: - Add GitHub auth headers to avoid rate limiting - Improve error handling with specific exception types - Track and report failed files to users - Make Python parser raise SyntaxError properly Important fixes: - Optimize method detection using parent map (O(n)) - Update FTS5 docs to clarify phrase-only search - Document branch hardcoding limitation Tests: - Update test to expect SyntaxError - Remove mocked error tests (violates NO MOCKS rule) - All 71 tests passing --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: implement mailing list FAQ agent (Phase 3) Add generic, reusable tools to scrape Mailman archives and generate searchable FAQ database using LLM summarization. Database schema: - Add mailing_list_messages table with FTS5 for raw message archive - Add faq_entries table with FTS5 for LLM-generated FAQ summaries - Add summarization_status table for tracking progress Mailman scraper (src/knowledge/mailman_sync.py): - HTML fetching with caching and rate limiting - Parse year index, thread index, and message pages - Convert HTML to markdown for clean storage - Sync by year with progress tracking FAQ summarization (src/knowledge/faq_summarizer.py): - Two-stage approach: Haiku for scoring, Sonnet for summarization - Quality scoring to filter valuable threads - Cost estimation and tracking - JSON-based summary extraction (question, answer, tags, category) Search and tools: - Add search_faq_entries function with FTS5 - Create FAQ search tool factory for LangChain integration - Update create_knowledge_tools to include FAQ search CLI commands: - osa sync mailman: Sync mailing list archives - osa sync faq: Generate FAQ summaries with LLM Configuration: - Add MailmanConfig class for community config - Configure EEGLAB mailing list (eeglablist, 2004-present) Tested: - Database schema creation verified - Mailman scraper tested with 2026 EEGLAB data (42 messages) - FAQ search API verified - Tool factory validated Closes #101 * test: add comprehensive tests for Phase 3 mailing list FAQ tools Add tool-centered tests (not community-specific) that validate: - Mailman archive scraping (11 tests) - FAQ summarization with LLM (18 tests) - End-to-end pipeline integration (4 tests) Total: 33 tests, all passing Test coverage: - faq_summarizer.py: 89% - mailman_sync.py: 56% - Full pipeline tested with mocked HTTP and LLM responses Tests validate: - HTML parsing (year index, thread index, messages) - LLM quality scoring and summarization - Cost estimation - FTS5 search functionality - Error handling and edge cases - Complete scrape → store → search → summarize pipeline Follows .rules/testing_guidelines.md: - Dynamic tests (generic, reusable) - No hardcoded community names - Tests work for any Mailman-based mailing list * fix: address critical PR review findings - Replace broad exception catching with specific exception types - Add proper error handling for database operations - Return None vs 0.0 for LLM scoring errors to distinguish failures - Implement basic thread reconstruction using subject normalization - Remove manual thread_id workaround from E2E tests - Add sqlite3 import for database error handling - Improve error logging with structured context Fixes critical issues identified in PR review: - Thread reconstruction now works (subject-based grouping) - Silent failures replaced with actionable errors - Database errors separated from parsing errors - LLM errors distinguishable from low-quality scores All 33 Phase 3 tests passing. * test: add comprehensive test coverage for PR review findings - Add HTTP error scenario tests for mailman_sync - Test fetch failures (404, 500, timeout, network errors) - Test partial batch failures and malformed HTML handling - Add database transaction safety tests - Test idempotency of sync operations - Test duplicate message handling - Test partial sync resumption - Test commit batching at boundaries - Add LLM response variation tests for faq_summarizer - Test trailing commas, extra text, nested quotes - Test newlines, unicode characters, empty arrays - Test decimal format variations and verbose responses All 122 Phase 3 tests passing Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: add plugin tools for EEGLAB docstring and FAQ search - Create src/assistants/eeglab/tools.py with two plugin tools - search_eeglab_docstrings: Search MATLAB/Python function docs - search_eeglab_faqs: Search 22 years of mailing list Q&A - Update config.yaml to register plugin tools in extensions section - Update system prompt with tool usage guidelines - Tools use @tool decorator pattern for auto-discovery * test: add comprehensive integration tests for EEGLAB assistant - Test config loading and validation - Test standard tool creation (discussions, papers, docs, recent) - Test plugin tool loading (docstrings, faqs) - Test system prompt includes tool references - Test tool count (6 total: 4 standard + 2 plugin) - Test plugin system integration - Test individual tool implementations with empty DB - All 13 tests passing, 3 skipped (require populated DB) * docs: add comprehensive EEGLAB assistant documentation - User guide: tool descriptions, example questions, tips, sync commands - Developer guide: architecture, adding tools, maintenance, troubleshooting - Covers all 6 tools (4 standard + 2 plugin) - Documents sync workflows for all knowledge bases - Includes performance monitoring and debugging tips * fix: address all PR review issues (critical, important, nice-to-haves) - Fix critical AttributeError bug in search_eeglab_docstrings tool - Changed from result.name/language/file_path to result.title/source/url - Updated docstring examples to match actual output format - Standardize error messages with multi-line format and admin guidance - Fix comment rot by changing '2004-2026' to 'since 2004' throughout - Add performance context to benchmarks in developer guide - Fix hardcoded paths in examples (/path/to/eeglab instead of ~/git/eeglab) - Make test assertions resilient (check for required tools instead of count) - Add populated_test_db fixture with sample docstring and FAQ entries - Add 6 new tests for populated database scenarios - All 19 tests passing with 100% coverage on tools.py * docs: fix temporal references to prevent documentation rot - Change '2004-2026' to 'since 2004' in config and user guide - Change '22 years' to 'over 20 years' for longevity - Add noqa comments for fixture side-effect patterns
- Add docstrings config section to YAML with branch per repo - Store branch in docstrings table for correct GitHub URLs - Separate docstring repos from issue/PR tracking repos - EEGLAB now uses 'develop' branch, ICLabel uses 'master' Resolves hardcoded 'main' branch issue for repos with different defaults.
- Add d.branch to SQL SELECT in search_docstrings (fixes KeyError) - Add NULL fallback for branch column (backward compatibility) - Remove broad Exception catches in sync loops (let bugs propagate) - Update branch parameter help text to mention repo defaults Issues addressed: - Critical: SQL query missing branch column - Critical: Broad exception catches hiding programming bugs - Important: Misleading branch parameter documentation
- Add validation for FAQ category and answer length - Validate category against allowed values, fallback to 'discussion' - Truncate answers >10KB to prevent bloat - Add test for branch parameter in GitHub URLs - Add test for NULL branch fallback to 'main' Issues addressed: - Important: FAQ field validation (prevent XSS, handle oversized data) - Test coverage: Branch validation tests (criticality 9/10)
- Add _migrate_db() to handle schema changes for existing databases - Automatically add branch column to docstrings table on deployment - Create comprehensive deployment checklist for dev environment - Includes verification steps, smoke tests, and rollback plan This ensures existing dev/prod databases are migrated smoothly when the epic is merged to develop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #97
Epic Summary
Complete implementation of the EEGLAB community assistant with comprehensive knowledge base integration.
Phases Completed
Phase 1: Basic Community Setup (#99)
Phase 1.5: YAML Testing Framework (#111)
Phase 2: Docstring Extraction (#115)
Phase 3: Mailing List FAQ Agent (#101, #121)
Phase 4: Integration & Testing (#102, #123)
Changes
New Files:
src/assistants/eeglab/config.yaml- Community configurationsrc/assistants/eeglab/tools.py- Plugin toolssrc/knowledge/docstring_sync.py- Docstring extractionsrc/knowledge/matlab_parser.py- MATLAB parsersrc/knowledge/python_parser.py- Python parsersrc/knowledge/mailman_sync.py- Mailing list scrapersrc/knowledge/faq_summarizer.py- FAQ generation.context/eeglab-developer-guide.md- Developer docs.context/eeglab-user-guide.md- User docstests/test_assistants/test_eeglab_integration.py- Integration teststests/test_assistants/test_community_yaml_generic.py- Generic YAML testsModified Files:
src/knowledge/db.py- Schema for docstrings and FAQssrc/knowledge/search.py- Search functionsTesting
Documentation