feat: EEGLAB Community Assistant (Epic #97) by neuromechanist · Pull Request #124 · OpenScience-Collective/osa

neuromechanist · 2026-01-27T23:13:34Z

Closes #97

Epic Summary

Complete implementation of the EEGLAB community assistant with comprehensive knowledge base integration.

Phases Completed

Phase 1: Basic Community Setup (#99)

Community configuration (repos, docs, mailing lists)
YAML-based registry system
Standard knowledge tools (GitHub, papers, docs)

Phase 1.5: YAML Testing Framework (#111)

Generic YAML-based tests for ALL communities
Automatic test coverage for new communities
No community-specific test code required

Phase 2: Docstring Extraction (#115)

Generic docstring extraction for MATLAB and Python
Function signature parsing
GitHub source linking
FTS5 full-text search

Phase 3: Mailing List FAQ Agent (#101, #121)

Mailman archive scraper (2004-present)
LLM-based FAQ summarization
Quality scoring and categorization
Thread linking

Phase 4: Integration & Testing (#102, #123)

Plugin tools (search_eeglab_docstrings, search_eeglab_faqs)
Comprehensive integration tests (19 tests, 100% coverage)
User and developer documentation
All PR review issues addressed

Changes

New Files:

src/assistants/eeglab/config.yaml - Community configuration
src/assistants/eeglab/tools.py - Plugin tools
src/knowledge/docstring_sync.py - Docstring extraction
src/knowledge/matlab_parser.py - MATLAB parser
src/knowledge/python_parser.py - Python parser
src/knowledge/mailman_sync.py - Mailing list scraper
src/knowledge/faq_summarizer.py - FAQ generation
.context/eeglab-developer-guide.md - Developer docs
.context/eeglab-user-guide.md - User docs
tests/test_assistants/test_eeglab_integration.py - Integration tests
tests/test_assistants/test_community_yaml_generic.py - Generic YAML tests

Modified Files:

src/knowledge/db.py - Schema for docstrings and FAQs
src/knowledge/search.py - Search functions
Various CLI and sync utilities

Testing

✅ Generic YAML-based tests (automatically covers EEGLAB + all communities)
✅ 19 EEGLAB-specific integration tests passing
✅ 100% coverage on plugin tools
✅ All review issues addressed
✅ Real database operations (no mocks)

Documentation

✅ User guide with examples
✅ Developer guide with architecture
✅ Sync workflow documentation
✅ Troubleshooting guide

* Make langfuse optional to fix Python 3.14 compatibility - Move langfuse from dependencies to optional-dependencies[observability] - Add lazy import with try/except in get_langfuse_handler() - Provides helpful warning if langfuse not installed - Fixes Pydantic v1 compatibility issues on Python 3.14 - Relates to #108 * feat: add EEGLAB community assistant configuration - Create config.yaml with complete EEGLAB assistant setup - 25+ documentation sources from sccn.github.io - 6 GitHub repos (eeglab, ICLabel, clean_rawdata, EEG-BIDS, LSL) - 3 core paper DOIs and 6 citation queries - Custom system prompt with EEG workflow guidance - Auto-generated knowledge tools (docs, discussions, papers) - Add comprehensive unit tests (26 tests, all passing) Follows HED assistant pattern for Phase 1 implementation. * docs: add Phase 1 implementation summary * fix: address PR review findings - Fix DOI typo in summary (ffinf -> fninf) - Fix line counts (340 lines, not 504) - Remove duplicate doc entry (Installation/quickstart same URL) - Update doc counts (26 total: 2 preloaded + 24 on-demand) All review issues addressed, tests passing. * docs: add local testing and epic branch workflow guides - Add comprehensive local testing guide for backend/CLI - Add epic branch workflow for multi-phase features - Add quick test script for verification * docs: update epic workflow to use worktrees - Epic branch should be a worktree, not in main repo - Update all instructions to reflect worktree structure - Add worktree management section * Simplify EEGLAB tests with shared fixtures - Consolidate fixtures at module level (setup_registry, eeglab_config, eeglab_assistant) - Remove duplicate fixture definitions across test classes - Reduce test file from 449 to 317 lines (29% reduction) - Improve test readability and maintainability - All 26 tests still pass * fix: address all PR review findings **Critical Fixes:** - Fix 23/26 broken documentation URLs to match actual SCCN GitHub structure - Move test_eeglab_interactive.py to scripts/ directory - Fix documentation count inconsistencies (25 → 26 sources) **Documentation Improvements:** - Update config.yaml line count (340 → 334) - Clarify preload limit is a guideline (2-3 recommended, not required) - Mark knowledge base numbers as projections (~150+ issues, ~470+ PRs expected) - Update doc titles to match actual files (e.g., channel spectra, ERP images) **Test Enhancements (26 → 33 tests):** - Add database error handling test (graceful degradation) - Add SSRF validation test (security - rejects localhost/private IPs) - Add GitHub repo format validation test (rejects invalid formats) - Add system prompt substitution test (no unfilled placeholders) - Add tool input validation tests (empty queries, long queries) - Add preload handling test (creation without preload) **All tests passing:** 33/33 ✅ Addresses review findings from code-reviewer, pr-test-analyzer, and comment-analyzer agents.

* feat: add generic YAML-based testing framework Create unified test suite that automatically validates ALL communities: - Parametrized tests run against HED, EEGLAB, and future communities - No community-specific hardcoded values - Validates configuration structure, metadata, URLs, repos, DOIs - Auto-validates system prompt completeness and tool generation - Includes slow tests for URL/GitHub accessibility validation **Results:** - 30 generic tests (all passing for structure validation) - Works for both HED and EEGLAB without any community-specific code - URL validation caught 10 broken EEGLAB URLs and 1 broken HED URL - Eliminates need for ~30 hardcoded tests per community **Test markers:** - Fast tests (default): Structure and format validation - Slow tests (-m slow): HTTP requests for URL/GitHub validation - Security tests: SSRF protection (localhost/private IP detection) Implements Phase 1.5.1 of #111 * fix: update broken documentation URLs for HED and EEGLAB EEGLAB (10 URLs fixed): - Extensions: others/EEGLAB_Extensions.md - Re-referencing: rereferencing.md (lowercase) - Resampling: resampling.md (lowercase) - Channel rejection: Channel_rejection.md - ICLabel: plugins/ICLabel/index.md - clean_rawdata: plugins/clean_rawdata/index.md - Scrolling data: Scrolling_data.md - Selecting epochs: removed (doesn't exist) - BIDS: plugins/EEG-BIDS/index.md - LSL: README.rst HED (1 URL fixed): - HedAndEEGLAB: removed refs/heads, changed .md to .html * refactor: remove hardcoded YAML tests, keep minimal behavioral tests EEGLAB Phase 1 has no custom tools or unique behavioral logic. All YAML configuration validation is now handled by the generic test_community_yaml_generic.py parametrized tests. Changes: - Removed ~420 lines of hardcoded YAML value tests - Kept 1 behavioral test confirming standard CommunityAssistant usage - Added documentation explaining separation of concerns HED already covered by generic tests (no test_hed_config.py existed). Benefits: - Reduced test duplication (90 tests, same coverage) - Future communities get full coverage automatically - Maintainable: single generic suite vs N duplicated suites * feat: add --community flag to validate command Extends 'osa validate' with community mode for full test suite validation. Usage: - File mode: osa validate <config_path> - YAML syntax, schema validation, env vars - Community mode: osa validate --community <id> - Full pytest suite including URL accessibility, GitHub repos - Shows community info before running tests - Supports --verbose flag for detailed output Features: - Auto-discovers available communities - Lists available communities if ID not found - Prevents using both modes together - Returns exit code 0 on success, 1 on failure - Color-coded output for easy scanning Examples: osa validate --community eeglab osa validate --community hed --verbose osa validate src/assistants/eeglab/config.yaml Closes Phase 1.5.3 requirement from issue #111 * fix: address all PR review findings Documentation fixes: - Fix incorrect pytest marker usage instructions (use -m 'not slow' to skip) - Add --verbose flag documentation to validate() docstring - Enhance docstrings with implementation details (subprocess, registry clearing) - Document MagicMock exception in testing guidelines Code improvements: - Use public registry.list_all() instead of private _assistants attribute - Add behavioral tests for tool descriptions, system prompt content, tool callability - Add configuration summary output to validate --community command Testing enhancements: - Add test_retrieve_docs_tool_description_includes_doc_list - Add test_system_prompt_contains_actual_github_repos - Add test_knowledge_tools_are_callable - Document why registry clearing is needed at collection time All tests passing (36/36 for generic tests, 86/86 overall excluding slow HTTP tests)

* chore: bump version to 0.5.3.dev0 * fix: comprehensive error handling improvements for streaming Fixes all critical and important issues from PR review #107: **Critical Fixes:** 1. Fix message state corruption on streaming errors - Track message indices explicitly to avoid race conditions - Prevent wrong messages from being removed on error - Properly handle streaming vs non-streaming error paths 2. Improve saveHistory() error handling - Distinguish error types (QuotaExceededError, SecurityError) - Provide actionable user feedback for each error type - Re-throw errors so callers know save failed - Log errors with structured context 3. Fix backend exception masking - Add specific handlers for ValueError (input errors) - Preserve HTTPException for proper HTTP error codes - Add error IDs to all backend errors for support tracking - Include structured logging context **Important Fixes:** 4. Add fetch timeout to streaming requests - 120 second timeout for connection + streaming - Prevents indefinite hangs on connection failures 5. Fix worker error detail loss - Check response.ok before processing - Pass through backend HTTP error codes (400, 403, 429, etc.) - Extract and forward backend error messages - Only categorize actual network/proxy errors 6. Improve reader resource leak handling - Check if reader exists before releasing - Log cleanup failures (indicates serious issues) - Don't silently swallow releaseError **Impact:** - Prevents data loss from message corruption - Better user feedback on storage failures - Proper error categorization for debugging - Support can track errors with error IDs - No more indefinite hangs - Users see actual backend errors (not generic 500) All syntax checks pass. Ready for production. * fix: preserve prompt caching in tool-bound models (#113) * fix: preserve prompt caching when tools are bound to models - Update CachingLLMWrapper.bind_tools() to wrap tool-bound models - Add stream() and astream() methods with caching support - Allow wrapping Runnable types (e.g., RunnableBinding) not just BaseChatModel - Add comprehensive test suite for caching functionality Fixes #104 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * feat: add comprehensive error handling and validation to CachingLLMWrapper - Add logging to all wrapper methods for debugging - Add validation to __init__ to prevent double-wrapping - Add error handling to bind_tools with tool validation - Add error handling to _add_cache_control with message validation - Add error handling to stream/astream with input validation - Add unit tests for all error conditions - Add async tests for ainvoke and astream methods - Coverage improved from 41% to 60% for litellm_llm.py * fix: replace unsafe eval with safe AST-based calculator in tests - Replace eval() with ast.parse() and safe operator mapping - Only allow basic arithmetic operations (+, -, *, /, %, **) - Prevents code injection in test calculator tool - Maintains full test functionality * fix: eliminate silent failures and improve error handling Critical fixes addressing PR review: - Replace silent message skipping with hard errors (ValueError) - Replace silent None content fallback with fail-fast validation - Replace overly broad exception catching with specific exception types - Add type annotations to stream/astream methods (Iterator/AsyncIterator) - Add comprehensive error logging to invoke/ainvoke/_generate/_agenerate - Improve hasattr checks with callable() validation - Add actionable guidance to NotImplementedError messages - Change non-list input logging from DEBUG to WARNING (cost impact) All tests still pass. Coverage remains 36% (error paths require real LLM tests). * docs: improve docstrings for clarity and accuracy Addressing documentation issues from PR review: - Expand class docstring to explain nested wrapper chain - Clarify bind_tools() two-step process (delegate then wrap) - Update _add_cache_control() to document strict fail-fast validation - Clarify is_cacheable_model() uses permissive heuristic - Add clarifying comment to CACHEABLE_MODELS constant All docstrings now accurately reflect implementation behavior. * docs: document two-tier testing approach for NO MOCKS policy Add comprehensive module docstring explaining: - Why FakeListChatModel is used for unit tests (wrapper mechanics) - Why real API calls are used for integration tests (LLM behavior) - Clear separation between testing wrapper logic vs LLM responses This addresses the code reviewer's recommendation to document the exception to the NO MOCKS policy for wrapper mechanics testing. --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * feat: add generic docstring extraction tools Add MATLAB and Python docstring parsers, sync system, and search tools for indexing code documentation from any community repository. Key components: - MATLAB parser: regex-based extraction of function/script comments - Python parser: AST-based extraction of docstrings - Sync system: fetch files from GitHub and index docstrings - Search: FTS5-powered docstring search with GitHub links - CLI: 'osa sync docstrings' command with language filters - Tools: LangChain tool factory for community integration Tests: 28 new tests (parsers + integration) Verified: 720 docstrings synced from sccn/eeglab Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: address PR review findings Critical fixes: - Add GitHub auth headers to avoid rate limiting - Improve error handling with specific exception types - Track and report failed files to users - Make Python parser raise SyntaxError properly Important fixes: - Optimize method detection using parent map (O(n)) - Update FTS5 docs to clarify phrase-only search - Document branch hardcoding limitation Tests: - Update test to expect SyntaxError - Remove mocked error tests (violates NO MOCKS rule) - All 71 tests passing --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: implement mailing list FAQ agent (Phase 3) Add generic, reusable tools to scrape Mailman archives and generate searchable FAQ database using LLM summarization. Database schema: - Add mailing_list_messages table with FTS5 for raw message archive - Add faq_entries table with FTS5 for LLM-generated FAQ summaries - Add summarization_status table for tracking progress Mailman scraper (src/knowledge/mailman_sync.py): - HTML fetching with caching and rate limiting - Parse year index, thread index, and message pages - Convert HTML to markdown for clean storage - Sync by year with progress tracking FAQ summarization (src/knowledge/faq_summarizer.py): - Two-stage approach: Haiku for scoring, Sonnet for summarization - Quality scoring to filter valuable threads - Cost estimation and tracking - JSON-based summary extraction (question, answer, tags, category) Search and tools: - Add search_faq_entries function with FTS5 - Create FAQ search tool factory for LangChain integration - Update create_knowledge_tools to include FAQ search CLI commands: - osa sync mailman: Sync mailing list archives - osa sync faq: Generate FAQ summaries with LLM Configuration: - Add MailmanConfig class for community config - Configure EEGLAB mailing list (eeglablist, 2004-present) Tested: - Database schema creation verified - Mailman scraper tested with 2026 EEGLAB data (42 messages) - FAQ search API verified - Tool factory validated Closes #101 * test: add comprehensive tests for Phase 3 mailing list FAQ tools Add tool-centered tests (not community-specific) that validate: - Mailman archive scraping (11 tests) - FAQ summarization with LLM (18 tests) - End-to-end pipeline integration (4 tests) Total: 33 tests, all passing Test coverage: - faq_summarizer.py: 89% - mailman_sync.py: 56% - Full pipeline tested with mocked HTTP and LLM responses Tests validate: - HTML parsing (year index, thread index, messages) - LLM quality scoring and summarization - Cost estimation - FTS5 search functionality - Error handling and edge cases - Complete scrape → store → search → summarize pipeline Follows .rules/testing_guidelines.md: - Dynamic tests (generic, reusable) - No hardcoded community names - Tests work for any Mailman-based mailing list * fix: address critical PR review findings - Replace broad exception catching with specific exception types - Add proper error handling for database operations - Return None vs 0.0 for LLM scoring errors to distinguish failures - Implement basic thread reconstruction using subject normalization - Remove manual thread_id workaround from E2E tests - Add sqlite3 import for database error handling - Improve error logging with structured context Fixes critical issues identified in PR review: - Thread reconstruction now works (subject-based grouping) - Silent failures replaced with actionable errors - Database errors separated from parsing errors - LLM errors distinguishable from low-quality scores All 33 Phase 3 tests passing. * test: add comprehensive test coverage for PR review findings - Add HTTP error scenario tests for mailman_sync - Test fetch failures (404, 500, timeout, network errors) - Test partial batch failures and malformed HTML handling - Add database transaction safety tests - Test idempotency of sync operations - Test duplicate message handling - Test partial sync resumption - Test commit batching at boundaries - Add LLM response variation tests for faq_summarizer - Test trailing commas, extra text, nested quotes - Test newlines, unicode characters, empty arrays - Test decimal format variations and verbose responses All 122 Phase 3 tests passing Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

@tool

* feat: add plugin tools for EEGLAB docstring and FAQ search - Create src/assistants/eeglab/tools.py with two plugin tools - search_eeglab_docstrings: Search MATLAB/Python function docs - search_eeglab_faqs: Search 22 years of mailing list Q&A - Update config.yaml to register plugin tools in extensions section - Update system prompt with tool usage guidelines - Tools use @tool decorator pattern for auto-discovery * test: add comprehensive integration tests for EEGLAB assistant - Test config loading and validation - Test standard tool creation (discussions, papers, docs, recent) - Test plugin tool loading (docstrings, faqs) - Test system prompt includes tool references - Test tool count (6 total: 4 standard + 2 plugin) - Test plugin system integration - Test individual tool implementations with empty DB - All 13 tests passing, 3 skipped (require populated DB) * docs: add comprehensive EEGLAB assistant documentation - User guide: tool descriptions, example questions, tips, sync commands - Developer guide: architecture, adding tools, maintenance, troubleshooting - Covers all 6 tools (4 standard + 2 plugin) - Documents sync workflows for all knowledge bases - Includes performance monitoring and debugging tips * fix: address all PR review issues (critical, important, nice-to-haves) - Fix critical AttributeError bug in search_eeglab_docstrings tool - Changed from result.name/language/file_path to result.title/source/url - Updated docstring examples to match actual output format - Standardize error messages with multi-line format and admin guidance - Fix comment rot by changing '2004-2026' to 'since 2004' throughout - Add performance context to benchmarks in developer guide - Fix hardcoded paths in examples (/path/to/eeglab instead of ~/git/eeglab) - Make test assertions resilient (check for required tools instead of count) - Add populated_test_db fixture with sample docstring and FAQ entries - Add 6 new tests for populated database scenarios - All 19 tests passing with 100% coverage on tools.py * docs: fix temporal references to prevent documentation rot - Change '2004-2026' to 'since 2004' in config and user guide - Change '22 years' to 'over 20 years' for longevity - Add noqa comments for fixture side-effect patterns

- Add docstrings config section to YAML with branch per repo - Store branch in docstrings table for correct GitHub URLs - Separate docstring repos from issue/PR tracking repos - EEGLAB now uses 'develop' branch, ICLabel uses 'master' Resolves hardcoded 'main' branch issue for repos with different defaults.

- Add d.branch to SQL SELECT in search_docstrings (fixes KeyError) - Add NULL fallback for branch column (backward compatibility) - Remove broad Exception catches in sync loops (let bugs propagate) - Update branch parameter help text to mention repo defaults Issues addressed: - Critical: SQL query missing branch column - Critical: Broad exception catches hiding programming bugs - Important: Misleading branch parameter documentation

- Add validation for FAQ category and answer length - Validate category against allowed values, fallback to 'discussion' - Truncate answers >10KB to prevent bloat - Add test for branch parameter in GitHub URLs - Add test for NULL branch fallback to 'main' Issues addressed: - Important: FAQ field validation (prevent XSS, handle oversized data) - Test coverage: Branch validation tests (criticality 9/10)

- Add _migrate_db() to handle schema changes for existing databases - Automatically add branch column to docstrings table on deployment - Create comprehensive deployment checklist for dev environment - Includes verification steps, smoke tests, and rollback plan This ensures existing dev/prod databases are migrated smoothly when the epic is merged to develop.

…alled

neuromechanist and others added 12 commits January 27, 2026 08:02

Merge branch 'develop' into epic/issue-97-eeglab

9761665

style: fix formatting after merge from develop

3a2b9e2

fix: handle missing FTS5 tables and skip Langfuse tests when not inst…

24737bc

…alled

neuromechanist merged commit da50aa0 into develop Jan 28, 2026
5 checks passed

neuromechanist deleted the epic/issue-97-eeglab branch January 28, 2026 15:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: EEGLAB Community Assistant (Epic #97)#124

feat: EEGLAB Community Assistant (Epic #97)#124
neuromechanist merged 12 commits into
developfrom
epic/issue-97-eeglab

neuromechanist commented Jan 27, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

neuromechanist commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Epic Summary

Phases Completed

Phase 1: Basic Community Setup (#99)

Phase 1.5: YAML Testing Framework (#111)

Phase 2: Docstring Extraction (#115)

Phase 3: Mailing List FAQ Agent (#101, #121)

Phase 4: Integration & Testing (#102, #123)

Changes

Testing

Documentation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

neuromechanist commented Jan 27, 2026 •

edited

Loading