Phase 2: Docstring Extraction Tools #100

@neuromechanist

Parent Epic: #97
Depends on: #99

Goal

Create generic tools to extract and index MATLAB/Python docstrings from any codebase.

Tasks

MATLAB Docstring Extractor

  • Create src/tools/docstring/matlab.py
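    • Parse MATLAB file header comments (the contiguous block of % lines before, or immediately after, the function line)
    • Extract: function name, purpose, inputs, outputs, examples, see-also
    • Handle various comment styles (e.g., % line comments and %{ ... %} block comments)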
    • Recursive repository traversal
  • Database schema for docstrings
    CREATE TABLE docstrings (
      id, community_id, language, repo, file_path,
      symbol_name, symbol_type, docstring,
      parameters, returns, examples, indexed_at
    )
  • CLI command: osa sync docstrings --community eeglab --language matlab
  • LangChain tool wrapper: create_search_matlab_docs_tool()
  • Tests (parse various MATLAB comment styles)
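
The parsing step above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the names `parse_matlab_header` and `split_sections`, the returned keys, and the list of section labels (Usage, Inputs, Outputs, Example, See also) are all assumptions made for the example. It only handles `%` line comments; `%{ ... %}` blocks and classdef files are out of scope here.

```python
import re

# Matches "function [a, b] = name(x, y)", "function a = name(x)" and "function name".
FUNC_RE = re.compile(
    r"^function\s+"
    r"(?:(?:\[(?P<outs>[^\]]*)\]|(?P<out>\w+))\s*=\s*)?"
    r"(?P<name>\w+)\s*(?:\((?P<args>[^)]*)\))?"
)
# Assumed section labels; real headers may use others.
SECTION_RE = re.compile(
    r"^(usage|inputs?|outputs?|examples?|see also)\s*:\s*(.*)$", re.IGNORECASE
)


def split_sections(docstring):
    """Group header lines under the most recent section label."""
    sections = {"summary": []}
    current = "summary"
    for line in docstring.splitlines():
        m = SECTION_RE.match(line.strip())
        if m:
            current = m.group(1).lower()
            sections[current] = [m.group(2)] if m.group(2) else []
        else:
            sections[current].append(line)
    return {key: "\n".join(lines).strip() for key, lines in sections.items()}


def parse_matlab_header(source):
    """Return the function signature and the first contiguous % comment block."""
    header, match, closed = [], None, False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("%"):
            if not closed:
                # Drop the leading % signs and at most one following space.
                header.append(re.sub(r"^%+ ?", "", stripped))
        elif match is None and FUNC_RE.match(stripped):
            match = FUNC_RE.match(stripped)
            if header:
                break  # the header came before the function line
        elif not stripped:
            closed = bool(header)  # a blank line ends a started header
        else:
            break  # first line of real code ends the search

    name, outs, args = None, [], []
    if match:
        name = match.group("name")
        raw_outs = (match.group("outs") or match.group("out") or "").strip()
        outs = [o for o in re.split(r"[,\s]+", raw_outs) if o]
        args = [a.strip() for a in (match.group("args") or "").split(",") if a.strip()]
    docstring = "\n".join(header).strip()
    return {
        "symbol_name": name,
        "symbol_type": "function",
        "docstring": docstring,
        "parameters": args,
        "returns": outs,
        "sections": split_sections(docstring),
    }
```

Calling `parse_matlab_header(text)` on the contents of a `.m` file returns one dictionary whose keys loosely follow the column names in the schema above. Signature parsing is done with a single regex, so line continuations (`...`) in the function line would need extra handling.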

Python Docstring Extractor

  • Create src/tools/docstring/python.py
    • Use ast module for parsing
    • Extract from functions, classes, methods, modules
    • Support NumPy/Google/Sphinx formats
  • LangChain tool wrapper: create_search_python_docs_tool()
  • Tests (various docstring formats)
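
As a rough sketch of the `ast`-based approach listed above: the function name `extract_python_docstrings` and the record keys are hypothetical, chosen to mirror the schema columns. The sketch stores raw docstring text only; splitting NumPy/Google/Sphinx sections would be a separate step. It also only descends into classes and functions, so definitions nested inside `if` or `try` blocks are skipped.

```python
import ast


def extract_python_docstrings(source, module_name="<module>"):
    """Collect docstrings for the module and its classes, functions, and methods."""
    tree = ast.parse(source)
    records = []

    module_doc = ast.get_docstring(tree)
    if module_doc:
        records.append(
            {"symbol_name": module_name, "symbol_type": "module", "docstring": module_doc}
        )

    def visit(node, prefix, in_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                kind = "class"
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if in_class else "function"
            else:
                continue  # not a definition
            qualname = prefix + child.name
            doc = ast.get_docstring(child)  # cleaned of indentation by default
            if doc:
                records.append(
                    {"symbol_name": qualname, "symbol_type": kind, "docstring": doc}
                )
            # Recurse so that methods and nested definitions get dotted names.
            visit(child, qualname + ".", isinstance(child, ast.ClassDef))

    visit(tree, "", False)
    return records
```

Because the source is parsed rather than imported, no code from the indexed repository is executed, which matters when indexing third-party repositories.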

Integration

  • Add to sync system
  • Create tool factories (generic, not EEGLAB-specific)
  • Index existing repos
  • Performance testing
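
For the indexing side, the schema listed under the MATLAB tasks gives column names but no types or database engine. The sketch below assumes SQLite, TEXT columns, JSON-encoded lists for `parameters` and `returns`, and a uniqueness constraint so that re-running a sync replaces rows instead of duplicating them. All of these are assumptions, as are the helper names `upsert_docstring` and `search_docstrings`; the plain `LIKE` search is a placeholder for whatever search backend the project uses.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Column types and the UNIQUE constraint are assumptions; the issue lists names only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS docstrings (
    id INTEGER PRIMARY KEY,
    community_id TEXT NOT NULL,
    language TEXT NOT NULL,
    repo TEXT NOT NULL,
    file_path TEXT NOT NULL,
    symbol_name TEXT NOT NULL,
    symbol_type TEXT,
    docstring TEXT,
    parameters TEXT,  -- JSON-encoded list
    returns TEXT,     -- JSON-encoded list
    examples TEXT,
    indexed_at TEXT,
    UNIQUE (community_id, repo, file_path, symbol_name)
)
"""

COLUMNS = (
    "community_id", "language", "repo", "file_path", "symbol_name",
    "symbol_type", "docstring", "parameters", "returns", "examples", "indexed_at",
)


def upsert_docstring(conn, record):
    """Insert one extracted record, replacing any earlier row for the same symbol."""
    row = dict.fromkeys(COLUMNS)
    row.update({k: v for k, v in record.items() if k in COLUMNS})
    row["parameters"] = json.dumps(record.get("parameters") or [])
    row["returns"] = json.dumps(record.get("returns") or [])
    row["indexed_at"] = datetime.now(timezone.utc).isoformat()
    placeholders = ", ".join(f":{c}" for c in COLUMNS)
    conn.execute(
        f"INSERT OR REPLACE INTO docstrings ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        row,
    )


def search_docstrings(conn, community_id, query, language=None, limit=5):
    """Naive substring search over symbol names and docstrings."""
    sql = (
        "SELECT symbol_name, file_path, docstring FROM docstrings "
        "WHERE community_id = ? AND (symbol_name LIKE ? OR docstring LIKE ?)"
    )
    params = [community_id, f"%{query}%", f"%{query}%"]
    if language:
        sql += " AND language = ?"
        params.append(language)
    sql += " ORDER BY symbol_name LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Scoping every query by `community_id` keeps one community's index from leaking into another's search results, which fits the generic, multi-community goal of this phase.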

Key Design Principle

These are GENERIC tools that any community can use, not EEGLAB-specific.
Other communities (BIDS, MNE, FieldTrip) should be able to enable them via config.
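
One way to read "enable them via config" is a factory that takes a community's settings and returns search callables, with nothing community-specific hard-coded. The sketch below is plain Python and deliberately leaves out the LangChain wrapper; `DocstringToolConfig`, `create_docstring_search_tools`, and the `search_fn` signature are hypothetical names for illustration, not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocstringToolConfig:
    """Hypothetical per-community settings, e.g. loaded from a config file."""

    community_id: str
    languages: tuple = ()  # e.g. ("matlab", "python"); empty means disabled


def create_docstring_search_tools(config, search_fn):
    """Build one search callable per enabled language.

    search_fn is any backend with the assumed signature
    (community_id, language, query) -> list of results.
    """
    tools = {}
    for language in config.languages:
        # Bind the loop variable as a default so each tool keeps its own language.
        def search(query, _language=language):
            return search_fn(config.community_id, _language, query)

        search.__name__ = f"search_{language}_docs"
        search.__doc__ = (
            f"Search {language} docstrings indexed for the "
            f"{config.community_id} community."
        )
        tools[search.__name__] = search
    return tools
```

With this shape, a community that lists no languages simply gets no docstring tools, and adding a community is a config change instead of a code change.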

Success Criteria

  • Can extract MATLAB docstrings from eeglab repo
  • Can extract Python docstrings from Python-based EEG tools
  • Tools are generic/reusable
  • Database populated
  • Search works correctly
  • Tests pass
  • Ready for PR review

Timeline

2 weeks

Challenges

  • MATLAB syntax variations
  • Large codebase traversal
  • Regex patterns for comment extraction
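
For the traversal challenge, one possible mitigation is a lazy generator that prunes irrelevant directories before descending and skips oversized files. The skip list, the extension map, the size cap, and the name `iter_source_files` below are illustrative assumptions. Note that `.m` is also used for Objective-C sources, so a mixed repository might need an additional content check.

```python
import os
from pathlib import Path

# Illustrative defaults; a real sync would likely take these from config.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}
EXTENSIONS = {"matlab": (".m",), "python": (".py",)}


def iter_source_files(root, language, max_bytes=1_000_000):
    """Yield source files for one language, walking the tree lazily."""
    suffixes = EXTENSIONS[language]
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in suffixes and path.stat().st_size <= max_bytes:
                yield path
```

Because the function is a generator, the caller can parse and index files one at a time without first building a full file list in memory.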

Labels

feature: New feature or enhancement
