OpenScience-Collective · neuromechanist · Feb 7, 2026
diff --git a/.context/nemar-api.md b/.context/nemar-api.md
@@ -0,0 +1,170 @@
+# NEMAR API Reference
+
+Technical reference for the NEMAR (NeuroElectroMagnetic Archive) public API used by the NEMAR community assistant tools.
+
+**Base URL:** `https://nemar.org/api/dataexplorer/datapipeline`
+**Authentication:** None required (fully public)
+**Only valid table:** `dataexplorer_dataset`
+
+## Endpoints
+
+### 1. List Datasets - `/records`
+
+Fetch paginated dataset records.
+
+```bash
+curl --request GET \
+  --url 'https://nemar.org/api/dataexplorer/datapipeline/records' \
+  -H 'Content-Type: application/json' \
+  -d '{"table_name":"dataexplorer_dataset", "start": 0, "limit": 10}'
+```
+
+**Parameters:**
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `table_name` | string | Yes | Must be `"dataexplorer_dataset"` |
+| `start` | int | Yes | Pagination offset (0-based) |
+| `limit` | int | Yes | Maximum number of records to return (`1000` currently returns every dataset) |
+
+**Response:**
+```json
+{
+  "total": 485,
+  "entries": {
+    "0": { /* dataset object */ },
+    "1": { /* dataset object */ },
+    ...
+  },
+  "start": 0,
+  "limit": 10,
+  "success": true
+}
+```
+
+**Notes:**
+- `entries` uses string indices (`"0"`, `"1"`, etc.), not an array
+- No server-side search, filter, or sort; must fetch and filter client-side
+- All datasets fit in one call with `"limit": 1000` while the catalog has fewer than 1,000 entries
+- As of 2025, there are ~485 datasets
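A minimal Node.js sketch of a full `/records` fetch, assuming Node 18+. It uses `node:https` rather than `fetch()` because the endpoint expects a JSON body on a GET request, which `fetch()` refuses to send. The helper names `requestJson`, `entriesToArray`, and `fetchAllDatasets` are hypothetical, not part of any NEMAR client library.

```javascript
// Hypothetical helpers for the NEMAR /records endpoint; not an official NEMAR client.
const https = require("node:https");

const BASE_URL = "https://nemar.org/api/dataexplorer/datapipeline";

// Send a GET request carrying a JSON body (the pattern this API expects) and parse the reply.
function requestJson(path, body) {
  const payload = JSON.stringify(body);
  return new Promise((resolve, reject) => {
    const req = https.request(
      `${BASE_URL}${path}`,
      {
        method: "GET",
        headers: {
          "Content-Type": "application/json",
          "Content-Length": Buffer.byteLength(payload),
        },
      },
      (res) => {
        let raw = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => {
          raw += chunk;
        });
        res.on("end", () => {
          if (res.statusCode >= 400) {
            reject(new Error(`HTTP ${res.statusCode}`));
            return;
          }
          try {
            resolve(JSON.parse(raw));
          } catch (err) {
            reject(err);
          }
        });
      }
    );
    req.on("error", reject);
    req.end(payload);
  });
}

// Convert a string-indexed object ({"0": ..., "1": ...}) into an array ordered by index.
function entriesToArray(entries) {
  return Object.keys(entries || {})
    .sort((a, b) => Number(a) - Number(b))
    .map((key) => entries[key]);
}

// Fetch every dataset in one call; a limit of 1000 exceeds the ~485 datasets noted above.
async function fetchAllDatasets() {
  const data = await requestJson("/records", {
    table_name: "dataexplorer_dataset",
    start: 0,
    limit: 1000,
  });
  if (!data.success) {
    throw new Error("NEMAR API returned success: false");
  }
  return entriesToArray(data.entries);
}
```

Usage: `fetchAllDatasets().then((datasets) => console.log(datasets.length));`. Sorting the keys numerically makes the ordering explicit, although JavaScript already enumerates integer-like object keys in ascending order.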
+
+### 2. Get Dataset by ID - `/datasetid`
+
+Fetch a single dataset by its identifier.
+
+```bash
+curl --request GET \
+  --url 'https://nemar.org/api/dataexplorer/datapipeline/datasetid' \
+  -H 'Content-Type: application/json' \
+  -d '{"table_name":"dataexplorer_dataset", "dataset_id": "ds005697"}'
+```
+
+**Parameters:**
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `table_name` | string | Yes | Must be `"dataexplorer_dataset"` |
+| `dataset_id` | string | Yes | Dataset ID (e.g., `"ds005697"`) |
+
+**Response:**
+```json
+{
+  "entry": {
+    "0": { /* dataset object */ }
+  },
+  "success": true
+}
+```
+
+**Notes:**
+- Returns an empty `entry` object (`{}`) for unknown IDs, while `success` is still `true`
+- `entry` uses the same string-indexed object pattern as `entries`
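Because an unknown ID still comes back with `success: true`, the caller has to treat an empty `entry` as "not found". A minimal sketch of that check; `datasetFromResponse` is a hypothetical helper name:

```javascript
// Hypothetical helper: extract the single dataset from a /datasetid response.
// Returns null when the ID is unknown, since success stays true in that case.
function datasetFromResponse(response) {
  const entry = (response && response.entry) || {};
  // Object.values enumerates integer-like keys ("0", "1", ...) in ascending order.
  const first = Object.values(entry)[0];
  return first === undefined ? null : first;
}
```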
+
+## Dataset Schema
+
+Each dataset record has 31 fields; the 30 documented below are grouped by purpose:
+
+### Identifiers
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | string | Dataset ID (e.g., `"ds005697"`) |
+| `name` | string | Human-readable name (often descriptive) |
+| `created` | string | Creation timestamp (`YYYY-MM-DD HH:MM:SS`) |
+| `publishDate` | string | Publication timestamp |
+| `uploader` | string | Username of original uploader |
+| `latestSnapshot` | string | Version string (e.g., `"1.0.2"`) |
+| `DatasetDOI` | string | DOI (e.g., `"doi:10.18112/openneuro.ds005697.v1.0.2"`) |
+
+### BIDS Metadata
+| Field | Type | Description |
+|-------|------|-------------|
+| `BIDSVersion` | string | BIDS spec version (e.g., `"1.8.0"`) |
+| `License` | string | Data license (typically `"CC0"`) |
+| `Authors` | string | Author list (comma-separated or `===NEMAR-SEP===` delimited) |
+| `Acknowledgements` | string | Acknowledgement text |
+| `HowToAcknowledge` | string | Citation instructions |
+| `Funding` | string | Funding sources (`===NEMAR-SEP===` delimited) |
+| `ReferencesAndLinks` | string | URLs/references (`===NEMAR-SEP===` delimited) |
+| `EthicsApprovals` | string | Ethics approval information |
+| `readme` | string | Full README.md content (can be very long) |
+
+### Experimental Details
+| Field | Type | Description |
+|-------|------|-------------|
+| `tasks` | string | Comma-separated task names (e.g., `"rest, gonogo"`) |
+| `modalities` | string | Comma-separated modalities (e.g., `"EEG"`, `"MEG, MRI"`) |
+| `HEDVersion` | string | HED schema version (empty if not annotated) |
+| `hedAnnotation` | int | `0` or `1` (whether HED annotations are present) |
+
+### Dataset Size
+| Field | Type | Description |
+|-------|------|-------------|
+| `participants` | int | Number of subjects |
+| `sessionsNum` | int | Number of sessions |
+| `totalFiles` | int | Total file count |
+| `file_size` | int | Size in bytes |
+| `byte_size_format` | string | Human-readable size (e.g., `"66.6 GB"`) |
+| `age_min` | int | Minimum participant age (`0` if unspecified) |
+| `age_max` | int | Maximum participant age (`0` if unspecified) |
+
+### Platform Flags
+| Field | Type | Description |
+|-------|------|-------------|
+| `onBrainlife` | int | `0`/`1` - available on Brainlife |
+| `local_dataset` | int | `0`/`1` - available locally |
+| `processed` | int | `0`/`1` - has processed data |
+
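Several fields arrive as comma-separated strings or `0`/`1` integers. One possible normalization into friendlier JavaScript types, following the tables above; the output property names (`hasHed`, `sizeBytes`, and so on) are hypothetical, and mapping an age of `0` to `null` assumes `0` always means "unspecified":

```javascript
// Hypothetical normalizer for one raw dataset record; not an official NEMAR client.

// Coerce a field documented as int (possibly sent as a string) to a number, defaulting to 0.
function toNumber(value) {
  const n = Number(value);
  return Number.isFinite(n) ? n : 0;
}

// Split a comma-separated field such as tasks or modalities into trimmed, non-empty values.
function splitCommaList(value) {
  return String(value || "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

function normalizeDataset(raw) {
  return {
    id: raw.id,
    name: raw.name,
    tasks: splitCommaList(raw.tasks),
    modalities: splitCommaList(raw.modalities),
    hasHed: toNumber(raw.hedAnnotation) === 1,
    participants: toNumber(raw.participants),
    sizeBytes: toNumber(raw.file_size),
    // The schema uses 0 for an unspecified age, so expose that as null.
    ageMin: toNumber(raw.age_min) || null,
    ageMax: toNumber(raw.age_max) || null,
    onBrainlife: toNumber(raw.onBrainlife) === 1,
  };
}
```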
+## Multi-Value Fields
+
+Some fields use `===NEMAR-SEP===` as a delimiter for multiple values:
+- `Funding`: Multiple funding sources
+- `ReferencesAndLinks`: Multiple URLs/references
+- `Authors`: Sometimes; other datasets separate authors with commas instead
+
+Example:
+```
+"NIH R01NS047293===NEMAR-SEP===NSF BCS-0924532===NEMAR-SEP===ONR N00014-16-1-2257"
+```
+
+Split on `===NEMAR-SEP===` and strip whitespace from each part.
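A minimal sketch of the split described above. `splitMultiValue` and its `commaFallback` option are hypothetical; the comma fallback is meant for `Authors` only and would wrongly split names written as "Last, First":

```javascript
// Hypothetical helper for ===NEMAR-SEP===-delimited fields; not an official NEMAR client.
const NEMAR_SEP = "===NEMAR-SEP===";

// Split on the NEMAR separator and trim each part. With commaFallback (for Authors),
// fall back to commas when the separator does not appear in the value.
function splitMultiValue(value, { commaFallback = false } = {}) {
  const text = String(value || "");
  const delimiter = text.includes(NEMAR_SEP) || !commaFallback ? NEMAR_SEP : ",";
  return text
    .split(delimiter)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}
```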
+
+## URL Patterns
+
+- **NEMAR detail page:** `https://nemar.org/dataexplorer/detail?dataset_id={id}`
+- **OpenNeuro page:** `https://openneuro.org/datasets/{id}`
+- **OpenNeuro version:** `https://openneuro.org/datasets/{id}/versions/{latestSnapshot}`
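The patterns above can be filled in from a dataset's `id` and `latestSnapshot` fields. A minimal sketch; `datasetUrls` is a hypothetical helper, and skipping the versioned link when `latestSnapshot` is empty is an assumption:

```javascript
// Hypothetical helper that fills in the URL patterns for one dataset record.
function datasetUrls(dataset) {
  const id = encodeURIComponent(dataset.id);
  const urls = {
    nemar: `https://nemar.org/dataexplorer/detail?dataset_id=${id}`,
    openNeuro: `https://openneuro.org/datasets/${id}`,
  };
  // Only link a specific version when a snapshot string is present.
  if (dataset.latestSnapshot) {
    urls.openNeuroVersion = `${urls.openNeuro}/versions/${encodeURIComponent(dataset.latestSnapshot)}`;
  }
  return urls;
}
```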
+
+## Limitations
+
+1. **No server-side search/filter/sort** - must fetch all and filter client-side
+2. **Only one valid table** - `dataexplorer_dataset` (others return validation errors)
+3. **Only two endpoints** - `/records` and `/datasetid` (no `/search`, `/tables`, etc.)
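+4. **GET with body** - API uses GET method but expects JSON body (unusual; works with curl `-d`, but the Fetch API and `XMLHttpRequest` do not send a body on GET requests, so browser clients cannot call it directly)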
+5. **String-indexed responses** - entries/entry use `{"0": ..., "1": ...}` instead of arrays
+6. **No rate limiting observed** - but be reasonable with request frequency
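Because of limitation 1, search has to happen after a full fetch. A minimal client-side filter-and-sort sketch over raw records; `searchDatasets` and its criteria names are hypothetical, and the matching rules (case-insensitive substring on id, name, and tasks; exact modality match) are one reasonable choice, not NEMAR behavior:

```javascript
// Hypothetical client-side search over raw /records entries; the API has no search, filter, or sort.
function searchDatasets(datasets, { text = "", modality = "", minParticipants = 0, hedOnly = false } = {}) {
  const needle = text.trim().toLowerCase();
  const wantedModality = modality.trim().toLowerCase();
  return datasets
    .filter((d) => {
      const haystack = `${d.id} ${d.name || ""} ${d.tasks || ""}`.toLowerCase();
      const modalities = String(d.modalities || "")
        .toLowerCase()
        .split(",")
        .map((m) => m.trim());
      return (
        (!needle || haystack.includes(needle)) &&
        (!wantedModality || modalities.includes(wantedModality)) &&
        Number(d.participants || 0) >= minParticipants &&
        (!hedOnly || Number(d.hedAnnotation) === 1)
      );
    })
    // Largest datasets (by participant count) first.
    .sort((a, b) => Number(b.participants || 0) - Number(a.participants || 0));
}
```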
+
+## Dataset Statistics (as of early 2025)
+
+- **Total datasets:** ~485
+- **Common modalities:** EEG (~53), MEG (~9), MEG+MRI (~7), EEG+MRI (~6), iEEG (~5)
+- **Datasets with HED annotations:** ~6
+- **Largest by participants:** ds002181 (226), ds003655 (156), ds003474 (122)
+- **Common tasks:** rest, noise, gonogo, memory, attention, various experimental paradigms
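These counts are snapshots and will drift. A minimal sketch of recomputing them from a full `/records` fetch; `summarize` is a hypothetical helper, and grouping by the exact `modalities` string is inferred from "MEG+MRI" being listed separately from "MEG":

```javascript
// Hypothetical summary of raw /records entries: counts per exact modalities string
// (so "MEG, MRI" is separate from "MEG") and the number of HED-annotated datasets.
function summarize(datasets) {
  const byModality = new Map();
  let withHed = 0;
  for (const d of datasets) {
    const key = String(d.modalities || "").trim() || "(none)";
    byModality.set(key, (byModality.get(key) || 0) + 1);
    if (Number(d.hedAnnotation) === 1) {
      withHed += 1;
    }
  }
  // Most common modality combinations first.
  const modalityCounts = [...byModality.entries()].sort((a, b) => b[1] - a[1]);
  return { total: datasets.length, withHed, modalityCounts };
}
```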
diff --git a/.github/workflows/sync-worker-cors.yml b/.github/workflows/sync-worker-cors.yml
@@ -93,6 +93,7 @@ jobs:
           git config user.email "github-actions[bot]@users.noreply.github.com"
           git add workers/osa-worker/index.js
           git commit -m "chore: sync worker CORS from community configs [skip ci]"
+          git pull --rebase origin ${{ github.ref_name }}
           git push
 
       - name: Deploy to Cloudflare Workers (production)

diff --git a/README.md b/README.md
@@ -6,8 +6,9 @@ An extensible AI assistant platform for open science projects, built with LangGr
 
 OSA provides domain-specific AI assistants for open science tools with:
 - **HED Assistant**: Hierarchical Event Descriptors for neuroimaging annotation
-- **BIDS Assistant**: Brain Imaging Data Structure (coming soon)
-- **EEGLAB Assistant**: EEG analysis toolbox (coming soon)
+- **BIDS Assistant**: Brain Imaging Data Structure
+- **EEGLAB Assistant**: EEG analysis toolbox
+- **NEMAR Assistant**: BIDS-formatted EEG, MEG, and iEEG dataset discovery
 
 Features:
 - **YAML-driven community registry** - add a new assistant with just a config file

diff --git a/frontend/index.html b/frontend/index.html
@@ -155,63 +155,73 @@
   <script src="osa-chat-widget.js"></script>
 
   <script>
-    // Community configurations for the demo page
-    const COMMUNITIES = {
-      hed: {
-        name: 'HED',
-        fullName: 'Hierarchical Event Descriptors',
-        description: 'Annotation standard for neuroimaging experiments',
-        status: 'active',
-        widget: {
-          communityId: 'hed',
-          title: 'HED Assistant',
-          initialMessage: 'Hi! I\'m the HED Assistant. I can help with HED (Hierarchical Event Descriptors), annotation, validation, and related tools. What would you like to know?',
-          placeholder: 'Ask about HED...',
-          suggestedQuestions: [
-            'What is HED and how is it used?',
-            'How do I annotate an event with HED tags?',
-            'What tools are available for working with HED?',
-            'Explain this HED validation error.'
-          ]
+    // Community configurations fetched from the API
+    let COMMUNITIES = {};
+
+    // Determine API endpoint (same logic as the widget)
+    const hostname = window.location.hostname;
+    const isDemoPage = hostname === 'osa-demo.pages.dev';
+    const API_BASE = isDemoPage
+      ? 'https://osa-worker.shirazi-10f.workers.dev'
+      : 'https://osa-worker-dev.shirazi-10f.workers.dev';
+
+    // Fetch community configs from the backend
+    async function loadCommunities() {
+      try {
+        const response = await fetch(`${API_BASE}/communities`);
+        if (!response.ok) {
+          throw new Error(`HTTP ${response.status}`);
         }
-      },
-      bids: {
-        name: 'BIDS',
-        fullName: 'Brain Imaging Data Structure',
-        description: 'Standard for organizing neuroimaging data',
-        status: 'soon',
-        widget: {
-          communityId: 'bids',
-          title: 'BIDS Assistant',
-          initialMessage: 'Hi! I\'m the BIDS Assistant. I can help with Brain Imaging Data Structure (BIDS), data organization, validation, and related tools.',
-          placeholder: 'Ask about BIDS...',
-          suggestedQuestions: [
-            'What is BIDS and why should I use it?',
-            'How do I organize my EEG data in BIDS?',
-            'What are the BIDS Common Principles?',
-            'How do I validate my BIDS dataset?'
-          ]
+        const data = await response.json();
+        if (!Array.isArray(data)) {
+          throw new Error('Invalid response format');
         }
-      },
-      eeglab: {
-        name: 'EEGLAB',
-        fullName: 'EEG Analysis Toolbox',
-        description: 'MATLAB toolbox for EEG analysis',
-        status: 'active',
-        widget: {
-          communityId: 'eeglab',
-          title: 'EEGLAB Assistant',
-          initialMessage: 'Hi! I\'m the EEGLAB Assistant. I can help with EEG analysis, MATLAB scripting, and EEGLAB plugins.',
-          placeholder: 'Ask about EEGLAB...',
-          suggestedQuestions: [
-            'How do I load EEG data in EEGLAB?',
-            'What preprocessing steps should I follow?',
-            'How do I run ICA in EEGLAB?',
-            'What plugins are available?'
-          ]
+
+        for (const community of data) {
+          const w = community.widget || {};
+          const shortName = community.name.split('(')[0].trim() || community.name;
+          COMMUNITIES[community.id] = {
+            name: shortName,
+            fullName: community.name,
+            description: community.description,
+            status: community.status === 'available' ? 'active' : 'soon',
+            widget: {
+              communityId: community.id,
+              title: w.title,
+              initialMessage: w.initial_message || `Hi! I'm the ${shortName} Assistant. How can I help you?`,
+              placeholder: w.placeholder,
+              suggestedQuestions: w.suggested_questions || []
+            }
+          };
         }
+      } catch (error) {
+        console.error('[OSA] Failed to load communities from API:', error);
+        const container = document.getElementById('page-content');
+        container.textContent = '';
+        const h1 = document.createElement('h1');
+        h1.textContent = 'Open Science Assistant';
+        const box = document.createElement('div');
+        box.className = 'info-box';
+        const h2 = document.createElement('h2');
+        h2.textContent = 'Could not load communities';
+        const p1 = document.createElement('p');
+        p1.textContent = 'Failed to connect to the backend API. Please try refreshing the page.';
+        const p2 = document.createElement('p');
+        p2.style.cssText = 'color: #6b7280; font-size: 0.9em;';
+        p2.textContent = `Error: ${error.message}`;
+        box.append(h2, p1, p2);
+        container.append(h1, box);
+        return;
       }
-    };
+
+      // Route after loading
+      const communityId = getCommunityFromPath();
+      if (communityId) {
+        renderCommunity(communityId);
+      } else {
+        renderLanding();
+      }
+    }
 
     // Determine community from URL path (first segment only)
     function getCommunityFromPath() {
@@ -232,6 +242,9 @@
     // Render landing page (no specific community)
     function renderLanding() {
       document.title = 'Open Science Assistant - Demo';
+      const communityIds = Object.keys(COMMUNITIES);
+      const exampleId = communityIds[0] || 'hed';
+      const otherIds = communityIds.slice(1).map(id => `'${id}'`).join(', ');
       const html = `
         <h1>Open Science Assistant</h1>
         <p>AI-powered assistants for research communities. Each community gets a specialized assistant with domain expertise, documentation, and tools.</p>
@@ -259,7 +272,7 @@ <h2>Integration</h2>
           <pre><code>&lt;script src="https://osa-demo.pages.dev/osa-chat-widget.js"&gt;&lt;/script&gt;
 &lt;script&gt;
   OSAChatWidget.setConfig({
-    communityId: 'hed'  // or 'bids', 'eeglab', etc.
+    communityId: '${exampleId}'${otherIds ? `  // or ${otherIds}` : ''}
   });
 &lt;/script&gt;</code></pre>
           <p>The widget auto-configures the API endpoint based on the <code>communityId</code>.</p>
@@ -268,11 +281,12 @@ <h2>Integration</h2>
         <div class="info-box">
           <h2>Configuration Options</h2>
           <ul>
-            <li><code>communityId</code> - Which assistant to use (e.g., 'hed', 'bids')</li>
+            <li><code>communityId</code> - Which assistant to use ('${exampleId}'${otherIds ? `, ${otherIds}` : ''})</li>
             <li><code>title</code> - Widget header title</li>
             <li><code>placeholder</code> - Input placeholder text</li>
             <li><code>initialMessage</code> - First greeting message from the assistant</li>
             <li><code>suggestedQuestions</code> - Array of clickable suggestion buttons</li>
+            <li><code>widgetInstructions</code> - Per-page context instructions for the assistant</li>
             <li><code>apiEndpoint</code> - Backend API URL (auto-detected by default)</li>
             <li><code>storageKey</code> - localStorage key (auto-derived from communityId)</li>
             <li><code>turnstileSiteKey</code> - Cloudflare Turnstile site key (optional)</li>
@@ -344,7 +358,9 @@ <h2>Add to Your Site</h2>
           <pre><code>&lt;script src="https://osa-demo.pages.dev/osa-chat-widget.js"&gt;&lt;/script&gt;
 &lt;script&gt;
   OSAChatWidget.setConfig({
-    communityId: '${communityId}'
+    communityId: '${communityId}',
+    // Optional: provide page-specific context
+    // widgetInstructions: 'Focus on topics relevant to this page.'
   });
 &lt;/script&gt;</code></pre>
         </div>
@@ -388,13 +404,8 @@ <h2>Learn More</h2>
       OSAChatWidget.setConfig(community.widget);
     }
 
-    // Route based on URL path
-    const communityId = getCommunityFromPath();
-    if (communityId) {
-      renderCommunity(communityId);
-    } else {
-      renderLanding();
-    }
+    // Load communities from API and then route
+    loadCommunities();
   </script>
 </body>
 </html>