Merged
170 changes: 170 additions & 0 deletions .context/nemar-api.md
@@ -0,0 +1,170 @@
# NEMAR API Reference

Technical reference for the NEMAR (NeuroElectroMagnetic Archive) public API used by the NEMAR community assistant tools.

**Base URL:** `https://nemar.org/api/dataexplorer/datapipeline`
**Authentication:** None required (fully public)
**Only valid table:** `dataexplorer_dataset`

## Endpoints

### 1. List Datasets - `/records`

Fetch paginated dataset records.

```bash
curl --request GET \
  --url 'https://nemar.org/api/dataexplorer/datapipeline/records' \
  -H 'Content-Type: application/json' \
  -d '{"table_name":"dataexplorer_dataset", "start": 0, "limit": 10}'
```

**Parameters:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `table_name` | string | Yes | Must be `"dataexplorer_dataset"` |
| `start` | int | Yes | Pagination offset (0-based) |
| `limit` | int | Yes | Number of records to return (`1000` currently returns all datasets) |

**Response:**
```json
{
  "total": 485,
  "entries": {
    "0": { /* dataset object */ },
    "1": { /* dataset object */ },
    ...
  },
  "start": 0,
  "limit": 10,
  "success": true
}
```

**Notes:**
- `entries` uses string indices (`"0"`, `"1"`, etc.), not an array
- No server-side search, filter, or sort; must fetch and filter client-side
- Can fetch all datasets in one call with `limit=1000`
- As of early 2025, there are ~485 datasets
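
The notes above imply two client-side steps: turning the string-indexed `entries` object into an array, then filtering it locally. A minimal sketch of both follows. The helper names `entriesToArray` and `filterByModality` and the sample IDs are hypothetical, made up for illustration; only the response shape comes from this reference.

```javascript
// Convert the string-indexed `entries` object into an array, ordered by numeric key.
function entriesToArray(entries) {
  return Object.keys(entries ?? {})
    .sort((a, b) => Number(a) - Number(b))
    .map((key) => entries[key]);
}

// Client-side filter: keep datasets whose comma-separated `modalities`
// field contains the requested modality (case-insensitive).
function filterByModality(datasets, modality) {
  const wanted = modality.trim().toLowerCase();
  return datasets.filter((d) =>
    String(d.modalities ?? '')
      .split(',')
      .map((m) => m.trim().toLowerCase())
      .includes(wanted)
  );
}

// Hypothetical sample shaped like a /records response (IDs and values are made up).
const sampleResponse = {
  total: 3,
  entries: {
    '0': { id: 'ds-example-a', modalities: 'EEG' },
    '1': { id: 'ds-example-b', modalities: 'MEG, MRI' },
    '2': { id: 'ds-example-c', modalities: 'EEG, MRI' },
  },
  start: 0,
  limit: 10,
  success: true,
};

const datasets = entriesToArray(sampleResponse.entries);
const eegDatasets = filterByModality(datasets, 'eeg');
console.log(eegDatasets.map((d) => d.id));
```

Sorting by numeric key keeps the order stable even if the server ever emits keys out of sequence.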

### 2. Get Dataset by ID - `/datasetid`

Fetch a single dataset by its identifier.

```bash
curl --request GET \
  --url 'https://nemar.org/api/dataexplorer/datapipeline/datasetid' \
  -H 'Content-Type: application/json' \
  -d '{"table_name":"dataexplorer_dataset", "dataset_id": "ds005697"}'
```

**Parameters:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `table_name` | string | Yes | Must be `"dataexplorer_dataset"` |
| `dataset_id` | string | Yes | Dataset ID (e.g., `"ds005697"`) |

**Response:**
```json
{
  "entry": {
    "0": { /* dataset object */ }
  },
  "success": true
}
```

**Notes:**
- Returns empty `entry: {}` for invalid IDs (still `success: true`)
- `entry` uses same string-indexed dict pattern as `entries`
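
Because an unknown ID still yields `success: true`, a caller has to inspect `entry` itself to detect a miss. The sketch below shows one way to do that. `extractDataset` is a hypothetical helper name, and the two sample responses are hand-written to match the shapes described above.

```javascript
// Return the dataset object from a /datasetid response,
// or null when the response is unsuccessful or `entry` is empty.
function extractDataset(response) {
  if (!response || response.success !== true) return null;
  const values = Object.values(response.entry ?? {});
  return values.length > 0 ? values[0] : null;
}

// Hand-written samples matching the documented response shapes.
const found = { entry: { '0': { id: 'ds005697' } }, success: true };
const notFound = { entry: {}, success: true };

console.log(extractDataset(found));    // the dataset object
console.log(extractDataset(notFound)); // null
```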

## Dataset Schema

Each dataset object has 31 fields. The tables below document 30 of them, grouped by purpose:

### Identifiers
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Dataset ID (e.g., `"ds005697"`) |
| `name` | string | Human-readable name (often descriptive) |
| `created` | string | Creation timestamp (`YYYY-MM-DD HH:MM:SS`) |
| `publishDate` | string | Publication timestamp |
| `uploader` | string | Username of original uploader |
| `latestSnapshot` | string | Version string (e.g., `"1.0.2"`) |
| `DatasetDOI` | string | DOI (e.g., `"doi:10.18112/openneuro.ds005697.v1.0.2"`) |

### BIDS Metadata
| Field | Type | Description |
|-------|------|-------------|
| `BIDSVersion` | string | BIDS spec version (e.g., `"1.8.0"`) |
| `License` | string | Data license (typically `"CC0"`) |
| `Authors` | string | Author list (comma-separated or `===NEMAR-SEP===` delimited) |
| `Acknowledgements` | string | Acknowledgement text |
| `HowToAcknowledge` | string | Citation instructions |
| `Funding` | string | Funding sources (`===NEMAR-SEP===` delimited) |
| `ReferencesAndLinks` | string | URLs/references (`===NEMAR-SEP===` delimited) |
| `EthicsApprovals` | string | Ethics approval information |
| `readme` | string | Full README.md content (can be very long) |

### Experimental Details
| Field | Type | Description |
|-------|------|-------------|
| `tasks` | string | Comma-separated task names (e.g., `"rest, gonogo"`) |
| `modalities` | string | Comma-separated modalities (e.g., `"EEG"`, `"MEG, MRI"`) |
| `HEDVersion` | string | HED schema version (empty if not annotated) |
| `hedAnnotation` | int | `0` or `1` (whether HED annotations are present) |

### Dataset Size
| Field | Type | Description |
|-------|------|-------------|
| `participants` | int | Number of subjects |
| `sessionsNum` | int | Number of sessions |
| `totalFiles` | int | Total file count |
| `file_size` | int | Size in bytes |
| `byte_size_format` | string | Human-readable size (e.g., `"66.6 GB"`) |
| `age_min` | int | Minimum participant age (`0` if unspecified) |
| `age_max` | int | Maximum participant age (`0` if unspecified) |

### Platform Flags
| Field | Type | Description |
|-------|------|-------------|
| `onBrainlife` | int | `0`/`1` - available on Brainlife |
| `local_dataset` | int | `0`/`1` - available locally |
| `processed` | int | `0`/`1` - has processed data |

## Multi-Value Fields

Some fields use `===NEMAR-SEP===` as a delimiter for multiple values:
- `Funding`: Multiple funding sources
- `ReferencesAndLinks`: Multiple URLs/references
- `Authors`: In some datasets only; others use a plain comma-separated list

Example:
```
"NIH R01NS047293===NEMAR-SEP===NSF BCS-0924532===NEMAR-SEP===ONR N00014-16-1-2257"
```

Split on `===NEMAR-SEP===` and strip whitespace from each part.
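
The splitting rule can be sketched as below. The function names `splitMultiValue` and `parseAuthors` are hypothetical. The comma fallback for `Authors` is a heuristic assumption and will mis-split names written as "Last, First".

```javascript
const NEMAR_SEP = '===NEMAR-SEP===';

// Split a multi-value field on the NEMAR separator, trim each part, drop empties.
function splitMultiValue(value) {
  if (typeof value !== 'string' || value.trim() === '') return [];
  return value
    .split(NEMAR_SEP)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// `Authors` may use either delimiter: prefer the NEMAR separator,
// otherwise fall back to splitting on commas (heuristic).
function parseAuthors(value) {
  if (typeof value !== 'string') return [];
  if (value.includes(NEMAR_SEP)) return splitMultiValue(value);
  return value
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

const funding = splitMultiValue(
  'NIH R01NS047293===NEMAR-SEP===NSF BCS-0924532===NEMAR-SEP===ONR N00014-16-1-2257'
);
console.log(funding);
```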

## URL Patterns

- **NEMAR detail page:** `https://nemar.org/dataexplorer/detail?dataset_id={id}`
- **OpenNeuro page:** `https://openneuro.org/datasets/{id}`
- **OpenNeuro version:** `https://openneuro.org/datasets/{id}/versions/{latestSnapshot}`
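
As an illustration of these patterns, the sketch below builds all three links from a dataset object. `buildDatasetUrls` is a hypothetical helper name. It assumes `id` and `latestSnapshot` are the fields described in the schema above, and it omits the version link when `latestSnapshot` is empty.

```javascript
// Build the NEMAR and OpenNeuro links for a dataset object.
function buildDatasetUrls(dataset) {
  const id = encodeURIComponent(dataset.id);
  const urls = {
    nemar: `https://nemar.org/dataexplorer/detail?dataset_id=${id}`,
    openneuro: `https://openneuro.org/datasets/${id}`,
  };
  // Only add the versioned link when a snapshot string is present.
  if (dataset.latestSnapshot) {
    const version = encodeURIComponent(dataset.latestSnapshot);
    urls.openneuroVersion = `https://openneuro.org/datasets/${id}/versions/${version}`;
  }
  return urls;
}

const urls = buildDatasetUrls({ id: 'ds005697', latestSnapshot: '1.0.2' });
console.log(urls);
```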

## Limitations

1. **No server-side search/filter/sort** - must fetch all and filter client-side
2. **Only one valid table** - `dataexplorer_dataset` (others return validation errors)
3. **Only two endpoints** - `/records` and `/datasetid` (no `/search`, `/tables`, etc.)
4. **GET with body** - API uses GET method but expects JSON body (unusual; works with curl `-d`)
5. **String-indexed responses** - entries/entry use `{"0": ..., "1": ...}` instead of arrays
6. **No rate limiting observed** - but be reasonable with request frequency
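
Limitation 4 matters for JavaScript clients in particular: the Fetch API rejects a body on GET requests, so `fetch` cannot reproduce the curl calls above. One possible workaround in Node.js is the lower-level `node:https` module, sketched below. `buildRequest` and `nemarGet` are hypothetical names, and whether the server accepts exactly these headers has not been verified here.

```javascript
const https = require('node:https');

const BASE_URL = 'https://nemar.org/api/dataexplorer/datapipeline';

// Build the URL, options, and JSON body for a GET-with-body request without sending it.
function buildRequest(endpoint, payload) {
  const body = JSON.stringify({ table_name: 'dataexplorer_dataset', ...payload });
  return {
    url: `${BASE_URL}/${endpoint}`,
    options: {
      method: 'GET',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    },
    body,
  };
}

// Send the request and resolve with the parsed JSON response.
function nemarGet(endpoint, payload) {
  const { url, options, body } = buildRequest(endpoint, payload);
  return new Promise((resolve, reject) => {
    const req = https.request(url, options, (res) => {
      let raw = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { raw += chunk; });
      res.on('end', () => {
        try {
          resolve(JSON.parse(raw));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on('error', reject);
    req.write(body); // the JSON body travels with the GET request
    req.end();
  });
}

// Example (performs a network call, so it is left commented out):
// nemarGet('records', { start: 0, limit: 10 }).then((r) => console.log(r.total));
```

Keeping `buildRequest` separate from the network call makes the request shape easy to inspect and test offline.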

## Dataset Statistics (as of early 2025)

- **Total datasets:** ~485
- **Common modalities:** EEG (~53), MEG (~9), MEG+MRI (~7), EEG+MRI (~6), iEEG (~5)
- **Datasets with HED annotations:** ~6
- **Largest by participants:** ds002181 (226), ds003655 (156), ds003474 (122)
- **Common tasks:** rest, noise, gonogo, memory, attention, various experimental paradigms
1 change: 1 addition & 0 deletions .github/workflows/sync-worker-cors.yml
@@ -93,6 +93,7 @@ jobs:
 git config user.email "github-actions[bot]@users.noreply.github.com"
 git add workers/osa-worker/index.js
 git commit -m "chore: sync worker CORS from community configs [skip ci]"
+git pull --rebase origin ${{ github.ref_name }}
 git push

 - name: Deploy to Cloudflare Workers (production)
5 changes: 3 additions & 2 deletions README.md
@@ -6,8 +6,9 @@ An extensible AI assistant platform for open science projects, built with LangGr

 OSA provides domain-specific AI assistants for open science tools with:
 - **HED Assistant**: Hierarchical Event Descriptors for neuroimaging annotation
-- **BIDS Assistant**: Brain Imaging Data Structure (coming soon)
-- **EEGLAB Assistant**: EEG analysis toolbox (coming soon)
+- **BIDS Assistant**: Brain Imaging Data Structure
+- **EEGLAB Assistant**: EEG analysis toolbox
+- **NEMAR Assistant**: BIDS-formatted EEG, MEG, and iEEG dataset discovery

 Features:
 - **YAML-driven community registry** - add a new assistant with just a config file
137 changes: 74 additions & 63 deletions frontend/index.html
@@ -155,63 +155,73 @@
 <script src="osa-chat-widget.js"></script>

 <script>
-// Community configurations for the demo page
-const COMMUNITIES = {
-hed: {
-name: 'HED',
-fullName: 'Hierarchical Event Descriptors',
-description: 'Annotation standard for neuroimaging experiments',
-status: 'active',
-widget: {
-communityId: 'hed',
-title: 'HED Assistant',
-initialMessage: 'Hi! I\'m the HED Assistant. I can help with HED (Hierarchical Event Descriptors), annotation, validation, and related tools. What would you like to know?',
-placeholder: 'Ask about HED...',
-suggestedQuestions: [
-'What is HED and how is it used?',
-'How do I annotate an event with HED tags?',
-'What tools are available for working with HED?',
-'Explain this HED validation error.'
-]
+// Community configurations fetched from the API
+let COMMUNITIES = {};
+
+// Determine API endpoint (same logic as the widget)
+const hostname = window.location.hostname;
+const isDemoPage = hostname === 'osa-demo.pages.dev';
+const API_BASE = isDemoPage
+? 'https://osa-worker.shirazi-10f.workers.dev'
+: 'https://osa-worker-dev.shirazi-10f.workers.dev';
+
+// Fetch community configs from the backend
+async function loadCommunities() {
+try {
+const response = await fetch(`${API_BASE}/communities`);
+if (!response.ok) {
+throw new Error(`HTTP ${response.status}`);
 }
-},
-bids: {
-name: 'BIDS',
-fullName: 'Brain Imaging Data Structure',
-description: 'Standard for organizing neuroimaging data',
-status: 'soon',
-widget: {
-communityId: 'bids',
-title: 'BIDS Assistant',
-initialMessage: 'Hi! I\'m the BIDS Assistant. I can help with Brain Imaging Data Structure (BIDS), data organization, validation, and related tools.',
-placeholder: 'Ask about BIDS...',
-suggestedQuestions: [
-'What is BIDS and why should I use it?',
-'How do I organize my EEG data in BIDS?',
-'What are the BIDS Common Principles?',
-'How do I validate my BIDS dataset?'
-]
+const data = await response.json();
+if (!Array.isArray(data)) {
+throw new Error('Invalid response format');
 }
-},
-eeglab: {
-name: 'EEGLAB',
-fullName: 'EEG Analysis Toolbox',
-description: 'MATLAB toolbox for EEG analysis',
-status: 'active',
-widget: {
-communityId: 'eeglab',
-title: 'EEGLAB Assistant',
-initialMessage: 'Hi! I\'m the EEGLAB Assistant. I can help with EEG analysis, MATLAB scripting, and EEGLAB plugins.',
-placeholder: 'Ask about EEGLAB...',
-suggestedQuestions: [
-'How do I load EEG data in EEGLAB?',
-'What preprocessing steps should I follow?',
-'How do I run ICA in EEGLAB?',
-'What plugins are available?'
-]
+
+for (const community of data) {
+const w = community.widget || {};
+const shortName = community.name.split('(')[0].trim() || community.name;
+COMMUNITIES[community.id] = {
+name: shortName,
+fullName: community.name,
+description: community.description,
+status: community.status === 'available' ? 'active' : 'soon',
+widget: {
+communityId: community.id,
+title: w.title,
+initialMessage: w.initial_message || `Hi! I'm the ${shortName} Assistant. How can I help you?`,
+placeholder: w.placeholder,
+suggestedQuestions: w.suggested_questions || []
+}
+};
 }
+} catch (error) {
+console.error('[OSA] Failed to load communities from API:', error);
+const container = document.getElementById('page-content');
+container.textContent = '';
+const h1 = document.createElement('h1');
+h1.textContent = 'Open Science Assistant';
+const box = document.createElement('div');
+box.className = 'info-box';
+const h2 = document.createElement('h2');
+h2.textContent = 'Could not load communities';
+const p1 = document.createElement('p');
+p1.textContent = 'Failed to connect to the backend API. Please try refreshing the page.';
+const p2 = document.createElement('p');
+p2.style.cssText = 'color: #6b7280; font-size: 0.9em;';
+p2.textContent = `Error: ${error.message}`;
+box.append(h2, p1, p2);
+container.append(h1, box);
+return;
 }
-};
+
+// Route after loading
+const communityId = getCommunityFromPath();
+if (communityId) {
+renderCommunity(communityId);
+} else {
+renderLanding();
+}
+}

 // Determine community from URL path (first segment only)
 function getCommunityFromPath() {
@@ -232,6 +242,9 @@
 // Render landing page (no specific community)
 function renderLanding() {
 document.title = 'Open Science Assistant - Demo';
+const communityIds = Object.keys(COMMUNITIES);
+const exampleId = communityIds[0] || 'hed';
+const otherIds = communityIds.slice(1).map(id => `'${id}'`).join(', ');
 const html = `
 <h1>Open Science Assistant</h1>
 <p>AI-powered assistants for research communities. Each community gets a specialized assistant with domain expertise, documentation, and tools.</p>
@@ -259,7 +272,7 @@ <h2>Integration</h2>
 <pre><code>&lt;script src="https://osa-demo.pages.dev/osa-chat-widget.js"&gt;&lt;/script&gt;
 &lt;script&gt;
 OSAChatWidget.setConfig({
-communityId: 'hed' // or 'bids', 'eeglab', etc.
+communityId: '${exampleId}' // ${otherIds ? `or ${otherIds}` : ''}
 });
 &lt;/script&gt;</code></pre>
 <p>The widget auto-configures the API endpoint based on the <code>communityId</code>.</p>
@@ -268,11 +281,12 @@ <h2>Integration</h2>
 <div class="info-box">
 <h2>Configuration Options</h2>
 <ul>
-<li><code>communityId</code> - Which assistant to use (e.g., 'hed', 'bids')</li>
+<li><code>communityId</code> - Which assistant to use ('${exampleId}'${otherIds ? `, ${otherIds}` : ''})</li>
 <li><code>title</code> - Widget header title</li>
 <li><code>placeholder</code> - Input placeholder text</li>
 <li><code>initialMessage</code> - First greeting message from the assistant</li>
 <li><code>suggestedQuestions</code> - Array of clickable suggestion buttons</li>
+<li><code>widgetInstructions</code> - Per-page context instructions for the assistant</li>
 <li><code>apiEndpoint</code> - Backend API URL (auto-detected by default)</li>
 <li><code>storageKey</code> - localStorage key (auto-derived from communityId)</li>
 <li><code>turnstileSiteKey</code> - Cloudflare Turnstile site key (optional)</li>
@@ -344,7 +358,9 @@ <h2>Add to Your Site</h2>
 <pre><code>&lt;script src="https://osa-demo.pages.dev/osa-chat-widget.js"&gt;&lt;/script&gt;
 &lt;script&gt;
 OSAChatWidget.setConfig({
-communityId: '${communityId}'
+communityId: '${communityId}',
+// Optional: provide page-specific context
+// widgetInstructions: 'Focus on topics relevant to this page.'
 });
 &lt;/script&gt;</code></pre>
 </div>
@@ -388,13 +404,8 @@ <h2>Learn More</h2>
 OSAChatWidget.setConfig(community.widget);
 }

-// Route based on URL path
-const communityId = getCommunityFromPath();
-if (communityId) {
-renderCommunity(communityId);
-} else {
-renderLanding();
-}
+// Load communities from API and then route
+loadCommunities();
 </script>
 </body>
 </html>