Update on004588 to v1.0.2 #1
…e), set BIDSVersion to validator (1.11.1); rewrite README in plainer language grouped by file (maintainer feedback)
# Conflicts: # .bidsignore
Claude Review
This PR is a NEMAR curation pass on
Data integrity
Six files changed, all text: one
Recommendations
Split values into different columns and document accordingly.
On Jun 5, 2026, at 20:35, Aman Jaiswal ***@***.***> wrote:
AmanJaiswal1503 left a comment (nemarDatasets/on004588#1)
Hi Arno,
To split the value column into structured columns without inventing meaning, here's what the source already gives us. Across all 42 subjects the column has only 17 unique strings, falling into three classes:
1. Stimulus presentation (902 rows, 6 unique strings)
The strings are already structured key/value triples:
Category:IMG=ID:FYLLADIO_1.tif=Type:Leaflet_Images_1
Category:IMG=ID:FYLLADIO_2.tif=Type:Leaflet_Images_1
...
Category:IMG=ID:FYLLADIO_6.tif=Type:Leaflet_Images_1
Three key:value pairs joined by =. Splitting is purely a parse, no interpretation.
2. Mouse/wheel response (3741 rows, 9 unique strings)
MouseButtonLeft pressed MouseButtonLeft released
MouseButtonMiddle pressed MouseButtonMiddle released
MouseButtonRight pressed MouseButtonRight released
MouseWheelDown120 pressed
MouseWheelUp120 pressed
Each is <device> <action> separated by a space.
3. Trial markers (168 rows, 2 strings)
fixation_cross, EOE — atomic, no internal structure to split.
Proposed columns
Keep value verbatim, add five columns:

| column | content | notes |
| --- | --- | --- |
| event_class | stimulus / response / marker | derived from which vocabulary the cell matches |
| stim_id | the ID: field for stimulus rows (e.g. FYLLADIO_1.tif); n/a otherwise | |
| stim_category | the Type: field for stimulus rows (e.g. Leaflet_Images_1); n/a otherwise | |
| device | for response rows, MouseButtonLeft / MouseWheelDown120 / etc.; n/a otherwise | |
| action | for response rows, pressed / released; n/a otherwise | |

Each new cell is either a literal substring of the original value or a class label that follows unambiguously from a regex match, so the row-level mapping is 100% defensible. task-unnamed_events.json would document the five new columns.
Sample before/after
Before:
onset duration sample value
175.97000 1.0000 52791 Category:IMG=ID:FYLLADIO_1.tif=Type:Leaflet_Images_1
1.31000 1.0000 393 MouseButtonLeft pressed
6.25667 1.0000 1877 fixation_cross
After:
onset duration sample value event_class stim_id stim_category device action
175.97000 1.0000 52791 Category:IMG=ID:FYLLADIO_1.tif=Type:Leaflet_Images_1 stimulus FYLLADIO_1.tif Leaflet_Images_1 n/a n/a
1.31000 1.0000 393 MouseButtonLeft pressed response n/a n/a MouseButtonLeft pressed
6.25667 1.0000 1877 fixation_cross marker n/a n/a n/a n/a
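For illustration, the parse described above could be sketched roughly as follows. This is a minimal sketch, not the code used in the PR: the regular expressions, the `split_value` helper, and the constant names are assumptions inferred only from the example strings quoted in this comment. An unrecognised string raises an error rather than being guessed at.

```python
import re

# Hypothetical patterns, inferred from the strings quoted in this thread.
STIM_RE = re.compile(
    r"^Category:(?P<category>[^=]+)=ID:(?P<stim_id>[^=]+)=Type:(?P<stim_category>[^=]+)$"
)
RESP_RE = re.compile(r"^(?P<device>Mouse\w+) (?P<action>pressed|released)$")
MARKERS = {"fixation_cross", "EOE"}
NA = "n/a"


def split_value(value):
    """Return the five proposed columns for a single `value` cell."""
    row = {
        "event_class": NA,
        "stim_id": NA,
        "stim_category": NA,
        "device": NA,
        "action": NA,
    }
    stim = STIM_RE.match(value)
    if stim:
        # Stimulus rows: copy the ID: and Type: substrings verbatim.
        row.update(
            event_class="stimulus",
            stim_id=stim["stim_id"],
            stim_category=stim["stim_category"],
        )
        return row
    resp = RESP_RE.match(value)
    if resp:
        # Response rows: "<device> <action>" separated by one space.
        row.update(event_class="response", device=resp["device"], action=resp["action"])
        return row
    if value in MARKERS:
        # Marker rows are atomic; only the class label is set.
        row["event_class"] = "marker"
        return row
    # Refuse to invent a class for anything outside the three vocabularies.
    raise ValueError(f"unrecognised value: {value!r}")
```

Applied to the three sample rows, this would produce the `stimulus`, `response`, and `marker` rows shown in the "After" table.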
Two questions before I push the rewrite:
• Are these column names OK? (event_class, stim_id, stim_category, device, action)
Yes, they are good. I think device and action could be grouped.
• For stimulus rows the Category: field is always literally IMG (every cell). Do you want it as its own column too, or is that redundant?
Yes, not needed if it does not carry any information.
Arno
… Thanks!
…id, stim_category

Maintainer feedback (Arno) on PR #1 asked for the `value` column to be split into structured columns. The source `value` field carries three vocabularies (stimulus / response / marker), each with internal structure that can be parsed mechanically.

Adds three new columns to every events.tsv (84 files: 42 subjects × 2 modalities, eeg/ and eye_tracker/):
- event_class: stimulus / response / marker (derived from `value`)
- stim_id: filename from `ID:` field for stimulus rows; n/a otherwise
- stim_category: label from `Type:` field for stimulus rows; n/a otherwise

The original `value` column is preserved verbatim alongside the new columns. For response rows, `value` already carries the `<device> <action>` pair so device/action are not re-split into separate columns (per Arno's "device and action could be grouped" reply).

Per-row mapping is purely a regex parse of the source `value`: every new cell is either a literal substring of the original or a class label that follows unambiguously from the regex match. No interpretation, no invented metadata.

task-unnamed_events.json updated to document the three new columns.

dataset_description.json: Version 1.0.2 → 1.0.3.

Validator confirms 0 errors / 1765 warnings (identical breakdown to pre-split: 1722 SIDECAR_KEY_RECOMMENDED + 42 EVENT_ONSET_ORDER + 1 JSON_KEY_RECOMMENDED). Binary .set/.fdt payloads remain byte-identical.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
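As a rough illustration of what this commit describes, the three-column rewrite of a single events.tsv might look like the sketch below. It is an assumption-laden example, not the script that produced the commit: `add_columns`, the regular expressions, and the in-memory string handling are all hypothetical. It keeps `value` and every existing cell verbatim and appends only `event_class`, `stim_id`, and `stim_category`.

```python
import csv
import io
import re

# Hypothetical patterns, inferred from the value strings quoted in this PR.
STIM_RE = re.compile(
    r"^Category:[^=]+=ID:(?P<stim_id>[^=]+)=Type:(?P<stim_category>[^=]+)$"
)
RESP_RE = re.compile(r"^Mouse\w+ (?:pressed|released)$")
MARKERS = {"fixation_cross", "EOE"}
NEW_COLUMNS = ["event_class", "stim_id", "stim_category"]


def add_columns(tsv_text):
    """Return events.tsv text with the three new columns appended."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + NEW_COLUMNS,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        value = row["value"]
        stim = STIM_RE.match(value)
        if stim:
            # Literal substrings of the original cell, nothing derived.
            row.update(event_class="stimulus", **stim.groupdict())
        elif RESP_RE.match(value):
            # device/action stay grouped inside the untouched `value` cell.
            row.update(event_class="response", stim_id="n/a", stim_category="n/a")
        elif value in MARKERS:
            row.update(event_class="marker", stim_id="n/a", stim_category="n/a")
        else:
            raise ValueError(f"unrecognised value: {value!r}")
        writer.writerow(row)
    return out.getvalue()
```

Because cells are read and written as strings, existing values such as `1.31000` are not reformatted on the way through.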
Hi Arno, Pushed the column split as Applied:
Row counts (4811 total):
Validator state (deno + jsr:@bids/validator): 0 errors, 1765 warnings. Same breakdown as before the split (1722 SIDECAR_KEY_RECOMMENDED + 42 EVENT_ONSET_ORDER + 1 JSON_KEY_RECOMMENDED), so the new columns did not introduce any new findings. Binary Thanks! |
The previous commit (37e0472) incorrectly bumped Version from 1.0.2 to 1.0.3 in dataset_description.json. Version tracks the dataset's release lineage and is set by NEMAR's Update-to-vX.Y.Z automation, not by curation edits. Our metadata fixes do not constitute a new dataset version, so Version is reverted to 1.0.2 and the corresponding line in the README curation log is removed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The README now has one `## NEMAR curation changes` section describing what the curated dataset looks like vs the OpenNeuro source, with no dated revision blocks and no narration of which edits were earlier vs later. File-grouped subsections list the changes (dataset description, events sidecar + columns, S37 truncated row, participants padding, .bidsignore patterns) neutrally; no maintainer-feedback framing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Dataset Update
Bumps on004588 from 1.0.1 to 1.0.2.
Changed files