Skip to content

Update on004022 to v1.0.2#1

Open
AmanJaiswal1503 wants to merge 4 commits into
mainfrom
update/on004022-mphbs2pu
Open

Update on004022 to v1.0.2#1
AmanJaiswal1503 wants to merge 4 commits into
mainfrom
update/on004022-mphbs2pu

Conversation

@AmanJaiswal1503
Copy link
Copy Markdown
Collaborator

Dataset Update

Bumps on004022 from 1.0.1 to 1.0.2.

Changed files

  • bidsignore
  • README.md
  • dataset_description.json
  • sub-01/eeg/sub-01_task-motorimagery_run-1_electrodes.tsv
  • sub-01/eeg/sub-01_task-motorimagery_run-2_electrodes.tsv
  • sub-01/eeg/sub-01_task-motorimagery_run-3_electrodes.tsv
  • sub-01/fnirs/sub-01_task-motorimagery_run-1_electrodes.tsv
  • sub-01/fnirs/sub-01_task-motorimagery_run-2_electrodes.tsv
  • sub-01/fnirs/sub-01_task-motorimagery_run-3_electrodes.tsv
  • sub-02/eeg/sub-02_task-motorimagery_run-1_electrodes.tsv
  • ... and 56 more

…e), set BIDSVersion to validator (1.11.1); rewrite README in plainer language grouped by file (maintainer feedback)
@AmanJaiswal1503
Copy link
Copy Markdown
Collaborator Author

Claude Review

This PR is a NEMAR curation pass on on004022, bumping the dataset Version from 1.0.0 to 1.0.2 and the spec target (BIDSVersion) from 1.0.2 to 1.11.1. The intent is a metadata-only cleanup that takes the validator from 28 errors + 1010 warnings to 0 errors + 1024 warnings without touching any of the .set/.fdt EEG or .mat fNIRS payloads. Walking through what it changes:

EEG electrodes.tsv (21 files collapsed to 7, one per subject). The source dataset stored an electrodes.tsv next to every recording, with names like sub-NNN_task-motorimagery_run-N_electrodes.tsv. The BIDS-EEG filename rules for electrodes.tsv do not allow task- or run- entities — the file lives at the subject (or session) level, not the recording level — so all 21 names were rejected by the validator. The PR renames the run-1 copy in each subject to sub-NNN_electrodes.tsv and deletes the run-2 and run-3 copies. I checked: on main, all 21 EEG electrodes.tsv files share the same MD5 (3959c37c…), and on the PR branch all 7 surviving files still have that exact MD5. So the consolidation is lossless at the byte level — every subject's electrode positions are preserved unchanged, and the dropped copies were literal duplicates.

fNIRS electrodes.tsv (21 files collapsed to 7, one per subject). Same consolidation pattern on the fNIRS side, for the same BIDS reason. I verified the same way: all 21 originals share MD5 265e604b…, and all 7 PR-branch survivors still match. No content was rewritten.

EEG coordsystem.json (7 new files, one per subject). Every electrodes.tsv needs a paired coordsystem.json in BIDS-EEG; none existed on main. The PR adds one identical file per subject with EEGCoordinateSystem = "Other", EEGCoordinateUnits = "m", and an EEGCoordinateSystemDescription that explicitly states the coordinate system is not documented upstream and that the paired electrodes.tsv was preserved unchanged. EEGCoordinateSystemDescription is required whenever EEGCoordinateSystem is "Other" per the BIDS-EEG schema, so filling it (rather than leaving it blank) is the correct way to satisfy that conditional dependency, and the wording is honest about the limitation. All seven files are byte-identical to each other (0cc89896…), which is consistent with the source not documenting per-subject differences.

.bidsignore. The file on main had CRLF line endings and a missing newline between two entries, which collapsed task_motorimagery_eeg.json and .nemar/ into a single broken pattern (/task_motorimagery_eeg.json.nemar/) that matched nothing. The PR rewrites it with LF endings and separated patterns: */fnirs, */fnirs/, */fnirs/**, task_motorimagery_nirs.json, task_motorimagery_eeg.json, .nemar/. The three */fnirs patterns plus the nirs.json line are what the README is describing — they hide the fNIRS modality from the validator because the fNIRS data is .mat rather than the .snirf BIDS-fNIRS expects, so the files would otherwise generate one "file not included in schema" error per subject. Ignoring .nemar/ is a NEMAR housekeeping pattern and was clearly the original author's intent.

dataset_description.json. BIDSVersion is bumped 1.0.21.11.1 (matching the current validator), and Version is bumped 1.0.01.0.2. Three empty optional fields were also removed: "Acknowledgements": "", "HowToAcknowledge": "", and "Funding": ["", "", ""]. Removing empty recommended fields is reasonable — they were generating "recommended but missing" warnings either way — and the remaining content (Name, Authors, License, DatasetDOI, SourceDatasets, DatasetType) is preserved verbatim. Both the dataset Version bump and the removal of those three empty fields are silent in the curation log; the README only mentions the BIDSVersion change.

README.md. Adds the "NEMAR curation changes" section documenting (most of) the above.

Data integrity

52 files in the diff: 42 .tsv, 8 .json, 1 .md, 1 .bidsignore. Zero binary EEG or fNIRS payloads touched — I spot-checked the git-annex pointer files for .set, .fdt, and .mat on both branches and the MD5-keyed annex paths are identical, so the signal data is provably untouched. No _events.tsv, _channels.tsv, _scans.tsv, _eeg.json, or _nirs.json files were modified. The 28 electrodes.tsv files that were renamed/deleted preserve their original byte content in the 14 survivors (verified by MD5 against main). No _scans.tsv files exist in this dataset, so the acq_time Z-suffix question doesn't arise.

Recommendations

  • The dataset Version bump (1.0.01.0.2) and the removal of the empty Acknowledgements / HowToAcknowledge / Funding fields are not mentioned in the README curation log. Worth adding a one-line note to keep the log complete — both changes look fine on their merits, just undocumented.
  • Re-examine whether task_motorimagery_eeg.json belongs in .bidsignore. It looks like a valid BIDS-EEG study-level inherited sidecar (TaskName, EEGSamplingFrequency, EEGReference, etc.) and was only being "ignored" on main because of the broken-pattern typo. The PR explicitly listed it as a separate ignore entry, which probably preserved the original author's intent but may be hiding a perfectly valid file from the validator. Worth confirming that removing it from .bidsignore does not reintroduce a validator error before merging.
  • The fNIRS .mat.snirf conversion is explicitly out of scope here and the .bidsignore workaround is the right call for a metadata-only pass — leaving as a tracked follow-up rather than a blocker.

@arnodelorme
Copy link
Copy Markdown

This dataset contains no events which is wrong. The events are in the binary files and should be in the event file and be interpretable.

…ts tables

The source dataset shipped no events.tsv files even though the .set files
carry a populated EEG.event struct (243 events per recording across all 21
recordings). This commit extracts those events into BIDS-spec per-recording
events.tsv files and adds the dataset-level task-motorimagery_events.json
sidecar describing the columns.

events.tsv (21 new files, one per recording)
- Columns: onset (seconds from recording start), duration (n/a; the source
  stamps a 1-sample duration on every trigger which is not a meaningful
  duration), sample (sample index at sfreq=500 Hz), value (verbatim string
  from EEG.event).
- 243 events per recording, distribution uniform across all 21:
  1 boundary, 1 'S  1', 40 each of 'S  2'/'S  7'/'S  8'/'S  9'/'S 10',
  10 each of 'S  3'/'S  4'/'S  5'/'S  6', 1 'condition 12'.

task-motorimagery_events.json (new dataset-level sidecar)
- Declares the four columns (onset, duration, sample, value) with their
  BIDS types (Units: s on onset and duration).
- value column carries a Levels enum listing every observed marker. The
  source dataset does not publish a code-to-meaning mapping for the
  per-trial 'S N' codes; each Levels entry honestly records what is
  verifiable (which label, how often per recording, that the meaning is
  not documented in the source).
- The README's documented trial structure (3s fixation, 4s cue, 3s ready,
  5s imagery; 40 trials; four MI tasks Reach/Grasp/Lift/Twist) is
  referenced in the column description as experimental context.

README is updated to fold the extraction into the single curation section
(no separate revision block, neutral description). Validator confirms
0 errors / 1003 warnings (21 fewer than pre-extraction — the missing
events-table warnings have been resolved). Binary .set/.fdt payloads are
byte-identical.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AmanJaiswal1503
Copy link
Copy Markdown
Collaborator Author

AmanJaiswal1503 commented Jun 5, 2026

Hi @arnodelorme, pushed 629b7a6 which extracts the events from each .set file's EEG.event struct into per-recording BIDS events.tsv files (21 new files, 243 events each), plus a dataset-level task-motorimagery_events.json declaring the four columns and listing every observed marker (boundary, S 1 to S 10, condition 12) in a Levels enum.

One caveat on interpretability: the source dataset does not publish a code-to-meaning mapping for the per-trial S N codes. The README documents the trial structure (3 s fixation, 4 s visual cue, 3 s ready, 5 s imagery; 40 trials per session; four MI tasks Reach / Grasp / Lift / Twist presented in random order) but does not state which S N code corresponds to which phase or task. I checked the OpenNeuro page, the NEMAR page, the BrainLife mirror, and the dataset's own CHANGES file (which only says it was uploaded for journal-paper submission in 2022-02); I also searched for an associated peer-reviewed paper by Seho Lee / Dong-Joo Kim and the other listed authors and could not find one. There is a structural coincidence worth flagging: S 3 / S 4 / S 5 / S 6 each appear exactly 10 times per session, matching the README's "4 MI tasks × 10 trials," and the onset intervals between S 2, S 3-6, S 7, S 8 align with the documented 3 s / 4 s / 3 s phase durations, so the inference is that S 2 is fixation onset, S 3-6 is cue onset (with the specific code = MI task identity), S 7 is ready onset, and S 8 is imagery onset. But none of that is published anywhere in the source dataset, so the Levels entries record only what is verifiable rather than committing to the inference.

Validator: 0 errors / 1003 warnings (21 fewer than before; the missing-events-table warnings are resolved). Ready for re-review.

Right now I have this non-interpretability documented in the root level events files, should i add this in .nemar/notes.json or having the current structure at the root level is enough?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants