Update on005628 to v1.0.2 #1

Open
AmanJaiswal1503 wants to merge 2 commits into main from update/on005628-mphbxosk

@AmanJaiswal1503

Dataset Update

Bumps on005628 from 1.0.1 to 1.0.2.

Changed files

  • README.md
  • dataset_description.json
  • participants.json
  • sub-1/eeg/sub-1_task-Edzna_run-1_channels.tsv
  • sub-1/eeg/sub-1_task-Edzna_run-1_eeg.json
  • sub-1/eeg/sub-1_task-Edzna_run-2_channels.tsv
  • sub-1/eeg/sub-1_task-Edzna_run-2_eeg.json
  • sub-1/eeg/sub-1_task-Edzna_run-3_channels.tsv
  • sub-1/eeg/sub-1_task-Edzna_run-3_eeg.json
  • sub-10/eeg/sub-10_task-Edzna_run-1_channels.tsv
  • ... and 607 more

…matlab-tools entry), set BIDSVersion to validator (1.11.1); rewrite README in plainer language grouped by file (maintainer feedback)
@AmanJaiswal1503

Author

Claude Review

This PR is a NEMAR curation pass on on005628, bumping the dataset to 1.0.2 and BIDSVersion to 1.11.1. Walking through what it changes:

task-Edzna_events.json (rewritten from [] to a proper object). The previous file shipped as a literal empty JSON array, which is a hard validator error: BIDS sidecars must be JSON objects, not arrays. The replacement documents the two columns that actually appear in every populated _events.tsv (sample and value) and notes that the rows are EEGLAB boundary annotations marking the joins between the concatenated baseline / visual / audiovisual phases. I checked every _events.tsv under sub-*p/eeg/ (153 files) and confirmed every row's value is boundary, so the description matches what's on disk and resolves the "additional column undefined" warnings as well as the error.
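
For anyone who wants to repeat the two checks described above, here is a minimal sketch. The function names are hypothetical helpers, not part of any BIDS tool, and the code assumes the events tables are tab-separated with a header row containing a value column.

```python
import csv
import io
import json


def is_object_sidecar(json_text: str) -> bool:
    """Return True if the sidecar's top-level JSON value is an object."""
    return isinstance(json.loads(json_text), dict)


def non_boundary_rows(tsv_text: str, column: str = "value") -> list[dict]:
    """Return the rows of an events table whose `column` is not 'boundary'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row.get(column) != "boundary"]
```

Reading each _events.tsv as text and passing it to non_boundary_rows should return an empty list for every file if the description in the sidecar holds.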

task-Edzna_eeg.json (new root sidecar). Pulls the fields that were identical across all 306 per-recording sidecars up to the dataset root, where BIDS' inheritance rule applies them to every matching recording. Adds Manufacturer: "g.tec medical engineering" and ManufacturersModelName: "Unicorn Hybrid Black" — the README explicitly names that hardware, and the existing cap-side keys already pointed at the same integrated unit, so these are just the matching amplifier-side BIDS keys for that one device. Sets EOGChannelCount, ECGChannelCount, EMGChannelCount, MISCChannelCount, and TriggerChannelCount to 0; I confirmed every _channels.tsv body has exactly 8 rows of scalp EEG and zero rows of any other type, so those counts are mechanically derivable. TaskDescription paraphrases the README without inventing detail. Note the canonical all-caps MISCChannelCount spelling is used (matches the BIDS-EEG schema).
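
The hoisting step can be sketched as follows. This is an illustration of the idea rather than the script used for the PR: hoist_common_fields and resolve are hypothetical names, and resolve models inheritance in the simplest way (the more specific sidecar overrides the root on a key clash).

```python
def hoist_common_fields(sidecars: dict[str, dict]) -> tuple[dict, dict[str, dict]]:
    """Split sidecars into (fields shared by all files, per-file remainder)."""
    if not sidecars:
        return {}, {}
    items = list(sidecars.values())
    first = items[0]
    # A field is hoisted only if every sidecar has it with the same value.
    common = {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in items[1:])
    }
    remainder = {
        name: {k: v for k, v in fields.items() if k not in common}
        for name, fields in sidecars.items()
    }
    return common, remainder


def resolve(root: dict, local: dict) -> dict:
    """Effective metadata for one recording: local keys override root keys."""
    return {**root, **local}
```

A useful round-trip check is that resolve(common, remainder[name]) reproduces the original sidecar for every file, which shows the hoist lost nothing.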

Per-recording EEG sidecars (sub-*/eeg/sub-*_task-Edzna_run-*_eeg.json, 306 files) — slimmed to RecordingDuration only. Every other field these files used to carry is identical across the cohort and now lives in the root sidecar above. I diffed all 306 programmatically: each PR-side file contains exactly {"RecordingDuration": <n>} and that integer equals the value from the corresponding pre-PR file. No durations were altered.
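
A check along these lines could look like the sketch below. check_slimmed is a hypothetical helper that takes the already-parsed pre-PR and PR-side sidecars for one recording; it is not the exact diff that was run.

```python
def check_slimmed(before: dict, after: dict) -> list[str]:
    """Return a list of problems with a slimmed per-recording sidecar."""
    problems = []
    if set(after) != {"RecordingDuration"}:
        # The slimmed file must carry exactly one key.
        problems.append(f"unexpected keys: {sorted(after)}")
    elif after["RecordingDuration"] != before.get("RecordingDuration"):
        problems.append(
            f"duration changed: {before.get('RecordingDuration')!r} "
            f"-> {after['RecordingDuration']!r}"
        )
    return problems
```

An empty list for every pair of files corresponds to the "no durations were altered" result reported above.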

Channel tables (sub-*/eeg/sub-*_task-Edzna_run-*_channels.tsv, 306 files). Every row's type flipped from n/a to EEG and units from n/a to uV; channel names left untouched. The paired sidecars declared EEGChannelCount: 8 while these rows had no type = EEG, so the validator couldn't see those 8 channels and warned on every file. I verified cell-by-cell across all 306 files that the only cells that changed are the type and units columns (header + 8 rows each), names are preserved exactly, and the only values written are EEG and uV. The Unicorn Hybrid Black records scalp EEG in microvolts and the existing SoftwareFilters already documents a 1–50 Hz bandpass, so the units assignment is consistent. The raw sub-N recordings use P3/P4 labels and the preprocessed sub-Np recordings use C3/C4 at those positions (153/153 split, confirmed by row inspection); the curation log correctly flags reconciling that relabel as a content question for the original authors rather than a mechanical fix.
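
The cell-by-cell comparison can be approximated with the sketch below. changed_cells is a hypothetical helper; it assumes both versions of a table are tab-separated with an identical header and the same number of rows, and it raises if that assumption fails.

```python
import csv
import io


def changed_cells(before_tsv: str, after_tsv: str) -> list[tuple[int, str, str, str]]:
    """List (row number, column, old value, new value) for each changed cell."""
    before = list(csv.reader(io.StringIO(before_tsv), delimiter="\t"))
    after = list(csv.reader(io.StringIO(after_tsv), delimiter="\t"))
    if not before or not after:
        raise ValueError("empty table")
    if len(before) != len(after) or before[0] != after[0]:
        raise ValueError("table shape or header changed")
    header = before[0]
    changes = []
    for row_idx, (old_row, new_row) in enumerate(zip(before[1:], after[1:]), start=1):
        if len(old_row) != len(new_row):
            raise ValueError(f"row {row_idx} changed width")
        for column, old, new in zip(header, old_row, new_row):
            if old != new:
                changes.append((row_idx, column, old, new))
    return changes
```

For each channel table, the set of changed columns should be exactly {"type", "units"} and the set of new values exactly {"EEG", "uV"}.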

participants.json (descriptions expanded). The previous one-liners ("Unique participant label", "Participant group label") didn't satisfy the recommended-description check. The rewrite makes two dataset-specific facts explicit — that subjects ending in p are preprocessed counterparts of the same numeric subject id, and that the Group column's per-group semantics are not documented in the source — without inventing meaning the source doesn't state. participants.tsv itself was not touched.

dataset_description.json. BIDSVersion moves from 1.8.0 to 1.11.1 (validator version; BIDS version bumps are backward-compatible). DatasetType: "raw" is added; without it the validator falls back to derivative rules and emits a cascade of derivative warnings, and this dataset is plainly raw (sub-*/eeg/ at top level, no derivatives/). ReferencesAndLinks: [""] becomes [] (the single empty-string element didn't satisfy the URL-array schema). The source's GeneratedBy entry for bids-matlab-tools 9.1 is kept verbatim. Version also moves from 1.0.0 to 1.0.2; the README mentions the BIDSVersion bump but not the dataset Version bump, which is worth noting since the commit title is the only place it's recorded.
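
The two dataset_description.json fixes can be expressed as a small lint. This is a simplified sketch, not the validator's actual rule set: description_problems is a hypothetical helper that only checks that DatasetType is set explicitly and that ReferencesAndLinks contains no empty entries.

```python
def description_problems(desc: dict) -> list[str]:
    """Flag the two dataset_description.json issues discussed in this review."""
    problems = []
    if "DatasetType" not in desc:
        problems.append("DatasetType is not set explicitly")
    links = desc.get("ReferencesAndLinks", [])
    if not isinstance(links, list):
        problems.append("ReferencesAndLinks is not an array")
    elif any(not isinstance(link, str) or not link.strip() for link in links):
        problems.append("ReferencesAndLinks contains an empty or non-string entry")
    return problems
```

Under these assumptions the pre-PR shape (no DatasetType, ReferencesAndLinks: [""]) yields two problems and the PR-side shape yields none.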

Undocumented infrastructure changes (separate from the BIDS curation). Five files outside the BIDS tree changed and aren't called out in the curation log: .github/workflows/bids-validation.yml rewritten to run the validator inline (instead of dispatching to a central workflow); three new workflows added (generate-archive.yml, llm-enrichment.yml, version-doi.yml); pr-merge.yml modified; .bidsignore drops the .nemar/ line (now appended at workflow runtime); and .nemar/metadata.json is restructured (pipeline_stage goes from validated to seeded, the IsDerivedFrom relation becomes IsIdenticalTo, and source/source_id fields are added). These are NEMAR plumbing rather than BIDS curation, but they're part of the same PR and the README curation section doesn't mention them.

Data integrity

624 files changed, all .json / .tsv / .md / .yml / .bidsignore. Zero binary EEG payloads touched (.eeg, .vhdr, .vmrk, .set, .fdt, .edf, .bdf, .fif, .cnt — all 0). No _events.tsv, _scans.tsv, participants.tsv, electrodes, or coordsystem files were modified. The 153 .set files and their git-annex pointers are unchanged on both branches. Programmatic cell-by-cell diff across all 306 channel tables and all 306 per-recording sidecars confirmed no values were altered other than the documented type/units flips and the slim-down to RecordingDuration.
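
The file-type audit can be reproduced from a list of changed paths (for example, the output of a diff by name between the two branches). The sketch below is illustrative: unexpected_changes and the two allow-lists are hypothetical names, with the allowed set taken from the file types listed above.

```python
from pathlib import PurePosixPath

# Text file types this PR is expected to touch (from the summary above).
ALLOWED_SUFFIXES = {".json", ".tsv", ".md", ".yml"}
# Dotfiles have no suffix, so they are matched by full name instead.
ALLOWED_NAMES = {".bidsignore"}


def unexpected_changes(paths: list[str]) -> list[str]:
    """Return changed paths whose type falls outside the allowed text formats."""
    flagged = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.name in ALLOWED_NAMES or path.suffix in ALLOWED_SUFFIXES:
            continue
        flagged.append(raw)
    return flagged
```

An empty result means no binary payload (such as a .set or .fdt file) appears among the changed paths.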

Recommendations

  • Mention the dataset Version bump (1.0.0 to 1.0.2) explicitly in the README curation log so the record is self-contained.
  • Consider noting the NEMAR infrastructure changes (workflow rewrite, .nemar/metadata.json restructure, .bidsignore line removal) in a separate "NEMAR infrastructure" subsection so future readers can tell BIDS curation from rehost plumbing at a glance.
  • The P3/P4 vs C3/C4 label discrepancy between raw and preprocessed counterparts is worth surfacing upstream to the original authors — it's a content question the curation can't resolve, but a note in the README or a dataset-level annotation would help downstream users.
