Description
Problem
The Explore Brain Datasets table shows modality, subjects, timelines, and hours — but not dataset size. There's no way to tell a laptop-friendly dataset (~1.5 GB MNE sample) from a multi-hundred-GB one (THINGS-MEG is 377 GB on OpenNeuro) before starting a download. Size today exists only as prose in a few docstrings ("~12GB") and as the aggregate benchmark figure (~3.3 TB).
Proposal
- Data model: add a field to StudyInfo (neuralset/events/study.py):
  size_bytes: int | None = None  # on-disk size after extraction; None = not yet measured
  Optional with a None default, so studies can be populated incrementally and existing _info definitions are untouched. Stored as bytes; formatted to GB only at render time.
- Per-study values: populate size_bytes in each study's _info in neuralfetch/studies/*.py.
- Docs: render a "Size" column in docs/scripts/build_study_explorer.py: add to _SUMMARY_COLUMNS, read in get_summaries(), add header + body cell, and expose it as a sort/filter key in the explorer JS alongside Subjects/Hours.
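To make the proposal concrete, here is a minimal sketch of the optional field plus render-time formatting. StudyInfoSketch is a hypothetical stand-in, not the real StudyInfo class (whose base class and other fields are not shown in this issue), and format_size and size_sort_key are illustrative helper names rather than existing functions in the repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StudyInfoSketch:
    """Hypothetical stand-in for StudyInfo; only the proposed field matters here."""

    name: str
    # On-disk size after extraction; None means "not yet measured".
    size_bytes: int | None = None


def format_size(size_bytes: int | None) -> str:
    """Format a byte count as decimal GB for the table cell."""
    if size_bytes is None:
        return "n/a"  # placeholder for studies without a measured size
    return f"{size_bytes / 1e9:.1f} GB"


def size_sort_key(info: StudyInfoSketch) -> tuple[bool, int]:
    """Sort by size, placing unmeasured studies last."""
    return (info.size_bytes is None, info.size_bytes or 0)


studies = [
    StudyInfoSketch("unmeasured"),
    StudyInfoSketch("large", size_bytes=377_000_000_000),
    StudyInfoSketch("small", size_bytes=1_500_000_000),
]
for info in sorted(studies, key=size_sort_key):
    print(info.name, format_size(info.size_bytes))
```

Keeping the stored value in bytes means the explorer can sort numerically and only the displayed string depends on the chosen unit.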
Open questions
- How should the sizes be measured? Would appreciate the maintainers' guidance here. Some options: query host APIs where available (OpenNeuro GraphQL snapshot { size }, DANDI total_size, Figshare/Zenodo file listings); or run each study's _download() and measure with du. Perhaps you already have these numbers from running the full benchmark?
- On-disk vs compressed download size — proposing on-disk post-extraction (matches the existing ~3.3 TB aggregate, and is what users need to provision); could store both if useful.
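If the sizes end up being measured locally after _download(), one possible approach is a small Python equivalent of du. The sketch below is only an illustration: dir_size_bytes is a hypothetical helper name, and it sums apparent file sizes, which can differ slightly from du's default block-based count (closer to `du --apparent-size -b` on GNU systems, minus directory entries).

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the apparent sizes of all regular files under root.

    Symlinks are skipped so that linked data is not counted twice;
    os.walk does not descend into symlinked directories by default.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if os.path.islink(path):
                continue
            total += os.path.getsize(path)
    return total
```

The returned integer could be pasted directly into a study's size_bytes value, which keeps the measurement reproducible across platforms where du flags differ.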
Happy to submit a PR for this.
Alternatives considered
No response