Add per-dataset size to StudyInfo and the dataset explorer table #144

Description

@bkowshik

Problem

The Explore Brain Datasets table shows modality, subjects, timelines, and hours, but not dataset size. Before starting a download, there is no way to tell a laptop-friendly dataset (the MNE sample, ~1.5 GB) from a multi-hundred-GB one (THINGS-MEG is 377 GB on OpenNeuro). Today, size appears only as prose in a few docstrings ("~12GB") and as the aggregate benchmark figure (~3.3 TB).

Proposal

  1. Data model — add a field to StudyInfo (neuralset/events/study.py):

    size_bytes: int | None = None  # on-disk size after extraction; None = not yet measured

    Optional with a None default, so studies can be populated incrementally and existing _info definitions are untouched. Stored as bytes; formatted to GB only at render time.

  2. Per-study values — populate size_bytes in each study's _info in neuralfetch/studies/*.py.

  3. Docs — render a "Size" column in docs/scripts/build_study_explorer.py: add to _SUMMARY_COLUMNS, read in get_summaries(), add header + body cell, and expose it as a sort/filter key in the explorer JS alongside Subjects/Hours.
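
To make the three steps above concrete, here is a minimal sketch of how the field, the render-time formatting, and the sort behaviour could fit together. It is an illustration under assumptions, not the actual neuralset code: `StudyInfoSketch`, `format_size_gb`, `size_sort_key`, `size_cell`, the `data-size-bytes` attribute, and the study names are all hypothetical. Decimal GB (10**9 bytes) is also an assumption, and the two sizes are the approximate figures quoted in the Problem section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StudyInfoSketch:
    """Hypothetical stand-in for StudyInfo; the real class has other fields."""

    name: str
    size_bytes: int | None = None  # on-disk size after extraction; None = not yet measured


def format_size_gb(size_bytes: int | None) -> str:
    """Format bytes as decimal GB (1 GB = 10**9 bytes, an assumption)."""
    if size_bytes is None:
        return "n/a"
    return f"{size_bytes / 1e9:.1f} GB"


def size_sort_key(info: StudyInfoSketch) -> tuple[bool, int]:
    """Sort measured studies by size and keep unmeasured ones last."""
    return (info.size_bytes is None, info.size_bytes or 0)


def size_cell(info: StudyInfoSketch) -> str:
    """Render a table cell; the data attribute name is made up for this sketch."""
    raw = "" if info.size_bytes is None else str(info.size_bytes)
    return f'<td data-size-bytes="{raw}">{format_size_gb(info.size_bytes)}</td>'


studies = [
    StudyInfoSketch("unmeasured_study"),
    StudyInfoSketch("things_meg", size_bytes=377_000_000_000),
    StudyInfoSketch("mne_sample", size_bytes=1_500_000_000),
]
ordered = sorted(studies, key=size_sort_key)
for info in ordered:
    print(info.name, size_cell(info))
```

Keeping the raw byte count in an attribute next to the formatted label would let the explorer JS sort and filter on the number rather than on the display string.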

Open questions

  • How should the sizes be measured? Would appreciate the maintainers' guidance here. Some options: query host APIs where available (OpenNeuro GraphQL snapshot { size }, DANDI total_size, Figshare/Zenodo file listings); or run each study's _download() and measure with du — perhaps you already have these numbers from running the full benchmark?
  • On-disk vs compressed download size — proposing on-disk post-extraction (matches the existing ~3.3 TB aggregate, and is what users need to provision); could store both if useful.
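
For the "download and measure" option, a rough Python stand-in for `du` might look like the sketch below. `measure_size_bytes` is a hypothetical helper, not part of neuralfetch. It sums apparent file sizes (`st_size`), which can differ from the block-based number `du` reports by default, and it skips symlinks, which is an assumption that would undercount datasets storing their content behind links.

```python
from __future__ import annotations

import os
from pathlib import Path


def measure_size_bytes(root: str | Path) -> int:
    """Sum the apparent size of all regular files under root.

    Hypothetical helper: symlinks are skipped to avoid double counting,
    which is an assumption and may not suit every dataset layout.
    """
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue
            total += os.path.getsize(path)
    return total
```

The returned integer could be pasted directly into a study's `size_bytes` once a download has finished.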

Happy to submit a PR for this.
