Add model-selection strategies and local BioModels batch mode by nprzrosas · Pull Request #2 · sciluna/data2sbml

nprzrosas · 2026-06-10T01:05:52Z

Summary

This PR makes it easier to evaluate data2sbml on curated BioModels and adds more flexible ways to choose the final
inferred model.

There are two main improvements:

- The pipeline can now run directly on a local BioModels directory, including models that do not have a usable .sedml file.
- Final model selection is no longer limited to local per-species PySR loss; it can also use global simulation-based criteria.
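
As a rough illustration of what a global simulation-based criterion can look like, the sketch below scores each candidate by the RMSE between its simulated trajectory and the reference trajectory, pooled over all species and time points, and keeps the lowest-scoring candidate. The function names, the array layout (time points by species), and the toy data are assumptions made for this example; they are not taken from data2sbml.py.

```python
import numpy as np


def pooled_rmse(reference, simulated):
    """RMSE pooled over every time point and species.

    Both inputs are assumed to be arrays shaped (n_times, n_species).
    """
    reference = np.asarray(reference, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    if reference.shape != simulated.shape:
        raise ValueError("reference and simulated trajectories differ in shape")
    return float(np.sqrt(np.mean((reference - simulated) ** 2)))


def select_by_pooled_rmse(reference, candidates):
    """Return (name, score) for the candidate with the lowest pooled RMSE.

    `candidates` maps a hypothetical candidate name to its simulated trajectory.
    """
    scores = {name: pooled_rmse(reference, traj) for name, traj in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy two-species reference trajectory on five time points.
t = np.linspace(0.0, 1.0, 5)
reference = np.column_stack([np.exp(-t), 1.0 - np.exp(-t)])

# Two made-up candidates that differ from the reference by a constant offset.
candidates = {
    "candidate_a": reference + 0.1,
    "candidate_b": reference + 0.01,
}
best_name, best_score = select_by_pooled_rmse(reference, candidates)
```

Unlike a per-species regression loss, this score only rewards a candidate when the whole simulated system tracks the reference.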

What changed

- Added new model-selection strategies in data2sbml.py:
  - pysr_model_selection
  - global_rmse
  - global_multiobjective
- Added CLI options to control shortlist size, evaluation budget, and complexity penalty.
- Updated biomodels_batch.py so it can run directly on a local curated BioModels directory with --local-models-dir.
- Added fallback simulation settings for models without usable SED-ML:
  - --fallback-start
  - --fallback-duration
  - --fallback-points
- Added --require-sedml for cases where strict SED-ML-only behavior is preferred.
- Expanded biomodels_batch_summary.tsv with simulation provenance and richer per-model metrics.
- Added tests and updated the README.
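
The fallback options above could combine roughly as in the following sketch. Only the flag names come from this PR. The types, default values, the reading of --fallback-points as the number of output time points, and the helper names are assumptions for illustration and may not match biomodels_batch.py.

```python
import argparse

import numpy as np


def build_parser():
    """Parser with the flags named in this PR; types and defaults are assumed."""
    parser = argparse.ArgumentParser(description="Sketch of fallback simulation options")
    parser.add_argument("--local-models-dir", default=None)
    parser.add_argument("--fallback-start", type=float, default=0.0)
    parser.add_argument("--fallback-duration", type=float, default=100.0)
    parser.add_argument("--fallback-points", type=int, default=101)
    parser.add_argument("--require-sedml", action="store_true")
    return parser


def resolve_time_grid(sedml_grid, args):
    """Return (time_points, provenance) for one model.

    `sedml_grid` is the time grid read from SED-ML, or None when the model
    has no usable SED-ML time course (hypothetical convention).
    """
    if sedml_grid is not None:
        return np.asarray(sedml_grid, dtype=float), "sedml"
    if args.require_sedml:
        raise ValueError("no usable SED-ML and --require-sedml was given")
    stop = args.fallback_start + args.fallback_duration
    grid = np.linspace(args.fallback_start, stop, args.fallback_points)
    return grid, "fallback"


args = build_parser().parse_args(["--fallback-duration", "10", "--fallback-points", "11"])
grid, provenance = resolve_time_grid(None, args)
```

Returning a provenance label next to the grid is one way to record, per model, whether the simulation settings came from SED-ML or from the fallback.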

Why this is useful

Previously, batch evaluation worked best when a model had both SBML and a usable SED-ML time-course definition.
In practice, the curated local BioModels set is not that uniform, so some models were hard to evaluate without
manual handling.

This PR makes the workflow more practical for benchmarking:

- Models can be loaded directly from the local curated directory.
- Missing or unusable SED-ML no longer blocks evaluation.
- Model selection can be based on reconstructed trajectory quality, not only local symbolic-regression fit.
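
To make the trade-off between local fit, trajectory quality, and complexity concrete, here is a minimal sketch of one possible selection rule: shortlist candidates by local loss, then pick the shortlisted candidate with the lowest global RMSE plus a complexity penalty. The Candidate fields, the additive scoring formula, the default values, and the numbers are all assumptions for this example; the actual global_multiobjective strategy in data2sbml.py may be defined differently.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    local_loss: float   # per-species symbolic-regression loss
    complexity: int     # expression size, e.g. node count
    global_rmse: float  # RMSE of the simulated trajectory against the reference


def select_multiobjective(candidates, shortlist_size=3, complexity_penalty=0.01):
    """Shortlist by local loss, then minimise global RMSE plus a complexity penalty."""
    shortlist = sorted(candidates, key=lambda c: c.local_loss)[:shortlist_size]
    return min(shortlist, key=lambda c: c.global_rmse + complexity_penalty * c.complexity)


# Made-up candidates; none of these numbers come from a real run.
candidates = [
    Candidate("a", local_loss=0.001, complexity=25, global_rmse=0.50),
    Candidate("b", local_loss=0.002, complexity=9, global_rmse=0.05),
    Candidate("c", local_loss=0.003, complexity=5, global_rmse=0.08),
    Candidate("d", local_loss=0.900, complexity=1, global_rmse=0.01),
]
chosen = select_multiobjective(candidates)
```

In this toy data, candidate "a" has the best local loss but the worst simulated trajectory, so a purely local criterion would pick it while the combined score does not. In a real pipeline the global RMSE would likely be computed only for shortlisted candidates, since each one requires a simulation.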

Validation

- pytest tests passes locally (20 passed).
- Ran a full end-to-end test on BIOMD0000000001, with result:
  - status = success
  - rmse_mean = 9.036302267035591e-21

Notes

Manual inspection of the simulated plots, compared with the original simulations for BIOMD0000000001, indicates that the simulated signals do not reproduce the original trajectories. As a result, additional work is needed to improve this pipeline.

This should still be treated as an evaluation-focused improvement, not a claim that the full BioModels set has been validated.

The new global selection strategies are in place, but they still need broader benchmarking across more models.

Add model-selection strategies and local BioModels batch mode

dc73009

nprzrosas wants to merge 1 commit into main from feature/biomodels-local-batch
