Add model-selection strategies and local BioModels batch mode #2

Open
nprzrosas wants to merge 1 commit into main from feature/biomodels-local-batch

@nprzrosas

Collaborator

Summary

This PR makes it easier to evaluate data2sbml on curated BioModels and adds more flexible ways to choose the final
inferred model.

There are two main improvements:

  • the pipeline can now run directly on a local BioModels directory, including models that do not have a usable
    .sedml file
  • final model selection is no longer limited to local per-species PySR loss; it can also use global simulation-based
    criteria

What changed

  • Added new model-selection strategies in data2sbml.py:
    • pysr_model_selection
    • global_rmse
    • global_multiobjective
  • Added CLI options to control shortlist size, evaluation budget, and complexity penalty.
  • Updated biomodels_batch.py so it can run directly on a local curated BioModels directory with --local-models-dir.
  • Added fallback simulation settings for models without usable SED-ML:
    • --fallback-start
    • --fallback-duration
    • --fallback-points
  • Added --require-sedml for cases where strict SED-ML-only behavior is preferred.
  • Expanded biomodels_batch_summary.tsv with simulation provenance and richer per-model metrics.
  • Added tests and updated the README.
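
To make the difference between the three strategies concrete, here is a minimal sketch of how they could rank the same candidates. Only the strategy names come from this PR. The Candidate fields, the select_model helper, the scoring formula, and the default values are illustrative assumptions, not the code in data2sbml.py, and the evaluation budget is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate model; every field name here is hypothetical."""
    name: str
    local_loss: float   # local per-species symbolic-regression loss
    global_rmse: float  # RMSE of simulated vs. reference trajectories
    complexity: int     # size of the symbolic expressions


def select_model(candidates, strategy, shortlist_size=3, complexity_penalty=0.01):
    """Pick one candidate; the scoring rules are assumptions for illustration."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # Assumed behaviour: keep only the best few candidates by local loss.
    shortlist = sorted(candidates, key=lambda c: c.local_loss)[:shortlist_size]
    if strategy == "pysr_model_selection":
        return shortlist[0]
    if strategy == "global_rmse":
        return min(shortlist, key=lambda c: c.global_rmse)
    if strategy == "global_multiobjective":
        # Assumed trade-off: trajectory error plus a linear complexity penalty.
        return min(
            shortlist,
            key=lambda c: c.global_rmse + complexity_penalty * c.complexity,
        )
    raise ValueError(f"unknown strategy: {strategy!r}")


candidates = [
    Candidate("a", local_loss=0.1, global_rmse=0.50, complexity=3),
    Candidate("b", local_loss=0.2, global_rmse=0.10, complexity=20),
    Candidate("c", local_loss=0.3, global_rmse=0.12, complexity=4),
    Candidate("d", local_loss=0.9, global_rmse=0.00, complexity=1),
]

for strategy in ("pysr_model_selection", "global_rmse", "global_multiobjective"):
    print(strategy, "->", select_model(candidates, strategy).name)
```

In this toy setup the three strategies choose different candidates, and candidate d is never considered because it falls outside the shortlist of three.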

Why this is useful

Previously, batch evaluation worked best when a model had both SBML and a usable SED-ML time-course definition.
In practice, the curated local BioModels set is not that uniform, so some models were hard to evaluate without
manual handling.

This PR makes the workflow more practical for benchmarking:

  • models can be loaded directly from the local curated directory
  • missing or unusable SED-ML no longer blocks evaluation
  • model selection can be based on reconstructed trajectory quality, not only local symbolic-regression fit
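
The fallback behaviour described above can be sketched as follows. The flag names are the ones listed in this PR. The default values, the resolve_time_grid helper, the provenance labels, and the assumption that --fallback-points counts both endpoints are hypothetical and may differ from biomodels_batch.py.

```python
import argparse


def build_parser():
    """Parser mirroring the flags named in the PR; defaults are made up."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local-models-dir")
    parser.add_argument("--fallback-start", type=float, default=0.0)
    parser.add_argument("--fallback-duration", type=float, default=100.0)
    parser.add_argument("--fallback-points", type=int, default=101)
    parser.add_argument("--require-sedml", action="store_true")
    return parser


def resolve_time_grid(sedml_grid, args):
    """Return (time_points, provenance) for one model.

    sedml_grid is the time grid read from SED-ML, or None when the model
    has no usable SED-ML file.
    """
    if sedml_grid is not None:
        return list(sedml_grid), "sedml"
    if args.require_sedml:
        raise ValueError("no usable SED-ML and --require-sedml was given")
    if args.fallback_points < 2:
        raise ValueError("--fallback-points must be at least 2")
    # Evenly spaced grid that includes both the start and the end time.
    step = args.fallback_duration / (args.fallback_points - 1)
    grid = [args.fallback_start + i * step for i in range(args.fallback_points)]
    return grid, "fallback"


args = build_parser().parse_args(
    ["--fallback-start", "0", "--fallback-duration", "10", "--fallback-points", "11"]
)
grid, provenance = resolve_time_grid(None, args)
print(provenance, len(grid), grid[0], grid[-1])
```

Returning the provenance next to the grid is one way to record where the simulation settings came from, which is the kind of information the expanded summary file is described as carrying.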

Validation

  • pytest tests passes locally (20 passed)
  • Ran a full end-to-end test on BIOMD0000000001
  • Result:
    • status = success
    • rmse_mean = 9.036302267035591e-21
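
For readers interpreting the number above, one plausible reading of rmse_mean is the per-species RMSE averaged over all species. The sketch below shows that reading only. The function names, the species names, and the definition itself are assumptions, and the PR does not state how the metric is actually computed.

```python
import math


def rmse(reference, simulated):
    """Root-mean-square error between two equally sampled trajectories."""
    if len(reference) != len(simulated) or not reference:
        raise ValueError("trajectories must be non-empty and the same length")
    squared = sum((r - s) ** 2 for r, s in zip(reference, simulated))
    return math.sqrt(squared / len(reference))


def rmse_mean(reference_by_species, simulated_by_species):
    """Average the per-species RMSE; returns (mean, per_species_dict)."""
    per_species = {
        name: rmse(ref, simulated_by_species[name])
        for name, ref in reference_by_species.items()
    }
    return sum(per_species.values()) / len(per_species), per_species


# Hypothetical two-species example.
reference = {"A": [0.0, 1.0, 2.0], "B": [1.0, 1.0, 1.0]}
simulated = {"A": [0.0, 1.0, 2.0], "B": [2.0, 2.0, 2.0]}
mean_error, per_species = rmse_mean(reference, simulated)
print(mean_error, per_species)
```

A metric of this kind depends on which trajectories are used as the reference and on the time grid they are sampled on, so it is worth checking alongside the plots.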

Notes

Manual inspection of the simulated plots compared with the original simulations for BIOMD0000000001 indicates that the simulated signals do not reproduce the original trajectories. As a result, additional work is needed to improve this pipeline.

This should still be treated as an evaluation-focused improvement, not a claim that the full BioModels set has been validated.

The new global selection strategies are in place, but they still need broader benchmarking across more models.
