43 changes: 43 additions & 0 deletions README.md
@@ -40,6 +40,26 @@ uv run python data2sbml.py \
--maxsize 24
```

### Model-selection strategies

The final inferred model can now be selected with one of three strategies, listed from
cheapest to most expensive:

- `--selection-strategy pysr_model_selection`: choose one safe equation per species using PySR's local `model_selection` rule.
- `--selection-strategy global_rmse`: shortlist per-species candidates, reconstruct full models, and choose the model with the lowest simulated mean RMSE.
- `--selection-strategy global_multiobjective`: search full reconstructed models and choose the model with the lowest `rmse_mean + penalty * total_complexity`.

Example:

```bash
uv run python data2sbml.py \
--sbml demo_polynomial_2d.xml \
--species x y \
--selection-strategy global_rmse \
--selection-top-k 3 \
--selection-max-model-evals 64
```
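
As a rough illustration of how the two global criteria differ, here is a minimal Python sketch. Only the score formula `rmse_mean + penalty * total_complexity` comes from the strategy description above; the `Candidate` class, its field names, the helper functions, and the sample numbers are hypothetical and are not taken from `data2sbml.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one reconstructed full model."""
    name: str
    rmse_mean: float       # mean simulated RMSE across species
    total_complexity: int  # summed complexity of all species equations


def multiobjective_score(candidate: Candidate, penalty: float) -> float:
    # Score from the README: rmse_mean + penalty * total_complexity.
    return candidate.rmse_mean + penalty * candidate.total_complexity


def select_global_rmse(candidates):
    # global_rmse: lowest mean RMSE, complexity is ignored.
    return min(candidates, key=lambda c: c.rmse_mean)


def select_global_multiobjective(candidates, penalty):
    # global_multiobjective: lowest penalized score.
    return min(candidates, key=lambda c: multiobjective_score(c, penalty))


# Made-up candidates, for illustration only.
candidates = [
    Candidate("A", rmse_mean=0.010, total_complexity=40),
    Candidate("B", rmse_mean=0.012, total_complexity=18),
    Candidate("C", rmse_mean=0.050, total_complexity=8),
]

print(select_global_rmse(candidates).name)
print(select_global_multiobjective(candidates, penalty=0.001).name)
```

With these made-up numbers the pure RMSE criterion picks the most accurate but most complex candidate, while a non-zero penalty shifts the choice toward a simpler model with nearly the same error.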

### Harder example

The BioModels file `BIOMD0000000346_url.xml` is also runnable with the same pipeline, but it is a harder symbolic-regression problem and should be treated as a stress test rather than the baseline validation case.
@@ -52,11 +72,34 @@ To fetch curated BioModels identifiers via `biocompose`, cache their SBML/SED-ML
uv run python biomodels_batch.py --limit 25 --max-species 6
```

To run the batch directly over a local directory of downloaded BioModels files such as `artifacts/all_curated_biomodels`:

```bash
uv run python biomodels_batch.py \
--local-models-dir artifacts/all_curated_biomodels \
--limit 25 \
--max-species 6
```

When a model has no `.sedml` file, or the SED-ML does not expose a usable `UniformTimeCourse`, the batch now falls back to configurable simulation settings instead of skipping the model:

```bash
uv run python biomodels_batch.py \
--local-models-dir artifacts/all_curated_biomodels \
--fallback-start 0 \
--fallback-duration 20 \
--fallback-points 801
```

Use `--require-sedml` if you want to keep the previous strict behavior and fail on models without a usable SED-ML file.
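
The fallback behavior described above can be sketched in Python as follows. This is not the code in `biomodels_batch.py`: the `SimulationSettings` class, the `resolve_settings` and `time_grid` helpers, and the `"sedml"`/`"fallback"` labels are hypothetical. The sketch also assumes that the point count includes both endpoints, which the README does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SimulationSettings:
    """Hypothetical container for time-course settings."""
    start: float
    duration: float
    points: int
    source: str  # hypothetical label: "sedml" or "fallback"


def resolve_settings(
    sedml_settings: Optional[SimulationSettings],
    fallback: SimulationSettings,
    require_sedml: bool = False,
) -> SimulationSettings:
    # Prefer settings read from a usable SED-ML UniformTimeCourse.
    if sedml_settings is not None:
        return sedml_settings
    # Strict mode: fail instead of falling back.
    if require_sedml:
        raise ValueError("no usable SED-ML UniformTimeCourse found")
    return fallback


def time_grid(settings: SimulationSettings) -> List[float]:
    # Assumption: `points` counts both endpoints of the interval.
    if settings.points < 2:
        raise ValueError("need at least two time points")
    step = settings.duration / (settings.points - 1)
    return [settings.start + i * step for i in range(settings.points)]


# Values mirror the command-line example above.
fallback = SimulationSettings(start=0.0, duration=20.0, points=801, source="fallback")
chosen = resolve_settings(None, fallback)
grid = time_grid(chosen)
```

Under the endpoint assumption, the example values give a time step of 20 / 800 = 0.025.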

The batch summary is written to `artifacts/biomodels_batch_summary.tsv` and now includes:

- `original_model_equation_count`
- `inferred_model_pysr_complexity`
- `inferred_model_node_count`
- `simulation_settings_source`
- `simulation_settings_note`
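
One possible way to inspect the summary is to count how many models used each settings source. The sketch below uses only the Python standard library. The column name `simulation_settings_source` comes from the list above, while the `model_id` column, the sample rows, and the `sedml`/`fallback` values are made up for illustration, since the actual values written by the batch are not documented here.

```python
import csv
import io
from collections import Counter


def count_settings_sources(tsv_text: str) -> Counter:
    """Count rows per value of the simulation_settings_source column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["simulation_settings_source"] for row in reader)


# Hypothetical sample; real summaries contain more columns.
sample = (
    "model_id\tsimulation_settings_source\n"
    "model_1\tsedml\n"
    "model_2\tfallback\n"
    "model_3\tsedml\n"
)

counts = count_settings_sources(sample)
print(counts)
```

To run it on a real summary, pass the contents of `artifacts/biomodels_batch_summary.tsv` (for example via `pathlib.Path(...).read_text()`) instead of the sample string.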

To only cache all curated BioModels files without running symbolic regression:
