|`--export_quant`| boolean |`false`| Enable `--export-quant` for fragment-level parquet data export. Requires DIA-NN >= 2.0. |
|`--site_ms1_quant`| boolean |`false`| Enable `--site-ms1-quant` to use MS1 apex intensities for PTM site quantification. Requires DIA-NN >= 2.0. |
|`--aa_eq`| boolean |`false`| Treat I&L, Q&E, N&D as equivalent amino acids during reannotation. Essential for entrapment FDR benchmarks. Maps to `--aa-eq`. |
### DIA-NN 2.5.0 flags (via `--extra_args`)
The following DIA-NN 2.5.0 flags are not exposed as pipeline parameters but can be passed via `--extra_args`. See [Fine-Tuning Deep Learning Models](usage.md#fine-tuning-deep-learning-models-dia-nn-20) for the complete workflow.
| Flag | Description |
|------|-------------|
|`--tune-level <N>`| Limit fine-tuning to a specific model distillation level (0, 1, or 2). |
> [!NOTE]
> `--parent` and `--aa-eq` are blocked from `--extra_args`. `--aa-eq` is managed as the pipeline parameter `--aa_eq`, and `--parent` is container-managed: overriding it would break model discovery.
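For example, an unexposed 2.5.0 flag can be forwarded through `--extra_args` (an illustrative invocation; input parameters are omitted and the tune level shown is arbitrary):

```bash
# Illustrative only: forward an unexposed DIA-NN 2.5.0 flag via --extra_args
nextflow run bigbio/quantmsdiann \
    -profile diann_v2_5_0,docker \
    --extra_args "--tune-level 1"
```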
| 2.5.0 |`-profile diann_v2_5_0`| +70% protein IDs, model fine-tuning |
Example: `nextflow run bigbio/quantmsdiann -profile test_dia,docker,diann_v2_2_0`
The pipeline supports multiple DIA-NN versions via built-in Nextflow profiles. Each profile sets `params.diann_version` and overrides the container image for all `diann`-labelled processes.
| Profile | DIA-NN Version | Container | Key features |
|---------|----------------|-----------|--------------|
| `diann_v2_2_0` | 2.2.0 | `ghcr.io/bigbio/diann:2.2.0` | Speed optimizations (up to 1.6x on HPC). Parquet output. |
| `diann_v2_3_2` | 2.3.2 | `ghcr.io/bigbio/diann:2.3.2` | DDA support (`--dda`), InfinDIA, up to 9 variable mods. |
| `diann_v2_5_0` | 2.5.0 | `ghcr.io/bigbio/diann:2.5.0` | Up to 70% more protein IDs. DL model fine-tuning and selection. |
**Version-dependent features:** Some parameters are only available with newer DIA-NN versions. The pipeline handles version compatibility automatically:
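The automatic gate can be pictured with a small shell sketch (illustrative only, not the pipeline's implementation; `version_ge`, `diann_version`, and `extra_flags` are made-up names):

```bash
# Minimal sketch (not actual pipeline code): enable a version-dependent flag
# only when the configured DIA-NN version is new enough.
diann_version="2.5.0"

# version_ge A B -> succeeds if dotted version A >= B
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

extra_flags=""
if version_ge "$diann_version" "2.0.0"; then
  extra_flags="--export-quant"   # only valid on DIA-NN >= 2.0
fi
echo "$extra_flags"
```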
> [!NOTE]
> DIA-NN 2.x images are hosted on `ghcr.io/bigbio` and may require authentication for private registries. The `diann_v2_1_0` and `diann_v2_2_0` profiles force Docker mode by default; for Singularity, override with your own config.
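A Singularity override could look like the following config sketch (illustrative; the `diann` label is the one the profiles target, while the image URI shown is just an example):

```groovy
// custom.config (sketch): point diann-labelled processes at your own image
process {
    withLabel: diann {
        container = 'ghcr.io/bigbio/diann:2.2.0'
    }
}
singularity.enabled = true
docker.enabled      = false
```

Pass it with `-c custom.config` alongside the version profile.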
## Fine-Tuning Deep Learning Models (DIA-NN 2.0+)
DIA-NN uses deep learning models to predict retention time (RT), ion mobility (IM), and fragment ion intensities. For non-standard modifications, fine-tuning these models on real data can substantially improve detection.
**When to fine-tune:** Fine-tuning is beneficial for custom chemical labels (e.g., mTRAQ, dimethyl), exotic PTMs, or unmodified cysteines. Standard modifications (Phospho, Oxidation, Acetylation, Deamidation, diGlycine) do not require fine-tuning — DIA-NN's built-in models already handle them well.
### How fine-tuning works
DIA-NN's neural networks encode each amino acid and modification as a "token" — an integer ID (0-255) mapped in a dictionary file (`dict.txt`). The default dictionary ships with DIA-NN and covers common modifications. When you fine-tune, DIA-NN:
1. Reads a spectral library containing empirically observed peptides with the modifications of interest
2. Learns how those modifications affect RT, IM, and fragmentation patterns
3. Outputs new model files (`.pt` PyTorch format) and an expanded dictionary (`dict.txt`) that includes tokens for the new modifications
The fine-tuned models are then used in place of the defaults when generating predicted spectral libraries.
> [!NOTE]
> **`--tune-lib` cannot be combined with `--gen-spec-lib` in a single DIA-NN invocation** ([confirmed in DIA-NN #1499](https://github.com/vdemichev/DiaNN/issues/1499)). Fine-tuning and library generation are separate DIA-NN commands. This means the workflow currently requires two pipeline runs.
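Conceptually, the two separate invocations look like this (a sketch only: file names are hypothetical and most flags are omitted; `--tune-lib` and `--gen-spec-lib` are the flags named above):

```bash
# Stage 1 (sketch): fine-tune models on an empirical library
diann --tune-lib empirical_library.parquet ...

# Stage 2 (sketch, separate invocation): generate the predicted
# spectral library using the tuned models
diann --gen-spec-lib ...
```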
### Current workflow (manual fine-tuning)
**Run 1 — Generate the tuning library:**
Run quantmsdiann normally. The empirical library produced by the ASSEMBLE_EMPIRICAL_LIBRARY step (after preliminary analysis) serves as the tuning library. This library contains empirically observed RT, IM, and fragment intensities for peptides bearing the modifications of interest.
```bash
# First run: standard pipeline to produce empirical library
nextflow run bigbio/quantmsdiann ...
```

**Run 2 — Use the fine-tuned models:**

The `--tokens`, `--rt-model`, and `--im-model` flags are passed to all DIA-NN steps via `--extra_args`, so the in-silico library generation uses the fine-tuned models to produce better-predicted spectra for the non-standard modifications.
> [!IMPORTANT]
> Use **absolute paths** for model files. The `--parent` flag is blocked by the pipeline (it controls the container's DIA-NN installation path).
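A second-run invocation might then look like the following sketch (the profile and paths are placeholders; `--tokens`, `--rt-model`, and `--im-model` are the flags described above):

```bash
# Sketch of the second pipeline run, pointing DIA-NN at the tuned models.
# Use absolute paths for the dictionary and model files.
nextflow run bigbio/quantmsdiann \
    -profile diann_v2_5_0,docker \
    --extra_args "--tokens /abs/path/dict.txt --rt-model /abs/path/rt.pt --im-model /abs/path/im.pt"
```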
### Future: integrated fine-tuning step
We are exploring adding an optional `FINE_TUNE_MODELS` step directly in the pipeline, which would eliminate the need for two separate runs. The integrated workflow would be:

1. Preliminary analysis and ASSEMBLE_EMPIRICAL_LIBRARY produce the empirical library as usual
2. FINE_TUNE_MODELS runs DIA-NN with `--tune-lib` on that library, producing tuned models and an expanded dictionary
3. The in-silico library is regenerated with the tuned models
4. Individual analysis proceeds with the improved library

This would be gated by a `--enable_fine_tuning` parameter. [@vdemichev](https://github.com/vdemichev): would this approach work correctly — using the empirical library from assembly as `--tune-lib`, then regenerating the in-silico library with the tuned models before proceeding to individual analysis? Or would you recommend a different integration point?
## Verbose Module Output
By default, only final result files are published. For debugging or detailed inspection, the `verbose_modules` profile publishes all intermediate files from every DIA-NN step:
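For example, combining it with the test and container profiles (illustrative; any profile combination works the same way):

```bash
nextflow run bigbio/quantmsdiann -profile test_dia,docker,verbose_modules
```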