
New release 2.0.0 including dda workflow for quantmsdiann #36

Open
ypriverol wants to merge 69 commits into main from dev


@ypriverol ypriverol commented Apr 6, 2026

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the bigbio/quantmsdiann branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Summary by CodeRabbit

  • Documentation

    • Updated DOI/badges and many parameter names; added DDA mode, InfinDIA notes, new DIA‑NN options, convert_dotd and revised usage/parameters docs.
  • New Features

    • DDA analysis mode support, new DIA‑NN flags (scoring, channel normalization, InfinDIA, export options) and version-aware behavior; new test profiles.
  • Bug Fixes

    • Improved error propagation and safer logging to surface failures earlier.
  • Chores

    • Removed legacy .d→mzML converter and bumped pipeline manifest version.

ypriverol and others added 18 commits April 3, 2026 11:26
13-task plan covering robustness fixes, DDA support, new DIA-NN params,
InfinDIA groundwork, comprehensive documentation, and issue cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without pipefail, if the command before tee fails, tee returns 0 and
the Nextflow task appears to succeed. This masked failures in
generate_cfg, diann_msstats, samplesheet_check, and sdrf_parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
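The masking effect described in the commit message above can be reproduced outside Nextflow. The following Python sketch is an illustration only, not code from the pipeline: it runs a command piped into `tee` through `bash`, with and without `set -o pipefail`. It assumes `bash` and `tee` are available on `PATH`; the helper name `pipeline_exit_code` is hypothetical.

```python
import subprocess


def pipeline_exit_code(command: str, pipefail: bool) -> int:
    """Run `command | tee /dev/null` in bash and return the shell's exit code."""
    # Assumption: bash is installed; Nextflow task scripts also run under bash by default.
    prefix = "set -o pipefail; " if pipefail else ""
    script = f"{prefix}{command} | tee /dev/null"
    return subprocess.run(["bash", "-c", script]).returncode


if __name__ == "__main__":
    # `false` always exits with status 1, while tee itself succeeds.
    print("without pipefail:", pipeline_exit_code("false", pipefail=False))
    print("with pipefail:   ", pipeline_exit_code("false", pipefail=True))
```

Without `pipefail` the shell reports the status of the last command in the pipeline (`tee`), which is why a failing upstream command could look like a successful task.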
These are the longest-running tasks and most susceptible to transient
failures (OOM, I/O timeouts). The error_retry label enables automatic
retry on signal exits (130-145, 104, 175).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
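As a rough illustration of the retry rule in the commit message above, the sketch below classifies exit statuses in Python. The exit-code set (130-145, 104, 175) comes from the message; the retry limit, the `"finish"` fallback, and the function name `next_action` are assumptions made for the example, not values confirmed by this thread.

```python
# Exit statuses treated as transient, as listed in the commit message.
RETRYABLE_EXIT_CODES = frozenset(range(130, 146)) | {104, 175}


def next_action(exit_status: int, attempt: int, max_retries: int = 2) -> str:
    """Decide what to do after a failed task attempt.

    `attempt` is 1-based. `max_retries` and the "finish" fallback are
    assumptions for this sketch.
    """
    if exit_status in RETRYABLE_EXIT_CODES and attempt <= max_retries:
        return "retry"
    return "finish"


if __name__ == "__main__":
    for status in (137, 104, 1):
        print(status, next_action(status, attempt=1))
```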
Guard ch_searchdb and ch_experiment_meta with ifEmpty to fail fast
with clear error messages instead of hanging indefinitely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds conf/diann_versions/v2_3_2.config with ghcr.io/bigbio/diann:2.3.2
container. Use -profile diann_v2_3_2 to opt in. Default stays 1.8.1.
Enables DDA support and InfinDIA features.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New param diann_dda (boolean, default: false)
- Version guard: requires DIA-NN >= 2.3.2
- Passes --dda to all 5 DIA-NN modules when enabled
- Accepts DDA acquisition method in SDRF when diann_dda=true
- Added --dda to blocked lists in all modules

Closes #5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
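The "blocked lists" mentioned above keep user-supplied extra arguments from overriding flags that the pipeline manages itself. A minimal Python sketch of that idea follows. The pipeline implements this in Groovy inside each module, and this thread does not say whether a blocked flag is stripped, warned about, or rejected, so the stripping behaviour, the function name `strip_blocked_flags`, and the two-entry blocked list are assumptions.

```python
import shlex

# Illustrative subset only; the real per-module lists are longer.
BLOCKED_FLAGS = frozenset({"--dda", "--no-peptidoforms"})


def strip_blocked_flags(extra_args: str, blocked=BLOCKED_FLAGS):
    """Split an argument string and separate out pipeline-managed flags.

    Returns (kept_tokens, removed_flags). Only the flag token itself is
    dropped; flags that take a value would need extra handling.
    """
    kept, removed = [], []
    for token in shlex.split(extra_args):
        if token in blocked:
            removed.append(token)
        else:
            kept.append(token)
    return kept, removed


if __name__ == "__main__":
    kept, removed = strip_blocked_flags("--dda --verbose 3 --no-peptidoforms")
    print("kept:", kept)
    print("removed:", removed)
```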
- test_dda: BSA dataset with diann_dda=true on DIA-NN 2.3.2
- test_dia_skip_preanalysis: tests previously untested skip path
Both added to extended_ci.yml stage 2a.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- diann_light_models: 10x faster in-silico library generation
- diann_export_quant: fragment-level parquet export
- diann_site_ms1_quant: MS1 apex intensities for PTM quantification
All require DIA-NN >= 2.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Experimental support for InfinDIA (DIA-NN 2.3.0+). Passes --infin-dia
to library generation when enabled. Version guard enforces >= 2.3.0.
No test config — InfinDIA requires large databases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
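The version guards introduced in the commits above can be summarised as a table of minimum DIA-NN versions per parameter. The Python sketch below is a simplified illustration, not the pipeline's Groovy code. The minimum versions are taken from the commit messages; the parameter name `enable_infin_dia` is taken from a later review comment in this thread, and the function names are hypothetical.

```python
from itertools import zip_longest

# Minimum DIA-NN version per parameter, as stated in the commit messages
# (pre-rename parameter names).
MIN_VERSION = {
    "diann_dda": "2.3.2",
    "enable_infin_dia": "2.3.0",
    "diann_light_models": "2.0",
    "diann_export_quant": "2.0",
    "diann_site_ms1_quant": "2.0",
}


def version_at_least(version: str, minimum: str) -> bool:
    """Numeric comparison of dotted versions; missing parts count as 0."""
    have = [int(p) for p in version.split(".")]
    need = [int(p) for p in minimum.split(".")]
    for h, n in zip_longest(have, need, fillvalue=0):
        if h != n:
            return h > n
    return True


def guard_violations(params: dict, diann_version: str) -> list:
    """List enabled parameters whose minimum DIA-NN version is not met."""
    return [
        f"{name} requires DIA-NN >= {minimum} (got {diann_version})"
        for name, minimum in MIN_VERSION.items()
        if params.get(name) and not version_at_least(diann_version, minimum)
    ]


if __name__ == "__main__":
    print(guard_violations({"diann_dda": True}, "1.8.1"))
```

Collecting all violations before failing lets a user see every incompatible option in one run, which is a design choice of this sketch rather than something the thread specifies.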
Complete reference for all ~70 pipeline parameters grouped by category
with types, defaults, descriptions, and version requirements.

Closes #1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DDA mode documentation with limitations
- Missing param sections (preprocessing, extra_args scope, verbose output)
- DIA-NN version selection guide
- Parquet vs TSV output explanation
- MSstats format section
- pmultiqc citation added
- README updated with version table and parameter reference link

Closes #3, #9, #15

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add version guard for DIA-NN 2.0+ params (--light-models,
  --export-quant, --site-ms1-quant) to prevent crashes with 1.8.1
- Add *.site_report.parquet as optional output in FINAL_QUANTIFICATION
  for site-level PTM quantification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. test_dda.config: Add diann_version = '2.3.2' so the version guard
   doesn't reject DDA mode (default is 1.8.1, guard requires >= 2.3.2)

2. quantmsdiann.nf: Update branch condition to also match "dda"
   acquisition method. Previously "dda".contains("dia") was false,
   causing all DDA files to be silently dropped from processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
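The second fix above comes down to a substring test. The Python sketch below mirrors the described behaviour of the Groovy branch condition in `quantmsdiann.nf`; the function names and the lower-casing step are assumptions for illustration.

```python
def matched_before_fix(acquisition_method: str) -> bool:
    """Old condition: only methods containing "dia" were routed onward."""
    return "dia" in acquisition_method.lower()


def matched_after_fix(acquisition_method: str) -> bool:
    """New condition: "dda" is accepted as well."""
    method = acquisition_method.lower()
    return "dia" in method or "dda" in method


if __name__ == "__main__":
    for method in ("dia", "dda"):
        print(method, matched_before_fix(method), matched_after_fix(method))
```

Because `"dda"` does not contain `"dia"`, the old condition rejected every DDA file without raising an error, which matches the silent drop described in the commit message.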
These flags exist in DIA-NN 1.8.x but were removed in 2.3.x, causing
'unrecognised option' warnings. Only pass them for versions < 2.3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective: releasing version 2.0.0 with DDA workflow support. It aligns with the substantial changes across configuration, documentation, and workflow modules. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



github-actions bot commented Apr 6, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit a679dc3

+| ✅ 106 tests passed       |+
#| ❔  19 tests were ignored |#
#| ❔   1 tests had warnings |#
!| ❗   4 tests had warnings |!
Details

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: conf/igenomes_ignored.config
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml

❔ Tests ignored:

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.2
  • Run at 2026-04-14 06:26:32

ypriverol and others added 2 commits April 6, 2026 21:10
Bruker .d to mzML conversion via tdf2mzml is no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ypriverol changed the title from "update zenodo link" to "New release 2.0.0 including dda workflow for quantmsdiann" on Apr 7, 2026
@ypriverol ypriverol marked this pull request as draft April 7, 2026 05:47
ypriverol and others added 6 commits April 7, 2026 07:20
- Merge dev branch (version bump, tdf2mzml removal, lint fixes, DOI update)
- Update test_dda.config to use PXD022287 HeLa DDA dataset with subset FASTA
- Add test_dda profile to CI matrix in ci.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test_dda profile uses ghcr.io/bigbio/diann:2.3.2 which is a
private container requiring authentication. Add Docker login step
(matching merge_ci.yml) conditioned on test_dda profile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove implementation plan from repo, add docs/plans/ to .gitignore
- Add lib/VersionUtils.groovy for semantic version comparison
  (prevents string comparison bugs like '2.10.0' < '2.3')
- Update all version guards in dia.nf and module scripts to use
  VersionUtils.versionLessThan/versionAtLeast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
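The string-comparison bug named in the commit message above is easy to demonstrate. The sketch below is a Python illustration of semantic version comparison; `lib/VersionUtils.groovy` itself is not shown in this thread, so the function names only mirror `versionLessThan`/`versionAtLeast`, and the sketch assumes purely numeric dotted versions (a suffix such as `dev` would need extra parsing).

```python
from itertools import zip_longest


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 after comparing dotted versions numerically."""
    parts_a = [int(p) for p in a.strip().split(".")]
    parts_b = [int(p) for p in b.strip().split(".")]
    # Pad the shorter version with zeros so "2.3" equals "2.3.0".
    for x, y in zip_longest(parts_a, parts_b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


def version_less_than(a: str, b: str) -> bool:
    return compare_versions(a, b) < 0


def version_at_least(a: str, b: str) -> bool:
    return compare_versions(a, b) >= 0


if __name__ == "__main__":
    # Lexicographic comparison orders "2.10.0" before "2.3".
    print("string compare: ", "2.10.0" < "2.3")
    print("numeric compare:", version_less_than("2.10.0", "2.3"))
```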
DDA analysis support is a major feature warranting a major version bump.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename output version→versions in sdrf_parsing/meta.yml
- Add ch_ prefix to input_file→ch_input_file in input_check/meta.yml
- Fix grammar in pmultiqc and diann_msstats meta.yml descriptions
- Fix glob pattern in decompress_dotd/meta.yml (double-dot expansion)
- Update CITATIONS.md to link published Nature Methods article
- Fix schema_input.json error messages (source name, whitespace)
- Standardize quantmsdiann keyword in utils meta.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ypriverol and others added 6 commits April 10, 2026 09:25
Update the quantms-utils and pmultiqc images
Since quantmsdiann is a DIA-NN-only pipeline, the diann_ prefix on
parameters is redundant. Renamed all user-facing params:
  diann_debug -> debug_level
  diann_speclib -> speclib
  diann_extra_args -> extra_args
  diann_dda -> dda
  diann_light_models -> light_models
  diann_export_quant -> export_quant
  diann_site_ms1_quant -> site_ms1_quant
  diann_pre_select -> pre_select
  diann_report_decoys -> report_decoys
  diann_export_xic -> export_xic
  diann_normalize -> normalize
  diann_use_quant -> use_quant
  diann_tims_sum -> tims_sum
  diann_im_window -> im_window
  diann_channel_run_norm -> channel_run_norm
  diann_channel_spec_norm -> channel_spec_norm

Removed diann_no_peptidoforms entirely — superseded by the new
scoring_mode parameter (generic/proteoforms/peptidoforms).
--no-peptidoforms remains in blocked flags lists.

Note: diann_version is NOT renamed (used in profile names).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
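For users with existing parameter files, the rename list above can be applied mechanically. The following Python sketch is a hypothetical migration helper, not part of the pipeline: the mapping is copied from the commit message, while `migrate_params` and its return shape are assumptions. Because the thread does not state how `diann_no_peptidoforms` values translate to `scoring_mode`, the helper only reports that key instead of converting it.

```python
# Old name -> new name, copied from the commit message.
RENAMED = {
    "diann_debug": "debug_level",
    "diann_speclib": "speclib",
    "diann_extra_args": "extra_args",
    "diann_dda": "dda",
    "diann_light_models": "light_models",
    "diann_export_quant": "export_quant",
    "diann_site_ms1_quant": "site_ms1_quant",
    "diann_pre_select": "pre_select",
    "diann_report_decoys": "report_decoys",
    "diann_export_xic": "export_xic",
    "diann_normalize": "normalize",
    "diann_use_quant": "use_quant",
    "diann_tims_sum": "tims_sum",
    "diann_im_window": "im_window",
    "diann_channel_run_norm": "channel_run_norm",
    "diann_channel_spec_norm": "channel_spec_norm",
}

# Removed outright; superseded by scoring_mode.
REMOVED = {"diann_no_peptidoforms"}


def migrate_params(old_params: dict):
    """Return (new_params, dropped_keys); unknown keys pass through unchanged."""
    new_params, dropped = {}, []
    for key, value in old_params.items():
        if key in REMOVED:
            dropped.append(key)
        else:
            new_params[RENAMED.get(key, key)] = value
    return new_params, dropped


if __name__ == "__main__":
    print(migrate_params({"diann_dda": True, "diann_version": "2.3.2"}))
```

Note that `diann_version` is deliberately absent from the mapping, so it passes through untouched.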
feat: rename FDR params and expose matrix-level q-value controls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: add DIA-NN scoring mode parameter (Generic/Proteoforms/Peptidoforms)
@ypriverol ypriverol marked this pull request as ready for review April 11, 2026 05:13

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/local/diann/diann_msstats/main.nf (1)

1-7: 🛠️ Refactor suggestion | 🟠 Major

Add the required diann process label.

This DIA module process is missing label 'diann', which breaks the expected label-based selection pattern.

🏷️ Suggested fix
 process DIANN_MSSTATS {
     tag "diann_msstats"
     label 'process_medium'
+    label 'diann'
As per coding guidelines, `modules/local/diann/*/main.nf`: All DIA-NN process modules must include both `label 'process_'` and `label 'diann'` labels for container and resource selection.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/local/diann/diann_msstats/main.nf` around lines 1 - 7, The
DIANN_MSSTATS process is missing the required diann label which prevents
label-based selection; update the process DIANN_MSSTATS declaration to include
label 'diann' alongside the existing label 'process_medium' (i.e., ensure both
label 'process_medium' and label 'diann' are present in the process block) so
container and resource selection works as other DIA-NN modules expect.
♻️ Duplicate comments (2)
docs/usage.md (1)

98-104: ⚠️ Potential issue | 🟡 Minor

Remove the second Preprocessing Options block.

This duplicates the earlier section at Line 32 and creates a second source of truth for the same parameters. Please fold any new content into the original section instead of reintroducing the heading here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/usage.md` around lines 98 - 104, Remove this duplicate "Preprocessing
Options" block and merge any unique flags or descriptions here (e.g.,
--reindex_mzml, --mzml_statistics, --mzml_features, --convert_dotd) into the
original "Preprocessing Options" section that appears earlier in the doc; delete
this repeated heading and its bullet items so there is only one authoritative
Preprocessing Options section containing the consolidated flags and defaults.
workflows/dia.nf (1)

76-95: ⚠️ Potential issue | 🟠 Major

Propagate the resolved DDA mode to every DIA-NN step.

ch_is_dda folds in params.dda, but only INSILICO_LIBRARY_GENERATION consumes that resolved boolean. The later DIA-NN modules still decide from meta.acquisition_method, so --dda true can make library generation run in DDA mode while PRELIMINARY/ASSEMBLE/INDIVIDUAL/FINAL stay in DIA mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workflows/dia.nf` around lines 76 - 95, ch_is_dda computed from
ch_experiment_meta/params.dda is not being propagated to downstream DIA-NN steps
(only INSILICO_LIBRARY_GENERATION uses it), causing later modules to re-evaluate
meta.acquisition_method; update the PRELIMINARY, ASSEMBLE, INDIVIDUAL, FINAL
(and any other DIA-NN) invocations to consume the resolved ch_is_dda value
instead of reading meta.acquisition_method directly — e.g., pass ch_is_dda as an
input channel or wire it into those process calls so their logic uses the
boolean from ch_is_dda (and remove or override any meta.acquisition_method-based
checks inside those processes).
🧹 Nitpick comments (3)
conf/tests/test_latest_dia.config (1)

14-15: Consider updating "latest" profile to DIA-NN 2.3.2.

The profile name and description reference "latest DIA (2.2.0)", but the PR includes support for DIA-NN 2.3.2 which adds DDA support (per conf/diann_versions/v2_3_2.config). If 2.3.2 is intended to be the latest supported version, consider updating this test profile.

Also applies to: 41-42

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/tests/test_latest_dia.config` around lines 14 - 15, Update the "latest
DIA" test profile strings to reflect DIA-NN v2.3.2: change config_profile_name
and config_profile_description to mention "latest DIA (2.3.2)" and update the
description to note DDA support; ensure these edits align with the new version
file conf/diann_versions/v2_3_2.config and also update the duplicate occurrences
referenced at lines 41-42.
conf/modules/dia.config (1)

13-31: Optional: consolidate duplicated ext.args selectors.

These five blocks are functionally identical and can be merged into one regex selector to reduce config drift risk.

♻️ Suggested consolidation
 process {
-
-    withName: ".*:DIA:INSILICO_LIBRARY_GENERATION" {
-        ext.args   = { params.extra_args ?: '' }
-    }
-
-    withName: ".*:DIA:PRELIMINARY_ANALYSIS" {
-        ext.args   = { params.extra_args ?: '' }
-    }
-
-    withName: ".*:DIA:ASSEMBLE_EMPIRICAL_LIBRARY" {
-        ext.args   = { params.extra_args ?: '' }
-    }
-
-    withName: ".*:DIA:INDIVIDUAL_ANALYSIS" {
-        ext.args   = { params.extra_args ?: '' }
-    }
-
-    withName: ".*:DIA:FINAL_QUANTIFICATION" {
+    withName: ".*:DIA:(INSILICO_LIBRARY_GENERATION|PRELIMINARY_ANALYSIS|ASSEMBLE_EMPIRICAL_LIBRARY|INDIVIDUAL_ANALYSIS|FINAL_QUANTIFICATION)" {
         ext.args   = { params.extra_args ?: '' }
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/modules/dia.config` around lines 13 - 31, The five identical selectors
(withName: ".*:DIA:INSILICO_LIBRARY_GENERATION", ".*:DIA:PRELIMINARY_ANALYSIS",
".*:DIA:ASSEMBLE_EMPIRICAL_LIBRARY", ".*:DIA:INDIVIDUAL_ANALYSIS",
".*:DIA:FINAL_QUANTIFICATION") all set ext.args = { params.extra_args ?: '' }
and should be consolidated into a single selector using a combined regex (e.g.,
".*:DIA:(INSILICO_LIBRARY_GENERATION|PRELIMINARY_ANALYSIS|ASSEMBLE_EMPIRICAL_LIBRARY|INDIVIDUAL_ANALYSIS|FINAL_QUANTIFICATION)")
so that ext.args is defined once; update the withName block to use that combined
regex and remove the five duplicate blocks, keeping the ext.args assignment
as-is.
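To check that the combined selector covers exactly the five processes, the pattern can be exercised in isolation. The sketch below uses Python's `re` module and assumes the selector must match the whole fully qualified process name; the `QUANTMSDIANN:` prefix in the example names is hypothetical.

```python
import re

# Combined pattern from the suggested consolidation.
COMBINED_SELECTOR = re.compile(
    r".*:DIA:(INSILICO_LIBRARY_GENERATION|PRELIMINARY_ANALYSIS|"
    r"ASSEMBLE_EMPIRICAL_LIBRARY|INDIVIDUAL_ANALYSIS|FINAL_QUANTIFICATION)"
)


def is_selected(process_name: str) -> bool:
    """True when the whole process name matches the combined selector."""
    return COMBINED_SELECTOR.fullmatch(process_name) is not None


if __name__ == "__main__":
    for name in (
        "QUANTMSDIANN:DIA:PRELIMINARY_ANALYSIS",  # hypothetical full name
        "QUANTMSDIANN:DIA:DIANN_MSSTATS",         # not in the selector
    ):
        print(name, is_selected(name))
```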
docs/parameters.md (1)

55-64: Document --dda in one place to avoid drift.

This parameter is now described in both the general DIA-NN table and the dedicated DDA section. A single canonical entry plus a short cross-reference would be easier to keep in sync.

Also applies to: 124-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/parameters.md` around lines 55 - 64, Consolidate the `--dda` parameter
documentation by keeping a single canonical entry in the DIA‑NN parameters table
(the `--dda` row) and remove the duplicate detailed description from the
dedicated DDA section; in that DDA section replace the duplicate text with a
short cross-reference pointing to the parameters table (e.g., "See `--dda` in
the DIA‑NN parameters table"). Apply the same consolidation for the other
duplicate at the later occurrence (lines referenced in the review), ensuring the
unique symbol `--dda` is the single source of truth and the DDA section only
links to it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/parameters.md`:
- Line 30: The docs list flags `--convert_dotd` and `--empirical_assembly_log`
but those parameters are missing from nextflow_schema.json; either add these
params to the pipeline config and then regenerate nextflow_schema.json with
`nf-core pipelines schema build` so the schema includes `--convert_dotd` and
`--empirical_assembly_log`, or remove these entries from docs/parameters.md (and
any other occurrences noted) to keep docs and schema in sync; update the
pipeline parameter definitions where `convert_dotd` and `empirical_assembly_log`
are declared so they match the names/types documented before rebuilding the
schema.

In `@modules/local/diann/final_quantification/main.nf`:
- Around line 49-56: The blocked-flag list named blocked in main.nf is missing
the channel-normalization CLI switches owned by params.channel_run_norm and
params.channel_spec_norm; add the corresponding flags (e.g. --channel-run-norm
and --channel-spec-norm) to the blocked array where it’s defined and also add
them to the second blocked list referenced later (the block at lines ~78-83) so
task.ext.args cannot silently pass those switches through.

In `@modules/local/diann/insilico_library_generation/meta.yml`:
- Around line 22-24: The metadata description for the boolean flag is_dda in
meta.yml mentions the wrong CLI flag name (--diann_dda); update that description
to reference the actual runtime flag (--dda) so it matches the module command
path and runtime behavior, keeping the rest of the description intact.

In `@nextflow_schema.json`:
- Around line 324-344: The q-value fields precursor_qvalue, matrix_qvalue, and
matrix_spec_q are currently unbounded numbers; update each property's schema to
constrain values to the valid probability range by adding "minimum": 0 and
"maximum": 1 (retaining "type": "number") so values outside [0,1] (e.g., -1 or
1.5) will fail validation before reaching DIA-NN.
- Around line 479-513: The CI lacks test matrix entries for new DIA-NN
feature/version combos; update the test-features-matrix in merge_ci.yml to add
matrix entries for test_light_models with versions 2.1.0 and 2.2.0, add a new
test_infin_dia profile pinned to 2.3.2, and include test_dda pinned to 2.3.2
(matching the extended_ci.yml profile) so the workflow gating in
workflows/dia.nf is exercised for light_models, enable_infin_dia/pre_select, and
dda minimum supported versions.

In `@nextflow.config`:
- Line 380: The pipeline manifest currently sets version = '2.0.0dev' which
marks published metadata as a development build; change the version value in
nextflow.config (the version variable) from '2.0.0dev' to the final release
string '2.0.0' so the manifest advertises the correct release version.

In `@subworkflows/local/create_input_channel/main.nf`:
- Around line 103-104: The code collects fixedMods via rows.collect and then
unconditionally sets meta.fixedmodifications to the first unique value, which
silently ignores conflicting values; change this to mirror the enzyme
validation: after computing fixedMods (the unique non-empty list), if
fixedMods.size() == 1 set meta.fixedmodifications to that value, if
fixedMods.size() == 0 set it to null, and if fixedMods.size() > 1 throw an error
(or pipeline exit) with a clear message referencing the affected file/rows;
update the logic around the fixedMods variable and meta.fixedmodifications to
perform this consistency check instead of taking fixedMods[0].
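
The consistency check requested for `fixedMods` can be sketched as follows. This is a Python illustration of the rule described above (one unique value is used, none yields null, more than one is an error); the pipeline code is Groovy, and the function name, the row layout as a list of dictionaries, and the error wording are assumptions.

```python
def resolve_fixed_modifications(rows, key="fixedmodifications"):
    """Return the single fixed-modification value shared by all rows.

    Returns None when no row sets a value and raises ValueError when
    rows disagree, instead of silently taking the first value.
    """
    unique_values = []
    for row in rows:
        value = (row.get(key) or "").strip()
        if value and value not in unique_values:
            unique_values.append(value)
    if len(unique_values) > 1:
        raise ValueError(f"Conflicting fixed modifications: {unique_values}")
    return unique_values[0] if unique_values else None


if __name__ == "__main__":
    rows = [
        {"fixedmodifications": "Carbamidomethyl (C)"},
        {"fixedmodifications": "Carbamidomethyl (C)"},
    ]
    print(resolve_fixed_modifications(rows))
```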


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb1c960b-6643-4ccd-a116-e58a337476fb

📥 Commits

Reviewing files that changed from the base of the PR and between 207b6b6 and b2c9219.

📒 Files selected for processing (37)
  • AGENTS.md
  • README.md
  • conf/diann_versions/v2_1_0.config
  • conf/diann_versions/v2_2_0.config
  • conf/diann_versions/v2_3_2.config
  • conf/modules/dia.config
  • conf/pride_codon_slurm.config
  • conf/tests/test_dda.config
  • conf/tests/test_dia.config
  • conf/tests/test_dia_2_2_0.config
  • conf/tests/test_dia_dotd.config
  • conf/tests/test_dia_parquet.config
  • conf/tests/test_dia_quantums.config
  • conf/tests/test_dia_skip_preanalysis.config
  • conf/tests/test_full_dia.config
  • conf/tests/test_latest_dia.config
  • docs/parameters.md
  • docs/usage.md
  • modules/local/diann/assemble_empirical_library/main.nf
  • modules/local/diann/assemble_empirical_library/meta.yml
  • modules/local/diann/diann_msstats/main.nf
  • modules/local/diann/final_quantification/main.nf
  • modules/local/diann/final_quantification/meta.yml
  • modules/local/diann/generate_cfg/main.nf
  • modules/local/diann/individual_analysis/main.nf
  • modules/local/diann/individual_analysis/meta.yml
  • modules/local/diann/insilico_library_generation/main.nf
  • modules/local/diann/insilico_library_generation/meta.yml
  • modules/local/diann/preliminary_analysis/main.nf
  • modules/local/diann/preliminary_analysis/meta.yml
  • modules/local/pmultiqc/main.nf
  • modules/local/samplesheet_check/main.nf
  • modules/local/utils/mzml_statistics/main.nf
  • nextflow.config
  • nextflow_schema.json
  • subworkflows/local/create_input_channel/main.nf
  • workflows/dia.nf
✅ Files skipped from review due to trivial changes (7)
  • modules/local/diann/assemble_empirical_library/meta.yml
  • conf/diann_versions/v2_2_0.config
  • modules/local/diann/preliminary_analysis/meta.yml
  • modules/local/diann/individual_analysis/meta.yml
  • modules/local/diann/generate_cfg/main.nf
  • modules/local/diann/final_quantification/meta.yml
  • conf/tests/test_dda.config
🚧 Files skipped from review as they are similar to previous changes (3)
  • AGENTS.md
  • conf/diann_versions/v2_3_2.config
  • conf/tests/test_dia_skip_preanalysis.config

Comment on lines +22 to +24
  - is_dda:
      type: boolean
      description: Whether DDA mode is enabled (auto-detected from SDRF or set via --diann_dda)

⚠️ Potential issue | 🟡 Minor

Update flag name in is_dda metadata description.

Line 24 mentions --diann_dda, but the module command path uses --dda; this description should match runtime behavior.

📝 Suggested fix
-      description: Whether DDA mode is enabled (auto-detected from SDRF or set via --diann_dda)
+      description: Whether DDA mode is enabled (auto-detected from SDRF or set via --dda)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/local/diann/insilico_library_generation/meta.yml` around lines 22 -
24, The metadata description for the boolean flag is_dda in meta.yml mentions
the wrong CLI flag name (--diann_dda); update that description to reference the
actual runtime flag (--dda) so it matches the module command path and runtime
behavior, keeping the rest of the description intact.

Comment on lines +324 to +344
"precursor_qvalue": {
    "type": "number",
    "description": "Precursor-level q-value filtering threshold for the DIA-NN main report. Maps to --qvalue.",
    "default": 0.01,
    "fa_icon": "fas fa-filter",
    "help_text": "Controls how strictly precursor identifications are filtered in the DIA-NN main report. For proteogenomics with variant databases, the standard 0.01 (1%) is recommended."
},
"matrix_qvalue": {
    "type": "number",
    "description": "Q-value threshold for DIA-NN output matrices (pr_matrix, pg_matrix, etc.). Maps to --matrix-qvalue.",
    "default": 0.01,
    "fa_icon": "fas fa-filter",
    "help_text": "Controls the global q-value filtering applied when generating the quantification matrices. Default matches DIA-NN's built-in default of 1%."
},
"matrix_spec_q": {
    "type": "number",
    "description": "Run-specific protein q-value filter for protein/gene matrices. Maps to --matrix-spec-q.",
    "default": 0.05,
    "fa_icon": "fas fa-filter",
    "help_text": "An additional run-specific protein-level FDR filter applied to the protein and gene matrices. Default matches DIA-NN's built-in default of 5%. For proteogenomics/variant detection, consider setting to 1.0 to retain variant proteins that lack sufficient unique peptides for protein-level confidence."
},

⚠️ Potential issue | 🟡 Minor

Constrain q-value parameters to the valid probability range.

These three thresholds are currently any number, so values like -1 or 1.5 will pass schema validation and reach DIA-NN. Please bound them to [0, 1].

Suggested schema patch
                 "precursor_qvalue": {
                     "type": "number",
+                    "minimum": 0,
+                    "maximum": 1,
                     "description": "Precursor-level q-value filtering threshold for the DIA-NN main report. Maps to --qvalue.",
                     "default": 0.01,
                     "fa_icon": "fas fa-filter",
                     "help_text": "Controls how strictly precursor identifications are filtered in the DIA-NN main report. For proteogenomics with variant databases, the standard 0.01 (1%) is recommended."
                 },
                 "matrix_qvalue": {
                     "type": "number",
+                    "minimum": 0,
+                    "maximum": 1,
                     "description": "Q-value threshold for DIA-NN output matrices (pr_matrix, pg_matrix, etc.). Maps to --matrix-qvalue.",
                     "default": 0.01,
                     "fa_icon": "fas fa-filter",
                     "help_text": "Controls the global q-value filtering applied when generating the quantification matrices. Default matches DIA-NN's built-in default of 1%."
                 },
                 "matrix_spec_q": {
                     "type": "number",
+                    "minimum": 0,
+                    "maximum": 1,
                     "description": "Run-specific protein q-value filter for protein/gene matrices. Maps to --matrix-spec-q.",
                     "default": 0.05,
                     "fa_icon": "fas fa-filter",
                     "help_text": "An additional run-specific protein-level FDR filter applied to the protein and gene matrices. Default matches DIA-NN's built-in default of 5%. For proteogenomics/variant detection, consider setting to 1.0 to retain variant proteins that lack sufficient unique peptides for protein-level confidence."
                 },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nextflow_schema.json` around lines 324 - 344, The q-value fields
precursor_qvalue, matrix_qvalue, and matrix_spec_q are currently unbounded
numbers; update each property's schema to constrain values to the valid
probability range by adding "minimum": 0 and "maximum": 1 (retaining "type":
"number") so values outside [0,1] (e.g., -1 or 1.5) will fail validation before
reaching DIA-NN.
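
To make the effect of the proposed bounds concrete, here is a minimal Python sketch of the check that `"minimum": 0` and `"maximum": 1` would enforce. The helper name `out_of_range_qvalues` is hypothetical and is not part of the pipeline; in practice the schema validator performs this check, so this only illustrates which values would be rejected.

```python
# Hypothetical helper: mirrors the "minimum": 0 / "maximum": 1 bounds
# proposed for the three q-value parameters in nextflow_schema.json.
Q_VALUE_PARAMS = ("precursor_qvalue", "matrix_qvalue", "matrix_spec_q")


def out_of_range_qvalues(params):
    """Return the q-value parameters whose values are not numbers in [0, 1]."""
    bad = {}
    for name in Q_VALUE_PARAMS:
        value = params.get(name)
        if value is None:
            continue  # unset parameters fall back to the schema default
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            bad[name] = value
    return bad
```

With these bounds, `out_of_range_qvalues({"matrix_qvalue": 1.5, "matrix_spec_q": 1.0})` returns `{"matrix_qvalue": 1.5}`: the value 1.0 suggested in the help text for variant detection stays valid, while 1.5 is rejected.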

Comment on lines +479 to +513
"dda": {
"type": "boolean",
"description": "Explicitly enable DDA mode. Normally auto-detected from the SDRF acquisition method column. Use only when SDRF lacks this column. Requires DIA-NN >= 2.3.2.",
"fa_icon": "fas fa-flask",
"default": false
},
"light_models": {
"type": "boolean",
"description": "Enable --light-models for 10x faster in-silico library generation (DIA-NN >= 2.0).",
"fa_icon": "fas fa-bolt",
"default": false
},
"export_quant": {
"type": "boolean",
"description": "Enable --export-quant for fragment-level parquet data export (DIA-NN >= 2.0).",
"fa_icon": "fas fa-file-export",
"default": false
},
"site_ms1_quant": {
"type": "boolean",
"description": "Enable --site-ms1-quant to use MS1 apex intensities for PTM site quantification (DIA-NN >= 2.0).",
"fa_icon": "fas fa-crosshairs",
"default": false
},
"enable_infin_dia": {
"type": "boolean",
"description": "Enable InfinDIA for ultra-large search spaces (DIA-NN >= 2.3.0). Experimental.",
"fa_icon": "fas fa-infinity",
"default": false
},
"pre_select": {
"type": "integer",
"description": "Set --pre-select N precursor limit for InfinDIA pre-search.",
"fa_icon": "fas fa-filter"
},

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== DIA-NN feature flag call sites ==\n'
rg -n -C2 \
  --iglob '*.nf' \
  --iglob '*.config' \
  --iglob '*.yml' \
  '(\-\-(dda|light-models|export-quant|site-ms1-quant|infin-dia|pre-select))|(params\.(dda|light_models|export_quant|site_ms1_quant|enable_infin_dia|pre_select))|diann_version' .

printf '\n== CI workflow matrix ==\n'
fd -i 'extended_ci\.yml|merge_ci\.yml' .github/workflows -x sed -n '1,220p' {}

Repository: bigbio/quantmsdiann

Length of output: 32767


🏁 Script executed:

# Check if modules have conditional version-gating at the module level
rg -n "light_models|export_quant|site_ms1_quant|enable_infin_dia" \
  modules/local/diann/*/main.nf \
  -A 1 -B 1

# Verify merge_ci.yml includes DDA/2.3.2 tests
grep -A 50 "test-features-matrix:" .github/workflows/merge_ci.yml | head -60

Repository: bigbio/quantmsdiann

Length of output: 3378


Add CI matrix entries to test new DIA-NN features at their minimum supported versions.

The workflow gating in workflows/dia.nf prevents unsupported version+feature combinations from reaching the CLI, but merge_ci.yml lacks test coverage for light_models, enable_infin_dia/pre_select (2.3.0+), and DDA (2.3.2+). Extend the test-features-matrix in merge_ci.yml to include:

  • test_light_models × {2.1.0, 2.2.0}
  • test_infin_dia × 2.3.2 (new test profile)
  • test_dda × 2.3.2 (from extended_ci.yml, add to merge path)

This ensures the version guards are validated by the merge CI gate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nextflow_schema.json` around lines 479 - 513, The CI lacks test matrix
entries for new DIA-NN feature/version combos; update the test-features-matrix
in merge_ci.yml to add matrix entries for test_light_models with versions 2.1.0
and 2.2.0, add a new test_infin_dia profile pinned to 2.3.2, and include
test_dda pinned to 2.3.2 (matching the extended_ci.yml profile) so the workflow
gating in workflows/dia.nf is exercised for light_models,
enable_infin_dia/pre_select, and dda minimum supported versions.
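
The version guards that these matrix entries would exercise can be sketched as follows. This is an illustration in Python, not the logic in workflows/dia.nf (which is not shown in this thread); the minimum versions are copied from the parameter descriptions above, and the names `MIN_DIANN_VERSION`, `parse_version`, and `unsupported_features` are assumptions made for the example. `pre_select` is left out because its description states no minimum version of its own.

```python
# Minimum DIA-NN versions, taken from the parameter descriptions in
# nextflow_schema.json quoted above. Names here are illustrative only.
MIN_DIANN_VERSION = {
    "light_models": (2, 0, 0),
    "export_quant": (2, 0, 0),
    "site_ms1_quant": (2, 0, 0),
    "enable_infin_dia": (2, 3, 0),
    "dda": (2, 3, 2),
}


def parse_version(text):
    """Turn '2.3' or '2.3.2' into a comparable (major, minor, patch) tuple."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '2.0'
    return tuple(parts[:3])


def unsupported_features(diann_version, enabled):
    """List enabled features that need a newer DIA-NN than the one given."""
    current = parse_version(diann_version)
    return sorted(
        name
        for name in enabled
        if name in MIN_DIANN_VERSION and current < MIN_DIANN_VERSION[name]
    )
```

A matrix entry such as `test_dda` × 2.3.2 corresponds to `unsupported_features("2.3.2", ["dda"])` returning an empty list, whereas the same feature at 2.3.0 would be reported as unsupported.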

Replace duplicated blocked-flags logic (10+ lines x 5 modules) with
a single centralized registry in lib/BlockedFlags.groovy. Each module
now uses one line: `args = BlockedFlags.strip('MODULE_NAME', args, log)`

Fixes from PR #36 review:
- Add --no-prot-inf to ASSEMBLE, INDIVIDUAL, FINAL blocked lists
- Add --channel-run-norm, --channel-spec-norm to FINAL blocked list
- Add --var-mod, --fixed-mod, --channels to INSILICO (via COMMON)
- Add --relaxed-prot-inf, --pg-level to ASSEMBLE blocked list
- Add version guard for channel normalization flags (require >= 2.0)
- Add warning when --normalize false conflicts with channel norm flags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
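
For readers unfamiliar with the pattern, the centralized registry described in this commit can be sketched as below. This is a Python illustration, not the contents of lib/BlockedFlags.groovy, which is not shown in this thread. The `BLOCKED` table lists only the flags named in the commit message, and the function `strip_blocked` plus the assumption that flag values are space-separated tokens are choices made for the example.

```python
import shlex

# Illustrative subset only: these are the flags named in the commit message.
# The real lists live in lib/BlockedFlags.groovy.
BLOCKED = {
    "COMMON": {"--var-mod", "--fixed-mod", "--channels"},
    "FINAL": {"--no-prot-inf", "--channel-run-norm", "--channel-spec-norm"},
}


def strip_blocked(module, args, warn=print):
    """Drop blocked flags (and their values) from a user-supplied args string."""
    blocked = BLOCKED["COMMON"] | BLOCKED.get(module, set())
    kept = []
    skipping = False
    for token in shlex.split(args):
        if token.startswith("--"):
            # A new flag starts: decide whether it and its values are dropped.
            skipping = token in blocked
            if skipping:
                warn(f"{module}: removing pipeline-managed flag {token}")
        if not skipping:
            kept.append(token)
    return shlex.join(kept)
```

Each module then needs a single call, for example `strip_blocked("FINAL", "--no-prot-inf --threads 4")`, which returns `--threads 4` and emits one warning. Keeping the table in one place means a new blocked flag is added by editing one file.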
ypriverol and others added 4 commits April 11, 2026 06:47
Add comments explaining:
- Why blocked flags exist (prevent silent DIA-NN flag conflicts)
- Why they live in a Groovy class, not config files (safety — can't
  be overridden by user configs)
- How to add new blocked flags (edit one file, no module changes)
- In each module: point developers to lib/BlockedFlags.groovy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ASSEMBLE_EMPIRICAL_LIBRARY doesn't set --no-prot-inf,
--relaxed-prot-inf, or --pg-level in its command, so blocking
them would strip user values without providing a replacement.
Users should be able to pass these to the assembly step if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each blocked flag now has a comment explaining WHY it's blocked:
- "Pipeline-managed": the pipeline sets it from params/SDRF/metadata
- "No-effect guard": the flag has no effect in this step but is
  blocked to prevent users from wrongly believing it does

This prevents future contributors (human or AI) from removing flags
without understanding the intent behind the block.

Reverts the accidental removal of protein inference flags from
ASSEMBLE_EMPIRICAL_LIBRARY — they are intentionally blocked as a
no-effect guard since --gen-spec-lib produces a spectral library,
not a quantified report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor: centralize blocked flags + fix missing guards from PR #36 review
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chore: bump version to 2.0.0 (Rome) and update CHANGELOG
ypriverol and others added 7 commits April 13, 2026 11:01
- New v2_5_0.config version profile (ghcr.io/bigbio/diann:2.5.0)
- Add diann_v2_5_0 profile to nextflow.config
- Block --parent flag in BlockedFlags.groovy COMMON list
  (container-managed, overriding breaks model discovery)
- Document fine-tuning workflow in docs/usage.md:
  - How to generate tuning libraries
  - How to fine-tune RT/IM/fragment models
  - How to use fine-tuned models via --extra_args
- Update version tables in docs/usage.md and AGENTS.md

New DIA-NN 2.5.0 CLI flags (passable via --extra_args):
  --tokens, --rt-model, --fr-model, --im-model (model selection)
  --aa-eq (amino acid equivalence for reannotation)
  --tune-lib, --tune-rt, --tune-im, --tune-fr (fine-tuning)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Document all new DIA-NN 2.5.0 flags available via --extra_args:
model selection (--tokens, --rt-model, --fr-model, --im-model),
fine-tuning (--tune-lib, --tune-rt, --tune-im, --tune-fr, etc.),
and --aa-eq for amino acid equivalence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add aa_eq param (default: false) that maps to DIA-NN's --aa-eq flag.
When enabled, I&L, Q&E, N&D are treated as equivalent amino acids
during reannotation — essential for entrapment FDR benchmarks.

- Added to nextflow.config, nextflow_schema.json, docs/parameters.md
- Passed to all 5 DIA-NN modules
- Added to BlockedFlags.groovy COMMON list (pipeline-managed)
- Moved from "via extra_args" to proper pipeline parameter in docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
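
The equivalence classes named in this commit (I&L, Q&E, N&D) can be illustrated with a short Python sketch. This is not DIA-NN's implementation of `--aa-eq`, which is not described in this thread; it only shows what treating those residue pairs as equivalent means when comparing peptide sequences. The function names and the choice of representative letter for each pair are assumptions.

```python
# Assumption for illustration: collapse each equivalent pair onto one
# representative letter (I -> L, E -> Q, D -> N), as listed in the commit.
AA_EQUIVALENT = str.maketrans({"I": "L", "E": "Q", "D": "N"})


def canonical(sequence):
    """Map a peptide sequence onto its equivalence-class representative."""
    return sequence.upper().translate(AA_EQUIVALENT)


def equivalent(seq_a, seq_b):
    """True if two sequences differ only by I/L, Q/E, or N/D substitutions."""
    return canonical(seq_a) == canonical(seq_b)
```

Under this mapping `equivalent("PEPTIDE", "PQPTLNQ")` is true, because every difference is one of the three listed substitutions, while `equivalent("PEPTIDE", "PEPTIDK")` is false.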
… for Vadim

Rewrite the fine-tuning documentation to:
- Explain what tokens/dict.txt are (neural network encoding of modifications)
- Clarify that --tune-lib cannot be combined with --gen-spec-lib
  (separate DIA-NN invocations, confirmed in DIA-NN #1499)
- Document the full two-run workflow with concrete commands
- Propose the future integrated FINE_TUNE_MODELS step with a
  question for @vdemichev about the right integration point

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: add DIA-NN 2.5.0 support with model fine-tuning documentation