Conversation
- Add minimal QPX test dataset (50 spectra, 14 KB) in test_data/TEST001/
- Update test.config with default_species/default_instrument for QPX
- Fix SVG workflow diagram: wider spacing, no overlapping labels/badges
- Update .gitignore to allow test_data/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updated CI workflow to include syntax validation and test data checks.
Agent-Logs-Url: https://github.com/bigbio/spectrafuse/sessions/955c5f4a-8a76-437f-b5c2-f9c5f8ba6f3d
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Fix CI workflow YAML parse error in artifact upload step
Feature/dat bypass workflow
📝 Walkthrough
The CI pipeline is refactored from a single test job into a three-stage sequence: syntax validation, data validation, and pipeline execution. Singularity support and FTP-based test data downloads are removed. Test data is bundled with the repository, and Docker becomes the sole execution profile with a shortened 30-minute timeout.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 inconclusive
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.gitignore (1)
Lines 175-180: ⚠️ Potential issue | 🟠 Major

`*.parquet` still keeps new bundled fixtures out of version control. Line 177 still ignores every Parquet file, so adding or renaming anything under `test_data/` will stay untracked even though this PR moves test fixtures into the repo.

Suggested fix

```diff
 # Data files - should not be in repository
 data/
 *.parquet
+!test_data/**/*.parquet

 # CI/CD reports
 reports/
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gitignore around lines 175-180: the `*.parquet` entry globally ignores all Parquet files, so new test fixtures under test_data/ remain untracked. Either remove the global `*.parquet` pattern, replace it with a more specific one (e.g. only ignore data/), or keep it and add a negation such as `!test_data/**/*.parquet` so the fixture files are tracked.
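The negation pattern can be sanity-checked locally before committing. The sketch below assumes git is installed; the temporary-directory layout and file names only mirror the fixtures this PR adds and are otherwise illustrative:

```shell
# Create a throwaway repo with the proposed ignore rules and verify that
# the negation re-includes the bundled fixtures while other parquet files
# stay ignored.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf '%s\n' '*.parquet' '!test_data/**/*.parquet' > .gitignore
mkdir -p test_data/TEST001 data
touch data/big.parquet test_data/TEST001/TEST001.psm.parquet
# exit 0 = ignored; exit 1 = tracked
git check-ignore -q data/big.parquet && echo "data/big.parquet ignored"
git check-ignore -q test_data/TEST001/TEST001.psm.parquet || echo "fixture tracked"
```

Note that git cannot re-include a file whose parent directory is excluded; the negation works here because only the `*.parquet` files themselves are ignored, not `test_data/` as a directory.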
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 37-53: The CI Nextflow runs for the jobs named "Validate pipeline
syntax (preview mode)" and "Validate incremental mode syntax" are bypassing the
test profile; update both nextflow run commands to enable the test profile
(e.g., add -profile test or -c conf/tests/test.config) so conf/tests/test.config
is loaded during CI rather than relying on ad-hoc overrides, and keep any
necessary lightweight overrides but do not remove or bypass the test profile
flag.
- Around line 84-92: The preflight currently only asserts presence of columns
for the `run` and `sample` pyarrow Tables but doesn't fail when those tables are
empty; update the checks around the `run` and `sample` variables to also assert
they contain rows (e.g., assert len(run) > 0 and assert len(sample) > 0 or use
run.num_rows/sample.num_rows) and include brief failure messages so the CI fails
fast if either parquet fixture is empty.
- Around line 136-145: The shell test using an unquoted glob ([ -f
test_results/pipeline_info/execution_trace*.txt ]) can expand to multiple args
and fail when several trace files exist; replace that check with a glob-safe
probe such as using compgen -G "test_results/pipeline_info/execution_trace*.txt"
>/dev/null || true (or enabling nullglob and checking for files), then run grep
against the matched files (execution_trace*.txt) as before and preserve the rest
of the logic that sets failed and iterates to copy logs in the "Gather failed
logs" step so the variable failed and the while-read loop continue to work
correctly.
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 51a96497-2a4b-48cd-acf7-84ea414b9da1
⛔ Files ignored due to path filters (4)
- docs/images/spectrafuse_workflow.svg is excluded by !**/*.svg
- test_data/TEST001/TEST001.psm.parquet is excluded by !**/*.parquet
- test_data/TEST001/TEST001.run.parquet is excluded by !**/*.parquet
- test_data/TEST001/TEST001.sample.parquet is excluded by !**/*.parquet

📒 Files selected for processing (3)
- .github/workflows/ci.yml
- .gitignore
- conf/tests/test.config
```python
run = pq.read_table('test_data/TEST001/TEST001.run.parquet')
assert 'run_file_name' in run.column_names
assert 'instrument' in run.column_names
print(f'Run parquet: {len(run)} rows, columns: {run.column_names}')

sample = pq.read_table('test_data/TEST001/TEST001.sample.parquet')
assert 'run_file_name' in sample.column_names
assert 'organism' in sample.column_names
print(f'Sample parquet: {len(sample)} rows, columns: {sample.column_names}')
```
Fail fast if the run or sample parquet fixtures are empty.
This preflight only checks row count for the PSM parquet. Empty metadata tables would still pass this job and then fail later in the actual pipeline run.
Suggested fix

```diff
 run = pq.read_table('test_data/TEST001/TEST001.run.parquet')
 assert 'run_file_name' in run.column_names
 assert 'instrument' in run.column_names
+assert len(run) > 0, 'Run table is empty'
 print(f'Run parquet: {len(run)} rows, columns: {run.column_names}')
 sample = pq.read_table('test_data/TEST001/TEST001.sample.parquet')
 assert 'run_file_name' in sample.column_names
 assert 'organism' in sample.column_names
+assert len(sample) > 0, 'Sample table is empty'
 print(f'Sample parquet: {len(sample)} rows, columns: {sample.column_names}')
```
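The fail-fast idea can be illustrated without pyarrow at all. The block below is a toy stand-in, not the project's actual preflight: CSV files replace the parquet fixtures, and all file names and contents are made up. A fixture with a header but zero data rows must fail the check:

```shell
# Toy fail-fast preflight: count data rows (lines minus the header) and
# refuse to pass a table that has columns but no rows.
tmp=$(mktemp -d)
printf 'run_file_name,instrument\nf1.raw,instrA\n' > "$tmp/run.csv"
printf 'run_file_name,organism\n' > "$tmp/sample.csv"   # header only: empty table
check() {
  rows=$(( $(wc -l < "$1") - 1 ))
  if [ "$rows" -gt 0 ]; then
    echo "$(basename "$1"): $rows data rows"
  else
    echo "$(basename "$1") is empty" >&2
    return 1
  fi
}
check "$tmp/run.csv"
check "$tmp/sample.csv" || echo "preflight would fail fast here"
```

In the real CI step the same shape applies: after the column assertions, assert `num_rows > 0` with a short message so the job fails before the pipeline run does.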
```yaml
- name: Gather failed logs
  if: failure() || cancelled()
  run: |
    echo "ARTIFACT_TIMESTAMP=$(date +%s)" >> $GITHUB_ENV
    mkdir -p failed_logs
    if [ -f test_results/pipeline_info/execution_trace*.txt ]; then
      failed=$(grep "FAILED" test_results/pipeline_info/execution_trace*.txt | cut -f 2 || true)
      while read -r line ; do
        [ -n "$line" ] && cp $(ls work/${line}*/*.log 2>/dev/null) failed_logs/ 2>/dev/null || true
      done <<< "$failed"
    fi
- uses: actions/upload-artifact@v4
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT
mkdir -p "$tmpdir/test_results/pipeline_info"
touch \
  "$tmpdir/test_results/pipeline_info/execution_trace_1.txt" \
  "$tmpdir/test_results/pipeline_info/execution_trace_2.txt"
cd "$tmpdir"
set +e
[ -f test_results/pipeline_info/execution_trace*.txt ]
status=$?
set -e
echo "exit_status=$status"
echo "Expected: non-zero status because the glob expands to multiple paths."
```

Repository: bigbio/spectrafuse
Length of output: 260

🏁 Script executed:

```bash
# First, check if the file exists and retrieve the relevant section
cat -n .github/workflows/ci.yml | sed -n '130,150p'
```

Repository: bigbio/spectrafuse
Length of output: 1922

🏁 Script executed:

```bash
# Check for other problematic glob patterns in the workflow file
rg '\*' .github/workflows/ci.yml | head -20
```

Repository: bigbio/spectrafuse
Length of output: 344
Make the trace-file probe glob-safe.
Lines 140-141 use unquoted globs that break when multiple trace files exist. The `[ -f test_results/pipeline_info/execution_trace*.txt ]` test expands to multiple arguments, causing a shell error and silently skipping failure-log collection.
Suggested fix

```diff
 - name: Gather failed logs
   if: failure() || cancelled()
   run: |
     mkdir -p failed_logs
+    shopt -s nullglob
+    trace_files=(test_results/pipeline_info/execution_trace*.txt)
+    if (( ${#trace_files[@]} )); then
-    if [ -f test_results/pipeline_info/execution_trace*.txt ]; then
-      failed=$(grep "FAILED" test_results/pipeline_info/execution_trace*.txt | cut -f 2 || true)
+      failed=$(grep "FAILED" "${trace_files[@]}" | cut -f 2 || true)
       while read -r line ; do
-        [ -n "$line" ] && cp $(ls work/${line}*/*.log 2>/dev/null) failed_logs/ 2>/dev/null || true
+        [ -n "$line" ] && cp work/${line}*/*.log failed_logs/ 2>/dev/null || true
       done <<< "$failed"
     fi
```
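A glob-safe probe along the lines of the `compgen -G` suggestion can be sketched in isolation. This is bash-specific (`compgen` is a bash builtin), and the temporary directory layout below is hypothetical, mirroring the workflow's trace-file paths:

```shell
# With two trace files present, `compgen -G` answers cleanly where the
# unquoted `[ -f glob ]` test would receive multiple arguments and error out.
tmp=$(mktemp -d)
mkdir -p "$tmp/test_results/pipeline_info"
touch "$tmp/test_results/pipeline_info/execution_trace_1.txt" \
      "$tmp/test_results/pipeline_info/execution_trace_2.txt"
cd "$tmp"
if compgen -G "test_results/pipeline_info/execution_trace*.txt" > /dev/null; then
  echo "trace files found"
fi
```

`compgen -G` exits 0 when the pattern matches at least one path and 1 otherwise, so it works for zero, one, or many matches without `nullglob` bookkeeping.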
Fix CI: single-line commands, bundled test data