Skip to content

FIX: paginate results when fetching Kraken 2 DBs#346

Open
misialq wants to merge 4 commits into
bokulich-lab:mainfrom
misialq:fix-fetch-kraken-db
Open

FIX: paginate results when fetching Kraken 2 DBs#346
misialq wants to merge 4 commits into
bokulich-lab:mainfrom
misialq:fix-fetch-kraken-db

Conversation

@misialq

@misialq misialq commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Description

This PR fixes a bug by which the S3 results returned within the fetch-kraken-db action would be limited to the first 1000 keys. This would result in the action failing as the Kraken databases recently stopped being included in the first 1000 results. Instead, we should paginate the results, fetch all and search for the required collection through those.

To test simply run the action with any of the collection keys - the issue applied to all of those.

AI Disclosure

  • NO AI USED.
  • AI USED.

AI Usage Details

Gemini was used while updating tests and adding new ones. All the code was reviewed before PR submission.

@codecov

codecov Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.62%. Comparing base (f105161) to head (59333c6).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #346      +/-   ##
==========================================
+ Coverage   94.60%   94.62%   +0.02%     
==========================================
  Files          35       35              
  Lines        2669     2680      +11     
  Branches      397      399       +2     
==========================================
+ Hits         2525     2536      +11     
  Misses         72       72              
  Partials       72       72              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@misialq misialq marked this pull request as ready for review June 3, 2026 16:01
@misialq misialq requested a review from Copilot June 3, 2026 16:01

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a limitation in the Kraken2 prebuilt DB fetch logic where only the first 1000 S3 keys were considered, by switching the S3 listing request to ListObjectsV2-style pagination and aggregating results across pages before selecting the latest matching database.

Changes:

  • Implemented paginated S3 listing (continuation tokens) when fetching available Kraken2 DB artifacts.
  • Refactored _find_latest_db to operate on parsed S3 object metadata rather than a raw HTTP response.
  • Updated and expanded unit tests to cover pagination and edge cases in S3 XML parsing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
q2_annotate/kraken2/database.py Adds S3 list pagination and refactors latest-DB selection to work from accumulated S3 objects.
q2_annotate/kraken2/tests/test_database.py Updates tests for the refactor and adds coverage for paginated listings and edge-case XML responses.
Comments suppressed due to low confidence (1)

q2_annotate/kraken2/database.py:190

  • _find_latest_db will raise an unhandled IndexError when s3_objects is empty after filtering by the collection pattern (e.g., S3 returns objects but none match the expected DB naming convention). This will surface as a confusing crash for users; it should raise a clear ValueError (or similar) with an actionable message when no matching databases are found.
    s3_objects = [obj for obj in s3_objects if re.match(pattern, obj["Key"])]
    s3_objects = sorted(s3_objects, key=lambda x: x["LastModified"], reverse=True)

    latest_db = s3_objects[0]["Key"]
    return latest_db

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread q2_annotate/kraken2/tests/test_database.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants