FIX: paginate results when fetching Kraken 2 DBs#346
Open
misialq wants to merge 4 commits into
Open
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #346 +/- ##
==========================================
+ Coverage 94.60% 94.62% +0.02%
==========================================
Files 35 35
Lines 2669 2680 +11
Branches 397 399 +2
==========================================
+ Hits 2525 2536 +11
Misses 72 72
Partials 72 72 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a limitation in the Kraken2 prebuilt DB fetch logic where only the first 1000 S3 keys were considered, by switching the S3 listing request to ListObjectsV2-style pagination and aggregating results across pages before selecting the latest matching database.
Changes:
- Implemented paginated S3 listing (continuation tokens) when fetching available Kraken2 DB artifacts.
- Refactored
_find_latest_dbto operate on parsed S3 object metadata rather than a raw HTTP response. - Updated and expanded unit tests to cover pagination and edge cases in S3 XML parsing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
q2_annotate/kraken2/database.py |
Adds S3 list pagination and refactors latest-DB selection to work from accumulated S3 objects. |
q2_annotate/kraken2/tests/test_database.py |
Updates tests for the refactor and adds coverage for paginated listings and edge-case XML responses. |
Comments suppressed due to low confidence (1)
q2_annotate/kraken2/database.py:190
_find_latest_dbwill raise an unhandledIndexErrorwhens3_objectsis empty after filtering by the collection pattern (e.g., S3 returns objects but none match the expected DB naming convention). This will surface as a confusing crash for users; it should raise a clearValueError(or similar) with an actionable message when no matching databases are found.
s3_objects = [obj for obj in s3_objects if re.match(pattern, obj["Key"])]
s3_objects = sorted(s3_objects, key=lambda x: x["LastModified"], reverse=True)
latest_db = s3_objects[0]["Key"]
return latest_db
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR fixes a bug by which the S3 results returned within the
fetch-kraken-dbaction would be limited to the first 1000 keys. This would result in the action failing as the Kraken databases recently stopped being included in the first 1000 results. Instead, we should paginate the results, fetch all and search for the required collection through those.To test simply run the action with any of the collection keys - the issue applied to all of those.
AI Disclosure
AI Usage Details
Gemini was used while updating tests and adding new ones. All the code was reviewed before PR submission.