Annotation transfer by ZuzanaSebb · Pull Request #349 · bokulich-lab/q2-annotate

ZuzanaSebb · 2026-06-08T06:50:17Z

Description

Add transfer_eggnog_annotations action.

What it does

MAG-level annotations + FeatureData[MAG] → copy files for matching MAG IDs (after dereplication).
Contig-level annotations + FeatureMap[MAGtoContigs] → aggregate into per-MAG files by mapping each gene to its contig.
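
The contig-level path can be illustrated with a minimal sketch. It assumes Prodigal-style gene IDs of the form `<contig_id>_<gene_number>` and a MAG-to-contigs mapping held as a plain dict. Both are assumptions made for illustration, and the helper names are hypothetical rather than taken from this PR.

```python
from collections import defaultdict


def contig_of(gene_id: str) -> str:
    """Strip the trailing gene index (assumed '<contig>_<n>' gene IDs)."""
    return gene_id.rsplit("_", 1)[0]


def group_genes_by_mag(gene_ids, mag_to_contigs):
    """Assign each gene to the MAG that contains its contig."""
    # Invert the mapping; assumes each contig belongs to at most one MAG.
    contig_to_mag = {
        contig: mag
        for mag, contigs in mag_to_contigs.items()
        for contig in contigs
    }
    genes_by_mag = defaultdict(list)
    for gene_id in gene_ids:
        mag = contig_to_mag.get(contig_of(gene_id))
        if mag is not None:  # genes on unbinned contigs are skipped
            genes_by_mag[mag].append(gene_id)
    return dict(genes_by_mag)


grouped = group_genes_by_mag(
    ["c1_1", "c1_2", "c2_1", "c9_1"],
    {"magA": ["c1"], "magB": ["c2"]},
)
```

Here `c9_1` sits on a contig that belongs to no MAG, so it does not appear in `grouped`.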

AI Disclosure

NO AI USED.
AI USED.

AI Usage Details

Claude (Sonnet 4.6) was used to draft the initial implementation; I refined the methods and the final action.
Claude (Opus 4.8) was used to draft the initial implementation of the unit tests.

misialq

Hey @ZuzanaSebb, here's my first round of comments - once you update these I'll take your code for a spin :D

Also, just a reminder: in all the plugin repos in our organization, we try to name PRs the same way we name commits (a prefix followed by a semi-verbose description, typically starting with a verb in the imperative form).

misialq · 2026-06-09T09:14:16Z

    return result
+
+
+def _get_mag_ids_from_feature_data(mags: MAGSequencesDirFmt) -> set:


Do we really need this method? It literally does a single thing.

misialq · 2026-06-09T09:15:50Z

+def _copy_annotation_files(
+    source_annotations: OrthologAnnotationDirFmt,
+    mag_ids: set,
+    result: OrthologAnnotationDirFmt,


You should just make this method return the result, as nothing really happens to it before it is passed to this method. This also makes it a bit more explicit what this method actually does/returns.

misialq · 2026-06-09T09:19:31Z

+    matched_ids = mag_ids & set(annotation_dict.keys())
+    if not matched_ids:
+        raise ValueError("No annotation files matched the destination MAG IDs.")


I think this check should happen before we call this method: that would remove the need to pass mag_ids and give the method a simpler interface. Actually, you could even move it to a separate method, since you have an additional check below. That way, validation is handled by one testable method and copying by another, so each is responsible for a different part of the pipeline.
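
A rough sketch of such a split might look like the following. It uses plain paths and a dict of `{mag_id: file path}` in place of the plugin's directory formats, and the function and file names are hypothetical, not the ones in this PR.

```python
import shutil
import tempfile
from pathlib import Path


def validate_matches(annotation_files: dict, mag_ids: set) -> set:
    """Return the MAG IDs that have an annotation file, or raise if none do."""
    matched = mag_ids & set(annotation_files)
    if not matched:
        raise ValueError("No annotation files matched the destination MAG IDs.")
    return matched


def copy_annotation_files(annotation_files: dict, ids: set, out_dir: Path) -> Path:
    """Copy the files for the given IDs into out_dir and return out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for mag_id in sorted(ids):
        src = Path(annotation_files[mag_id])
        shutil.copy(src, out_dir / src.name)
    return out_dir


# Tiny demonstration with throwaway files (illustrative file names).
workdir = Path(tempfile.mkdtemp())
sources = {}
for mag_id in ("mag1", "mag2"):
    fp = workdir / f"{mag_id}.annotations"
    fp.write_text("#query\tCOG_category\n")
    sources[mag_id] = fp

matched = validate_matches(sources, {"mag1", "mag3"})
result = copy_annotation_files(sources, matched, workdir / "result")
```

With this shape the copying function no longer needs to know about the destination MAG IDs, and each function can be unit-tested on its own.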

misialq · 2026-06-09T09:24:38Z

+
+def transfer_eggnog_annotations(
+    ortholog_annotations: OrthologAnnotationDirFmt,
+    destination: Union[MAGSequencesDirFmt, MAGtoContigsDirFmt],


This is confusing. The destination should always be represented by the same kind of semantic type, either SampleData[MAGs] or FeatureData[MAG]. Right now you are mixing in the contig map. I think the map should become an additional input, required when SampleData[MAGs] is provided as the source of the annotations or when FeatureData[MAG] is provided as the destination (whichever of those makes more sense for your pipeline).
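
As a rough illustration only, the interface could take the contig map as an optional extra input and dispatch on its presence, rather than on the type of the destination. The sketch below uses plain Python containers as stand-ins for the QIIME 2 formats; the function name, parameter names and return values are hypothetical.

```python
from typing import Dict, List, Optional, Set


def choose_transfer_mode(
    destination_mag_ids: Set[str],
    contig_map: Optional[Dict[str, List[str]]] = None,
) -> str:
    """Pick the transfer route from the optional contig map (stand-in types)."""
    if contig_map is None:
        # MAG-level annotations: files are copied for matching MAG IDs.
        return "copy"
    # Contig-level annotations: every destination MAG must be in the map.
    missing = destination_mag_ids - set(contig_map)
    if missing:
        raise ValueError(f"Contig map lacks destination MAGs: {sorted(missing)}")
    return "aggregate"


mode_without_map = choose_transfer_mode({"magA"})
mode_with_map = choose_transfer_mode({"magA"}, {"magA": ["c1", "c2"]})
```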

misialq · 2026-06-09T09:26:55Z

+        df = pd.read_csv(fp, sep="\t", skiprows=4)
+        # drop trailing comment only if present
+        first_col = df.columns[0]
+        df = df[~df[first_col].astype(str).str.startswith("##")]


I'm wondering whether you could achieve the same by simply reading in every file as OrthologFileFmt and viewing as a df?

misialq · 2026-06-09T09:28:10Z

+        df = df[~df[first_col].astype(str).str.startswith("##")]
+        frames.append(df)
+
+    all_annotations = pd.concat(frames, ignore_index=True)


We need to evaluate carefully whether this will work well when one has hundreds of samples with thousands of annotations. I'm a bit worried that the memory will blow up 😅
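
One possible way to keep memory bounded, sketched here under assumptions, is to process one annotation file at a time and append its rows to per-MAG output files instead of concatenating every frame first. The sketch assumes `##` comment lines, a `#query` header line, and gene IDs of the form `<contig>_<n>`; the function and file names are hypothetical.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def append_file_to_mags(annotation_fp: Path, contig_to_mag: dict, out_dir: Path) -> None:
    """Read one annotation file and append its rows to per-MAG files."""
    lines_by_mag = defaultdict(list)
    header = None
    with open(annotation_fp) as fh:
        for line in fh:
            if line.startswith("##"):
                continue  # leading or trailing comment line
            if not line.endswith("\n"):
                line += "\n"
            if header is None:
                header = line  # first non-comment line is the column header
                continue
            gene_id = line.split("\t", 1)[0]
            contig = gene_id.rsplit("_", 1)[0]
            mag = contig_to_mag.get(contig)
            if mag is not None:
                lines_by_mag[mag].append(line)
    # Only the rows of a single input file are held in memory at once.
    for mag, lines in lines_by_mag.items():
        out_fp = out_dir / f"{mag}.annotations"
        is_new = not out_fp.exists()
        with open(out_fp, "a") as out:
            if is_new:
                out.write(header)
            out.writelines(lines)


workdir = Path(tempfile.mkdtemp())
out_dir = workdir / "per_mag"
out_dir.mkdir()
contig_to_mag = {"c1": "magA", "c2": "magB"}
samples = {
    "s1": "## run info\n#query\tCOG_category\nc1_1\tK\nc2_1\tM\n## done\n",
    "s2": "#query\tCOG_category\nc1_2\tL\nc9_1\tS\n",
}
for name, text in samples.items():
    fp = workdir / f"{name}.annotations"
    fp.write_text(text)
    append_file_to_mags(fp, contig_to_mag, out_dir)
```

Peak memory then scales with the largest single input file rather than with the whole dataset, at the cost of reopening each per-MAG file once per sample.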

ZuzanaSebb and others added 5 commits June 1, 2026 15:00

ENH: transfer annotations initial commit

516f2f4

REFACTOR: refactoring the annotation transfer from cintigs to mags

0844bcf

FIX: polishing

a62d5d8

Merge branch 'bokulich-lab:main' into annotation-transfer

0f06c50

ADD: verbose prints

6e35410

ZuzanaSebb requested a review from misialq June 8, 2026 06:50

ZuzanaSebb linked an issue Jun 8, 2026 that may be closed by this pull request

ENH: add an action to transfer functional annotations #341

Open

REFACT: unit tests refactoring

fe16f49

ZuzanaSebb force-pushed the annotation-transfer branch from 63104d1 to fe16f49 Compare June 8, 2026 09:32

ebolyen assigned ZuzanaSebb Jun 8, 2026

FIX: unit tests clean up

934aa06

ZuzanaSebb force-pushed the annotation-transfer branch from 0d0b946 to 934aa06 Compare June 8, 2026 12:29

misialq requested changes Jun 9, 2026

ZuzanaSebb wants to merge 7 commits into bokulich-lab:main from ZuzanaSebb:annotation-transfer


4 participants
