Is your feature request related to a problem? Please describe.
This is semi-related to #197: once we deprecate the fetching action, it would be great to add support for building custom Kaiju databases.
Describe the solution you'd like
This action should make use of the kaiju-mkbwt and kaiju-mkfmi utilities to construct the database, similarly to what the build-kraken-db action does.
Acceptance criteria:
- the action accepts GenomeData[Proteins] (corresponding to the ProteinsDirectoryFormat: https://github.com/qiime2/q2-types/blob/2a839b49650dc44d5dd9e8983bdff93d06754bfc/q2_types/genome_data/_formats.py#L31) as input, together with a metadata file mapping genome IDs in the GenomeData artifact to NCBI taxon IDs
- the action fetches the NCBI taxonomy from the NCBI FTP server (we will need the nodes.dmp and names.dmp files)
- the action first runs kaiju-mkbwt using all the protein inputs and passes the output to the kaiju-mkfmi script
- the action then stores the fmi index generated by kaiju-mkfmi together with the nodes.dmp and names.dmp as the final Kaiju DB output
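A rough sketch of the core steps, to make the criteria above concrete. It is not a proposed implementation: the function names (`merge_proteins`, `kaiju_commands`, `build_index`) and the dict-based inputs are hypothetical stand-ins for the GenomeData[Proteins] artifact and the metadata file. The `kaiju-mkbwt` options and the `<counter>_<taxon ID>` header convention follow the custom-database example in the Kaiju README as I understand it, so they should be checked against the Kaiju version we pin. Fetching nodes.dmp and names.dmp is left out.

```python
import subprocess
from pathlib import Path


def merge_proteins(fasta_by_genome, taxon_by_genome, merged_fp):
    """Concatenate per-genome protein FASTA files into a single file.

    Each header is rewritten to '<counter>_<NCBI taxon ID>' (assumed
    convention for Kaiju index building). Returns the number of sequences.
    """
    counter = 0
    with open(merged_fp, "w") as out:
        for genome_id, fasta_fp in sorted(fasta_by_genome.items()):
            # Raises KeyError if the metadata does not cover this genome.
            taxon_id = taxon_by_genome[genome_id]
            with open(fasta_fp) as fh:
                for line in fh:
                    line = line.rstrip("\n")
                    if not line:
                        continue
                    if line.startswith(">"):
                        counter += 1
                        out.write(f">{counter}_{taxon_id}\n")
                    else:
                        out.write(line + "\n")
    return counter


def kaiju_commands(merged_fp, prefix, threads=1):
    """Return the two command lines: kaiju-mkbwt first, then kaiju-mkfmi."""
    return [
        ["kaiju-mkbwt", "-n", str(threads), "-a", "ACDEFGHIKLMNPQRSTVWY",
         "-o", str(prefix), str(merged_fp)],
        ["kaiju-mkfmi", str(prefix)],
    ]


def build_index(merged_fp, prefix, threads=1):
    """Run both tools (requires Kaiju on PATH) and return the .fmi path."""
    for cmd in kaiju_commands(merged_fp, prefix, threads):
        subprocess.run(cmd, check=True)
    return Path(f"{prefix}.fmi")
```

The resulting `.fmi` file would then be stored next to nodes.dmp and names.dmp in the output artifact. Keeping command construction separate from execution is only a suggestion, so that the command lines can be unit-tested without Kaiju installed.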
Additional context
Kaiju repo