GTDB R226
Download started on the 14th of September 2024. LPSN was downloaded on the 18th of September and the assembly summary for Fungi on the 21st of September.
When running hmmsearch, the following genomes are skipped because their protein file (_protein.faa) is empty:
- GCA_002870245.1
- GCA_000722275.1
- GCA_000722025.1
- GCA_000722095.1
- GCA_000722035.1
- GCA_000722415.1
- GCA_000722115.1
- GCA_000715975.1
- GCA_000721965.1
- GCA_000716225.1
- GCA_000716135.1
- GCA_000716285.1
- GCA_001578415.1
- GCA_001341675.1
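The skip rule above can be sketched as a pre-filter run before hmmsearch. This is a minimal illustration, not the pipeline's actual code: the function name is hypothetical, and it assumes each genome directory holds an uncompressed file named `<accession>_protein.faa` (real NCBI file names also include the assembly name). A missing file is treated the same as an empty one.

```python
from pathlib import Path


def split_by_protein_file(genome_dirs: dict[str, Path]) -> tuple[list[str], list[str]]:
    """Split accessions into genomes hmmsearch can run on and genomes to skip.

    Hypothetical layout: each genome directory holds '<accession>_protein.faa'.
    A missing or zero-byte protein file leaves nothing for hmmsearch to search.
    """
    runnable: list[str] = []
    skipped: list[str] = []
    for accession, directory in sorted(genome_dirs.items()):
        faa = Path(directory) / f"{accession}_protein.faa"
        if faa.is_file() and faa.stat().st_size > 0:
            runnable.append(accession)
        else:
            skipped.append(accession)
    return runnable, skipped
```

The skipped list can then be logged so the affected accessions are recorded in the release notes.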
We removed the following genomes from the genome_dirs file because they don't have a genomic .fna file:
- GCF_041410205.1
- GCF_033353635.1
- GCA_041877085.1
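A possible way to do this filtering is sketched below. It is an illustration under stated assumptions, not the pipeline's code: the genome_dirs line format (`<accession>` TAB `<directory>`, extra columns ignored) and both function names are hypothetical, and files are assumed to be uncompressed. Because NCBI folders can also contain `*_cds_from_genomic.fna` and `*_rna_from_genomic.fna`, those are ignored when looking for the assembly FASTA.

```python
from pathlib import Path


def has_genomic_fna(directory: Path) -> bool:
    """True if the directory holds an assembly FASTA named '*_genomic.fna'.

    NCBI folders can also contain '*_cds_from_genomic.fna' and
    '*_rna_from_genomic.fna', which are not the assembly, so they are ignored.
    """
    if not directory.is_dir():
        return False
    return any(
        not path.name.endswith("_from_genomic.fna")
        for path in directory.glob("*_genomic.fna")
    )


def filter_genome_dirs(lines: list[str]) -> tuple[list[str], list[str]]:
    """Drop genome_dirs lines whose directory lacks a genomic .fna file.

    Hypothetical line format: '<accession>\\t<directory>[\\t...]'.
    Returns (lines to keep, accessions removed).
    """
    kept: list[str] = []
    removed: list[str] = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Lines with fewer than two tab-separated fields raise ValueError here.
        accession, directory = line.rstrip("\n").split("\t")[:2]
        if has_genomic_fna(Path(directory)):
            kept.append(line)
        else:
            removed.append(accession)
    return kept, removed
```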
Reference database versions:
- SILVA: 138.2
- LTP: 08_2023
When updating the NCBI taxonomy for all genomes in the database, 5 of them did not have a taxonomy, so I added them manually:
- GCF_025895605.1
- GCF_037414725.1
- GCF_037414705.1
- GCF_014656475.1
- GCF_037478275.1
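The manual fix above amounts to falling back on a hand-curated table when the NCBI lookup returns nothing. The sketch below assumes both sources are plain accession-to-taxonomy dictionaries; the function name and that data layout are hypothetical. Accessions still missing after the fallback are returned so they can be reported and curated.

```python
def assign_ncbi_taxonomy(
    accessions: list[str],
    ncbi_taxonomy: dict[str, str],
    manual_taxonomy: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Assign each genome its NCBI taxonomy, falling back to a hand-curated table.

    Returns the assigned taxonomies and the accessions that still lack one,
    so they can be reported and curated manually.
    """
    assigned: dict[str, str] = {}
    missing: list[str] = []
    for accession in accessions:
        # NCBI value first; the manual entry is only used when NCBI has none.
        taxonomy = ncbi_taxonomy.get(accession) or manual_taxonomy.get(accession)
        if taxonomy:
            assigned[accession] = taxonomy
        else:
            missing.append(accession)
    return assigned, missing
```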
With the new LPSN parsing, where a strain identifier can be a single letter and does not need to contain a digit, some genomes are parsed incorrectly. This is the case for GCF_000306055.1.
GCF_000306055.1 has the NCBI organism name “Xanthomonas arboricola pv. juglandis str. NCPPB 1447” and the strain ID “NCPPB1447”. However, on the LPSN website, “pv. Juglandis” is considered a type strain, so the genome was picked up as type material.
I had to manually edit the record to flag it as a non-type strain.
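The sketch below is hypothetical and is not the actual LPSN parser. It only illustrates how a permissive strain match, where a candidate such as "pv. juglandis" taken from the organism name is compared against LPSN entries after normalisation, can yield a false positive, and how a curated per-accession override corrects it. The normalisation rule (upper-case, drop non-alphanumerics) and all names are assumptions.

```python
import re


def normalise_strain_id(strain: str) -> str:
    """Upper-case and strip non-alphanumerics so 'NCPPB 1447' matches 'NCPPB1447'."""
    return re.sub(r"[^A-Za-z0-9]", "", strain).upper()


def is_type_strain(
    accession: str,
    ncbi_strain_ids: list[str],
    lpsn_type_strains: list[str],
    overrides: dict[str, bool],
) -> bool:
    """Flag a genome as type material if any NCBI strain ID matches an LPSN type strain.

    A curated override (accession -> True/False) always wins over the parsed result.
    """
    if accession in overrides:
        return overrides[accession]
    # Discard empty strings so a blank entry never matches another blank entry.
    lpsn_ids = {normalise_strain_id(s) for s in lpsn_type_strains} - {""}
    return any(normalise_strain_id(s) in lpsn_ids for s in ncbi_strain_ids)
```

With `ncbi_strain_ids=["NCPPB 1447", "pv. juglandis"]` and `lpsn_type_strains=["pv. Juglandis"]`, the match succeeds on "PVJUGLANDIS"; passing `overrides={"GCF_000306055.1": False}` forces the non-type flag.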
There are 2 new columns in this release: ncbi_not_trusted_as_type (previously ncbi_untrustworthy_type) and ncbi_excluded_from_refseq. The ncbi_excluded_from_refseq column requires an extra step: if 'derived from metagenome' and 'not used as type' appear in the ncbi_excluded_from_refseq column, we update gtdb_type_designation_ncbi_taxa to 'not used as type'.
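The extra step can be sketched as a per-record update. This is a minimal illustration that assumes each metadata row is a dictionary keyed by column name; the function name is hypothetical, and it reads "and" as requiring both phrases, matched case-insensitively as substrings so no particular separator between reasons is assumed.

```python
from typing import Optional

# Both phrases must appear in ncbi_excluded_from_refseq for the rule to fire.
REFSEQ_EXCLUSION_PHRASES = ("derived from metagenome", "not used as type")


def update_type_designation(record: dict[str, Optional[str]]) -> dict[str, Optional[str]]:
    """Return a copy of record with gtdb_type_designation_ncbi_taxa set to
    'not used as type' when ncbi_excluded_from_refseq contains both phrases."""
    excluded = (record.get("ncbi_excluded_from_refseq") or "").lower()
    if all(phrase in excluded for phrase in REFSEQ_EXCLUSION_PHRASES):
        return {**record, "gtdb_type_designation_ncbi_taxa": "not used as type"}
    return dict(record)
```

Returning a copy rather than mutating the input keeps the original metadata available for comparison after the update.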