Description
Hello!
The latest version of cblaster seems to have an issue with the extract_clusters command: it does not include the annotations for the first and last proteins listed in the output of search.
-- Example --
Here are the proteins that were found:
Cluster 1 (score 6.3):

| Query | Subject | Identity | Coverage | E-value | Bitscore | Start |
|---|---|---|---|---|---|---|
| intermediate | WP_002526074.1 | - | - | - | - | 657707 |
| intermediate | WP_204158246.1 | - | - | - | - | 658253 |
| intermediate | WP_171015102.1 | - | - | - | - | 660183 |
| intermediate | WP_002522771.1 | - | - | - | - | 660448 |
| intermediate | WP_002522770.1 | - | - | - | - | 661923 |
| intermediate | HMPREF9575_RS13755 | - | - | - | - | 662199 662272 |
| intermediate | HMPREF9575_RS13480 | - | - | - | - | 662311 |
| response_regulator_receiver_domain_protein_translation | WP_002522768.1 | 100 | 100 | 3e-141 | 403 | 662707 |
| histidine_kinase_translation | WP_002522767.1 | 100 | 100 | 0 | 779 | 663312 |
| intermediate | WP_002526075.1 | - | - | - | - | 664515 |
| hypothetical_protein_translation | WP_002526076.1 | 99.8 | 100 | 0 | 1154 | 664760 |
| transporter _major_facilitator_family_protein_translation | WP_032501619.1 | 100 | 100 | 0 | 885 | 667259 |
| intermediate | WP_002522761.1 | - | - | - | - | 668730 |
| intermediate | WP_002522760.1 | - | - | - | - | 669860 |
| intermediate | WP_002522759.1 | - | - | - | - | 670528 |
| intermediate | WP_002526079.1 | - | - | - | - | 672873 |
And here is the GenBank (.gbk) file that extract_clusters created, which does not include the first or last gene listed in the search output (note: I left out the middle genes for brevity):
```
LOCUS       NZ_GL383323.1 15998 bp DNA UNK 01-JAN-1980
DEFINITION  Genes for cluster 1 on scaffold NZ_GL383323.1 of species
            Cutibacterium acnes HL110PA1.
ACCESSION   NZ_GL383323
VERSION     NZ_GL383323.1
FEATURES             Location/Qualifiers
     CDS             complement(547..2478)
                     /protein_id="WP_204158246.1"
                     /translation="MTWVQASFWRSQDQINDTRDLSDLLASPATVPVGMRWYREPNEAS
                     IFNITDPEANQTFKPGDSGSFTVTGTPAQMGLASPNAVDAIGIHVQASPENQSRRTVGR
                     ARVLTVLSDAHTSANLAPVIVLSTMPTRRIDGTFTDESLADDITHRLKPLAEAAHTRNA
                     TVLVDPSLIDEARAMASGYRVAGKGTATVEGKGQQTAREWLDLVDPLLTTGQAYRLPYG
                     NADVIGAVRQGRPNVLLTVKHALDPSNPAAKLPLAVVDPSAELDRSSFKTLTKELSPAL
                     VLTCAASARDGVRGESGGKGIGLADTARTDGHPQSNSDPQRRGMLLSQALLMTHESIPA
                     VTLVTTVNDVQATAPVGWLHLQNLSAVLTGAKPGLRLPGTRAGDITLKGPWWRVQHDVG
                     IDSDDWSDLVGAPTEATSLTSAKFVSRSLSSSLQDREAWATDVMRPAADAMAGKGLVLH
                     SAPQFVMSSSTNDFPLTVTNSLAQTVHVKVVVFSENPQRIDIPDTQVVTIQPRETQTIR
                     FAPKASSNGVIEMQAHLSTPSGRSLGSQTSFVVKATQMDDVGWIIIVVSALVLIIATVL
                     RIRQVTASSRRQAESNGEPQTSGPTAGSTSDNISDTTPSPSAVEDPDTASDDDSEHHLP
                     TGEGNLAE"
                     /cluster_role="intermediate"
     CDS             2477..2722
                     /protein_id="WP_171015102.1"
                     /translation="MDTEEVFVTVPLMVMGVLGSPVGGLTEVIVTLTNSAAVRAWAGED
                     PAKTVVRTINNPSSQAAGQRREAGDTFVMVVGRGRD"
                     /cluster_role="intermediate"
~~ADDITIONAL GENES IN HERE~~
     CDS             complement(12154..12825)
                     /protein_id="WP_002522760.1"
                     /translation="MTTTLLPRVAQPGPGVSHVDTDRPIECIITHHPHMPRDRGGHATS
                     VSGRYLLRQAAELMLGTDPAHCPVVDPSRRWYWPGINLHGSVSHVPGWSLTTLSTGGHI
                     GADIQDFRERPGAMAFIGDLVKLSRSASLREFAECEAVVKVSELTKETFGHVRLPEWTP
                     GWRHVFEDYWVWSLEMHGMGVIALASDLPRAIRWWRCDADARGRLQALRPISSLGPGRP
                     S"
                     /cluster_role="intermediate"
     CDS             complement(12822..15161)
                     /protein_id="WP_002522759.1"
                     /translation="MTITAENATTRSDIARSIAVTGVGLVTAQGDHTDECWTELVDGVC
                     GITMNVTFDDSGTTIPCAGVAPIPNSDSIDRCYLLGVHAMREALEMSGIDLDSVGRDRI
                     GLVVGSSLGAMPTLEAAHRRAIETGVLDAGLAADSQLHCVADHLAAEFDIRGPRVVTSN
                     ACAAGAVAIGYAAELLWSDDVDLVVCGGVDPLAQISANGFTCLGALDNLPCSPMAGSSG
                     LTLGEGAGFMVLERTDAAAARGQEVMAEIAGYGTSCDGYHQTAPDPGGNGARSSMEAAL
                     RSAHLKPSDVSYVNLHGTGTPTNDAVEPKALRSLFKSDDLPPVSSVKGAIGHTLGAAGA
                     IEAVCSIKAIHEGVLPPTVNNRGQASRTGLDIVPECARKAAPDVVISNSFAFGGNNASV
                     VITAPRGGVHCTAPAQLREVGISGMAALAGKAANSEELLSALSEDCPIWMADEKTWEGD
                     AVQTGHVDIKRLSRTINPSKVRRMDPLGIISSAVVTDLYARHGKLSRKDAESTGIIFAT
                     GYGPVTAVTQFNDGIIRHGSEGANALVFPNTVVNAAAGHLAMLNRYRGYTATLACGGTS
                     SLMALLLAARVVGRGAADRIMVVIADEFPSIAVQAVAKLPGYRHRVDGSGAVLSEGAVC
                     VLVEAVEVAEARGTAPMALLRGFGSRGESVGVGHTASDGRAWAKAMAAALGPAGLTASD
                     VSTVVAASSGHPRVDRAEQAARRIVGLSATATTFPKAIVGETHGSAAGIGLFGALCGSR
                     SAAHQNILVNAFSHGGGYASMVVESL"
                     /cluster_role="intermediate"
ORIGIN.....
```
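One quick way to confirm which hits were dropped is to compare the Subject column of the search table with the /protein_id qualifiers in the exported file. The sketch below is a hypothetical helper (not part of cblaster) that uses only the Python standard library; SUBJECTS and GBK_EXCERPT are trimmed copies of the IDs and CDS entries shown above. It only checks the first and last subjects, because in a full file some intermediate rows (e.g. the HMPREF9575_ locus tags) may not appear as /protein_id values.

```python
import re


def protein_ids(gbk_text):
    """Return every /protein_id value in a GenBank record, in file order."""
    return re.findall(r'/protein_id="([^"]+)"', gbk_text)


def missing_boundary_hits(subjects, gbk_text):
    """Return the first/last search subjects that have no CDS in the .gbk text."""
    present = set(protein_ids(gbk_text))
    # dict.fromkeys de-duplicates while keeping order (matters for 1-hit clusters)
    boundary = list(dict.fromkeys([subjects[0], subjects[-1]]))
    return [s for s in boundary if s not in present]


# Subject column of cluster 1, in order (middle rows trimmed for brevity).
SUBJECTS = [
    "WP_002526074.1", "WP_204158246.1", "WP_171015102.1",
    "WP_002522760.1", "WP_002522759.1", "WP_002526079.1",
]

# Trimmed excerpt of the extract_clusters output; only /protein_id is read.
GBK_EXCERPT = '''
     CDS             complement(547..2478)
                     /protein_id="WP_204158246.1"
     CDS             2477..2722
                     /protein_id="WP_171015102.1"
     CDS             complement(12154..12825)
                     /protein_id="WP_002522760.1"
     CDS             complement(12822..15161)
                     /protein_id="WP_002522759.1"
'''

print(missing_boundary_hits(SUBJECTS, GBK_EXCERPT))
# ['WP_002526074.1', 'WP_002526079.1']
```

For a real file, GBK_EXCERPT would be replaced by the full text of the exported .gbk (e.g. read with open(...).read()).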
It looks like extract_clusters is pulling the full DNA sequence but not all of the annotations. You can see this in the gbk file: the first CDS starts at position 547 and the last one ends at 15161 of a 15998 bp record, so neither of the annotated genes reaches the ends of the sequence. This is also evident in the clinker images: none of the clusters end with arrows; instead, they all end with stretches of unannotated sequence.
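The missing genes can be placed on the exported record's coordinates. Three genes that appear in both outputs give the same offset between the search Start column and the lower CDS coordinate in the .gbk, which suggests the record covers both missing hits. This is a minimal stdlib-only sketch, assuming 1-based positions and that Start is the lower genomic coordinate of each gene; the names SEARCH_STARTS, GBK_STARTS and infer_offset are hypothetical.

```python
# Genomic Start values from the search table (Subject -> Start).
SEARCH_STARTS = {
    "WP_002526074.1": 657707,  # first hit, no CDS in the .gbk
    "WP_204158246.1": 658253,
    "WP_002522760.1": 669860,
    "WP_002522759.1": 670528,
    "WP_002526079.1": 672873,  # last hit, no CDS in the .gbk
}

# Lower CDS coordinate of each gene that *is* annotated in the .gbk excerpt.
GBK_STARTS = {
    "WP_204158246.1": 547,
    "WP_002522760.1": 12154,
    "WP_002522759.1": 12822,
}

RECORD_LENGTH = 15998  # from the LOCUS line


def infer_offset(search_starts, gbk_starts):
    """Infer genomic = local + offset, requiring one offset for all shared genes."""
    offsets = {search_starts[pid] - gbk_starts[pid] for pid in gbk_starts}
    if len(offsets) != 1:
        raise ValueError(f"inconsistent offsets: {sorted(offsets)}")
    return offsets.pop()


def to_local(genomic_start, offset):
    """Convert a genomic start into the extracted record's coordinates."""
    return genomic_start - offset


offset = infer_offset(SEARCH_STARTS, GBK_STARTS)
for pid in ("WP_002526074.1", "WP_002526079.1"):
    local = to_local(SEARCH_STARTS[pid], offset)
    print(pid, local, 1 <= local <= RECORD_LENGTH)
# WP_002526074.1 1 True
# WP_002526079.1 15167 True
```

Under these assumptions the first hit would start at position 1 and the last at 15167, both inside the exported sequence, yet neither has a CDS feature.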
I looked back at some older cblaster results from previous versions, and this isn't how the software used to behave. I didn't see any listed changes that would explain this. Can this be fixed?
Let me know if there is anything you need from me that would help!
Thanks!