extract_clusters skipping annotation for first and last proteins #123

Hello!
The latest version of cblaster seems to have an issue with the `extract_clusters` command: it does not include annotations for the proteins listed first and last in each cluster of the `search` output.

-- Example --
Here are the proteins that were found:

Cluster 1, score 6.3:

| Query | Subject | Identity | Coverage | E-value | Bitscore | Start | End | Strand |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| intermediate | WP_002526074.1 | - | - | - | - | 657707 | 658186 | - |
| intermediate | WP_204158246.1 | - | - | - | - | 658253 | 660185 | - |
| intermediate | WP_171015102.1 | - | - | - | - | 660183 | 660429 | + |
| intermediate | WP_002522771.1 | - | - | - | - | 660448 | 661927 | - |
| intermediate | WP_002522770.1 | - | - | - | - | 661923 | 662139 | - |
| intermediate | HMPREF9575_RS13755 | - | - | - | - | 662199 | 662272 | + |
| intermediate | HMPREF9575_RS13480 | - | - | - | - | 662311 | 662680 | + |
| response_regulator_receiver_domain_protein_translation | WP_002522768.1 | 100 | 100 | 3e-141 | 403 | 662707 | 663316 | - |
| histidine_kinase_translation | WP_002522767.1 | 100 | 100 | 0 | 779 | 663312 | 664482 | - |
| intermediate | WP_002526075.1 | - | - | - | - | 664515 | 664647 | - |
| hypothetical_protein_translation | WP_002526076.1 | 99.8 | 100 | 0 | 1154 | 664760 | 666482 | - |
| transporter _major_facilitator_family_protein_translation | WP_032501619.1 | 100 | 100 | 0 | 885 | 667259 | 668705 | - |
| intermediate | WP_002522761.1 | - | - | - | - | 668730 | 669864 | - |
| intermediate | WP_002522760.1 | - | - | - | - | 669860 | 670532 | - |
| intermediate | WP_002522759.1 | - | - | - | - | 670528 | 672868 | - |
| intermediate | WP_002526079.1 | - | - | - | - | 672873 | 673705 | - |
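
To compare this against the extracted file, the boundary genes and the overall span can be read straight off the table. Here is a minimal stdlib-only Python sketch using a few of the rows above (the `parse_hits` helper is my own, not part of cblaster, and it assumes the column order shown in the table):

```python
# A few rows copied from the hit table (columns: Query | Subject |
# Identity | Coverage | E-value | Bitscore | Start | End | Strand).
TABLE = """\
intermediate | WP_002526074.1 | - | - | - | - | 657707 | 658186 | -
intermediate | WP_204158246.1 | - | - | - | - | 658253 | 660185 | -
histidine_kinase_translation | WP_002522767.1 | 100 | 100 | 0 | 779 | 663312 | 664482 | -
intermediate | WP_002522759.1 | - | - | - | - | 670528 | 672868 | -
intermediate | WP_002526079.1 | - | - | - | - | 672873 | 673705 | -
"""


def parse_hits(table):
    """Return (subject, start, end) tuples in the order they appear."""
    hits = []
    for line in table.splitlines():
        # Tolerate optional leading/trailing pipes (markdown-style rows).
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 9 or not cells[6].isdigit():
            continue  # skip captions, header rows, separators, malformed rows
        hits.append((cells[1], int(cells[6]), int(cells[7])))
    return hits


hits = parse_hits(TABLE)
span = (min(s for _, s, _ in hits), max(e for _, _, e in hits))
print("first:", hits[0][0], "last:", hits[-1][0], "span:", span)
```

For this cluster, the first and last subjects are WP_002526074.1 (starting at 657707) and WP_002526079.1 (ending at 673705), which are the two genes that do not appear in the extracted GenBank file.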



And here is the GenBank file that `extract_clusters` created. It does not include the first (WP_002526074.1) or last (WP_002526079.1) gene listed in the search output (note: I left out the middle genes for brevity):

```

LOCUS       NZ_GL383323.1          15998 bp    DNA              UNK 01-JAN-1980
DEFINITION  Genes for cluster 1 on scaffold NZ_GL383323.1 of species
            Cutibacterium acnes HL110PA1.
ACCESSION   NZ_GL383323
VERSION     NZ_GL383323.1

FEATURES             Location/Qualifiers
     CDS             complement(547..2478)
                     /protein_id="WP_204158246.1"
                     /translation="MTWVQASFWRSQDQINDTRDLSDLLASPATVPVGMRWYREPNEAS
                     IFNITDPEANQTFKPGDSGSFTVTGTPAQMGLASPNAVDAIGIHVQASPENQSRRTVGR
                     ARVLTVLSDAHTSANLAPVIVLSTMPTRRIDGTFTDESLADDITHRLKPLAEAAHTRNA
                     TVLVDPSLIDEARAMASGYRVAGKGTATVEGKGQQTAREWLDLVDPLLTTGQAYRLPYG
                     NADVIGAVRQGRPNVLLTVKHALDPSNPAAKLPLAVVDPSAELDRSSFKTLTKELSPAL
                     VLTCAASARDGVRGESGGKGIGLADTARTDGHPQSNSDPQRRGMLLSQALLMTHESIPA
                     VTLVTTVNDVQATAPVGWLHLQNLSAVLTGAKPGLRLPGTRAGDITLKGPWWRVQHDVG
                     IDSDDWSDLVGAPTEATSLTSAKFVSRSLSSSLQDREAWATDVMRPAADAMAGKGLVLH
                     SAPQFVMSSSTNDFPLTVTNSLAQTVHVKVVVFSENPQRIDIPDTQVVTIQPRETQTIR
                     FAPKASSNGVIEMQAHLSTPSGRSLGSQTSFVVKATQMDDVGWIIIVVSALVLIIATVL
                     RIRQVTASSRRQAESNGEPQTSGPTAGSTSDNISDTTPSPSAVEDPDTASDDDSEHHLP
                     TGEGNLAE"
                     /cluster_role="intermediate"
     CDS             2477..2722
                     /protein_id="WP_171015102.1"
                     /translation="MDTEEVFVTVPLMVMGVLGSPVGGLTEVIVTLTNSAAVRAWAGED
                     PAKTVVRTINNPSSQAAGQRREAGDTFVMVVGRGRD"
                     /cluster_role="intermediate"
     ~~ADDITIONAL GENES IN HERE~~
     CDS             complement(12154..12825)
                     /protein_id="WP_002522760.1"
                     /translation="MTTTLLPRVAQPGPGVSHVDTDRPIECIITHHPHMPRDRGGHATS
                     VSGRYLLRQAAELMLGTDPAHCPVVDPSRRWYWPGINLHGSVSHVPGWSLTTLSTGGHI
                     GADIQDFRERPGAMAFIGDLVKLSRSASLREFAECEAVVKVSELTKETFGHVRLPEWTP
                     GWRHVFEDYWVWSLEMHGMGVIALASDLPRAIRWWRCDADARGRLQALRPISSLGPGRP
                     S"
                     /cluster_role="intermediate"
     CDS             complement(12822..15161)
                     /protein_id="WP_002522759.1"
                     /translation="MTITAENATTRSDIARSIAVTGVGLVTAQGDHTDECWTELVDGVC
                     GITMNVTFDDSGTTIPCAGVAPIPNSDSIDRCYLLGVHAMREALEMSGIDLDSVGRDRI
                     GLVVGSSLGAMPTLEAAHRRAIETGVLDAGLAADSQLHCVADHLAAEFDIRGPRVVTSN
                     ACAAGAVAIGYAAELLWSDDVDLVVCGGVDPLAQISANGFTCLGALDNLPCSPMAGSSG
                     LTLGEGAGFMVLERTDAAAARGQEVMAEIAGYGTSCDGYHQTAPDPGGNGARSSMEAAL
                     RSAHLKPSDVSYVNLHGTGTPTNDAVEPKALRSLFKSDDLPPVSSVKGAIGHTLGAAGA
                     IEAVCSIKAIHEGVLPPTVNNRGQASRTGLDIVPECARKAAPDVVISNSFAFGGNNASV
                     VITAPRGGVHCTAPAQLREVGISGMAALAGKAANSEELLSALSEDCPIWMADEKTWEGD
                     AVQTGHVDIKRLSRTINPSKVRRMDPLGIISSAVVTDLYARHGKLSRKDAESTGIIFAT
                     GYGPVTAVTQFNDGIIRHGSEGANALVFPNTVVNAAAGHLAMLNRYRGYTATLACGGTS
                     SLMALLLAARVVGRGAADRIMVVIADEFPSIAVQAVAKLPGYRHRVDGSGAVLSEGAVC
                     VLVEAVEVAEARGTAPMALLRGFGSRGESVGVGHTASDGRAWAKAMAAALGPAGLTASD
                     VSTVVAASSGHPRVDRAEQAARRIVGLSATATTFPKAIVGETHGSAAGIGLFGALCGSR
                     SAAHQNILVNAFSHGGGYASMVVESL"
                     /cluster_role="intermediate"
ORIGIN.....

```
It looks like `extract_clusters` is pulling the full DNA sequence but not the annotations for the outermost genes. You can see this in the GenBank file: the record is 15998 bp long, but the first CDS starts at position 547 and the last one ends at 15161, so neither end of the sequence is covered by an annotated gene. This is also evident in the clinker images: none of the clusters end with arrows; instead, they all end with bare stretches of unannotated sequence.
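
The unannotated stretch at each end can be measured directly from the file. Below is a minimal stdlib-only sketch (my own regex-based helper, not a cblaster or Biopython function). It only understands simple `start..end` and `complement(start..end)` locations, not `join(...)`, and it reports how many bp precede the first CDS and follow the last one:

```python
import re

# Trimmed excerpt of the GenBank record above: the LOCUS line plus the
# first and last CDS locations (translations omitted).
GBK = """\
LOCUS       NZ_GL383323.1          15998 bp    DNA              UNK 01-JAN-1980
FEATURES             Location/Qualifiers
     CDS             complement(547..2478)
                     /protein_id="WP_204158246.1"
     CDS             complement(12822..15161)
                     /protein_id="WP_002522759.1"
"""


def unannotated_flanks(gbk_text):
    """Return (bp before the first CDS, bp after the last CDS) for one record."""
    length = int(re.search(r"^LOCUS\s+\S+\s+(\d+) bp", gbk_text, re.M).group(1))
    # Simple locations only, with or without complement() and partial markers.
    spans = [
        (int(a), int(b))
        for a, b in re.findall(
            r"^\s+CDS\s+(?:complement\()?<?(\d+)\.\.>?(\d+)", gbk_text, re.M
        )
    ]
    first_start = min(a for a, _ in spans)
    last_end = max(b for _, b in spans)
    return first_start - 1, length - last_end


print(unannotated_flanks(GBK))
```

On this record, that gives 546 bp before the first CDS and 837 bp after the last one, which matches the unannotated ends visible in the clinker plots.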


I looked back at some older cblaster results from previous versions, and this isn't how the software used to behave. I didn't see any documented change that would explain this. Can this be fixed?
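
I haven't dug into the cblaster source, so this is only a guess. The symptom is what you would get if the cluster boundaries were taken from the outermost genes (657707 and 673705 here) and the step that selects genes for the extracted region then compared coordinates with strict inequalities. A purely hypothetical illustration (the names below are made up and are not cblaster's code):

```python
# Hypothetical illustration only, NOT cblaster's implementation: shows how a
# strict vs. inclusive bounds check changes which genes are kept when the
# cluster boundaries coincide with its own outermost genes.
GENES = [
    ("WP_002526074.1", 657707, 658186),  # first gene in the search output
    ("WP_204158246.1", 658253, 660185),
    ("WP_002522759.1", 670528, 672868),
    ("WP_002526079.1", 672873, 673705),  # last gene in the search output
]
CLUSTER_START, CLUSTER_END = 657707, 673705


def genes_in_cluster(genes, start, end, inclusive=True):
    """Return IDs of genes lying within [start, end] (or strictly inside it)."""
    if inclusive:
        return [g for g, s, e in genes if s >= start and e <= end]
    return [g for g, s, e in genes if s > start and e < end]


print(genes_in_cluster(GENES, CLUSTER_START, CLUSTER_END))
print(genes_in_cluster(GENES, CLUSTER_START, CLUSTER_END, inclusive=False))
```

With the strict comparison, exactly the first and last genes are dropped, which is the pattern I'm seeing in every cluster.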
Let me know if there is anything you need from me that would help!
Thanks!