
Broaden the definition of complete chloroplasts #23

@greatfireball


The definition of a complete chloroplast comprises the following requirements:

  1. The subgraph needs to contain between MINNODES (3) and MAXNODES (100) nodes
    next if (@{$wcc} < $MINNODES || @{$wcc} > $MAXNODES);
  2. The subgraph needs to be cyclic, with a total sequence length between MINSEQLEN (25 kbp) and MAXSEQLEN (1 Mbp)
    next unless ($c->is_cyclic && $seqlen >= $MINSEQLEN && $seqlen <= $MAXSEQLEN);
  3. The subgraph needs to have at least one BLAST hit against the reference database

    fastg-parser/import.pl

    Lines 233 to 239 in be99085

    my $output = qx(tblastx -db $blastdbfile -query $filename -evalue 1e-10 -outfmt 6 -num_alignments 1 -num_threads 4);
    if (length($output) > 0)
    {
        $L->debug("Found hits for cyclic graph: ".$c);
        push(@cyclic_contigs_with_blast_hits, $c);
    }
  4. Exactly one subgraph with BLAST hits is allowed
    if (@cyclic_contigs_with_blast_hits == 1)
  5. The node with the highest connectivity is assigned as the inverted repeat (IR)
    my $inverted_repeat = "$degree[0]{v}";
  6. After removing the IR node, exactly two other nodes must remain
    if (keys %nodes == 2)
  7. LSC and SSC are simply assigned by sequence length: the longer of the two remaining nodes becomes the LSC, the shorter one the SSC

    fastg-parser/import.pl

    Lines 286 to 289 in be99085

    if (length($seq[$lsc]) < length($seq[$ssc]))
    {
        ($lsc, $ssc) = ($ssc, $lsc);
    }
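
To make it easier to discuss which requirements could be dropped, the per-subgraph checks (1, 2, 3, 5, 6 and 7) can be condensed into one function. This is only a rough sketch and not the code in import.pl: the function name `classify_subgraph`, the hash layout of its argument (`nodes`, `len`, `degree`, `is_cyclic`, `has_blast_hit`) and the tie-breaking by node name are assumptions made for illustration. Requirement 4 is left out because it compares all subgraphs rather than inspecting a single one.

```perl
use strict;
use warnings;

# Thresholds as quoted in the list above.
my $MINNODES  = 3;
my $MAXNODES  = 100;
my $MINSEQLEN = 25_000;       # 25 kbp
my $MAXSEQLEN = 1_000_000;    # 1 Mbp

# Hypothetical helper. $sg is a hash ref describing one subgraph:
#   nodes         => { name => { len => sequence length, degree => connectivity } }
#   is_cyclic     => boolean
#   has_blast_hit => boolean
# Returns { ir, lsc, ssc } on success, or undef if any requirement fails.
sub classify_subgraph {
    my ($sg) = @_;
    my %nodes = %{ $sg->{nodes} };    # shallow copy, the caller's hash is untouched

    # Requirement 1: node count within bounds
    my $n = keys %nodes;
    return if $n < $MINNODES || $n > $MAXNODES;

    # Requirement 2: cyclic, total sequence length within bounds
    my $seqlen = 0;
    $seqlen += $_->{len} for values %nodes;
    return unless $sg->{is_cyclic} && $seqlen >= $MINSEQLEN && $seqlen <= $MAXSEQLEN;

    # Requirement 3: at least one BLAST hit
    return unless $sg->{has_blast_hit};

    # Requirement 5: the node with the highest connectivity is the IR
    # (ties broken by name, which is an assumption of this sketch)
    my ($ir) = sort { $nodes{$b}{degree} <=> $nodes{$a}{degree} || $a cmp $b } keys %nodes;
    delete $nodes{$ir};

    # Requirement 6: exactly two nodes remain
    return unless keys %nodes == 2;

    # Requirement 7: the longer node is the LSC, the shorter one the SSC
    my ($lsc, $ssc) = sort { $nodes{$b}{len} <=> $nodes{$a}{len} } keys %nodes;
    return { ir => $ir, lsc => $lsc, ssc => $ssc };
}

# Made-up example: three nodes, "b" has the highest connectivity.
my $result = classify_subgraph({
    is_cyclic     => 1,
    has_blast_hit => 1,
    nodes         => {
        a => { len => 85_000, degree => 2 },
        b => { len => 25_000, degree => 4 },
        c => { len => 18_000, degree => 2 },
    },
});
print "IR=$result->{ir} LSC=$result->{lsc} SSC=$result->{ssc}\n" if $result;
```

With the checks laid out like this, relaxing one of them amounts to changing a single `return` line. For instance, a subgraph with an IR and three further nodes is rejected solely by the requirement 6 check.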

I think we can improve our detection by dropping or relaxing some of those requirements, e.g. requirement 6.

Any ideas are welcome!
