Memory consumption is too high #3

The program should consume a reasonable amount of memory for a typical desktop machine.
In theory, the program should need only slightly more memory than BWA. The extra cost is the lc-hash auxiliary data structure, which takes 128MB when the hash length is 12.
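The 128MB figure is consistent with a direct-address table that has one 8-byte entry for each of the 4^12 possible 12-mers. The sketch below assumes exactly that layout. The 8-byte entry size, `kmer_index` and `lc_hash_bytes` are hypothetical names and assumptions, not the program's actual code. It shows how the table size scales with the hash length.

```python
# Hypothetical sketch: the lc-hash is modelled as a direct-address table with
# one entry per possible k-mer over {A, C, G, T}. The 8-byte entry size is an
# assumption chosen because it reproduces the 128MB figure for k = 12.

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer: str) -> int:
    """Map a k-mer to its 2-bits-per-base index in the range [0, 4**k)."""
    idx = 0
    for base in kmer:
        idx = (idx << 2) | ENCODE[base]
    return idx


def lc_hash_bytes(k: int, entry_bytes: int = 8) -> int:
    """Memory needed for a table with 4**k buckets of entry_bytes each."""
    return (4 ** k) * entry_bytes


if __name__ == "__main__":
    print(lc_hash_bytes(12) / 2**20, "MiB for k = 12")
    print(kmer_index("ACGT"))  # A=0, C=1, G=2, T=3 -> 0b00011011
```

Under the same assumption, each extra base in the hash length quadruples the table, so a hash length of 13 would need 512MB.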

Current status: the program requires more than 200GB of memory to align 400GB of human genome reads.

Possible reasons for this issue:
* The auxiliary data structures take too much memory. Instead of storing the whole suffix array, the program should store a compressed (e.g. sampled) suffix array and recover the missing entries with LF mapping. However, this may slow down alignment, and we do not yet know how large the impact on running time would be.
* The human genome sequences are packed into a ".cat" file by stripping everything that is not sequence data. The current code (v0.9) also packs the reverse complement into the ".cat" file. Removing the reverse-complement part could save 15GB of memory and disk space, but at the cost of roughly 50% worse performance.
* Reads are processed in batches so that memory consumption stays constant. However, the code may have memory leaks that make memory consumption grow during alignment.
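The suffix-array trade-off in the first point can be illustrated with a toy FM-index. This is a minimal sketch of the standard sampled-suffix-array technique, not this program's code. Only suffix-array entries whose text position is a multiple of `step` are kept. Any other entry is recovered by applying LF mapping until a sampled row is reached, so memory shrinks by about a factor of `step` while each lookup may cost up to `step - 1` extra LF steps. All function names are hypothetical, and the naive construction is only suitable for tiny inputs.

```python
# Toy FM-index with a sampled suffix array (illustration only; naive O(n^2 log n)
# construction). Rows whose suffix-array value is not sampled are recovered by
# walking backwards through the text with LF mapping.


def suffix_array(text: str) -> list[int]:
    """Naive suffix array: start positions of suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_index(text: str, step: int):
    text += "$"  # unique sentinel, sorts before A/C/G/T
    n = len(text)
    sa = suffix_array(text)
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is "$" when i == 0

    # C[c]: number of characters in the text strictly smaller than c.
    counts: dict[str, int] = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {ch: [0] * (n + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    # Keep only rows whose text position is a multiple of `step`.
    # Position 0 is always sampled, so the backward walk never wraps around.
    samples = {row: pos for row, pos in enumerate(sa) if pos % step == 0}
    return bwt, C, occ, samples


def lf(row: int, bwt: str, C: dict, occ: dict) -> int:
    """Row of the suffix that starts one position earlier in the text."""
    ch = bwt[row]
    return C[ch] + occ[ch][row]


def locate(row: int, bwt: str, C: dict, occ: dict, samples: dict) -> int:
    """Recover SA[row] from the sampled suffix array."""
    steps = 0
    while row not in samples:
        row = lf(row, bwt, C, occ)
        steps += 1
    return samples[row] + steps


if __name__ == "__main__":
    bwt, C, occ, samples = build_index("ACGTTGCAACGT", step=4)
    full = suffix_array("ACGTTGCAACGT$")
    print(len(samples), "of", len(full), "entries stored")
    print(all(locate(r, bwt, C, occ, samples) == full[r] for r in range(len(full))))
```

Larger values of `step` save more memory but make each `locate` call slower, which is exactly the uncertainty raised above about running time.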
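The ".cat" packing in the second point can be sketched as follows. The real v0.9 file layout is not described here, so this is a hypothetical model: FASTA header lines and blank lines are dropped, the remaining sequence lines are concatenated, and the reverse complement is optionally appended. `pack_cat` and `reverse_complement` are illustrative names. The point is that the reverse strand is fully derivable from the forward strand, so dropping it halves the packed sequence. The reverse strand then has to be produced or searched at alignment time instead, which is one plausible (unverified) reason for the slowdown.

```python
# Hypothetical model of ".cat" packing: keep only sequence data and optionally
# append the reverse complement of the whole concatenated forward sequence.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pack_cat(fasta_text: str, include_revcomp: bool = True) -> str:
    """Drop '>' header lines and blank lines, concatenate the sequence lines,
    and append the reverse complement if requested."""
    forward = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )
    if include_revcomp:
        return forward + reverse_complement(forward)
    return forward


if __name__ == "__main__":
    fasta = ">chr1\nACGT\nAAC\n>chr2\nGG\n"
    print(pack_cat(fasta))
    print(pack_cat(fasta, include_revcomp=False))
```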
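One way to check the leak hypothesis in the third point is to record memory in use after every batch. With fixed-size batches, the level should plateau. If it keeps climbing, something is being retained between batches. The sketch below shows the idea in Python with the standard `tracemalloc` module. `align_batch` is a hypothetical stand-in for the real alignment step, and the `leak` flag deliberately keeps a reference alive to simulate a leak. The same measurement can be done with whatever memory profiler suits the program's actual implementation language.

```python
import tracemalloc

_retained = []  # hypothetical global that simulates a leak when leak=True


def align_batch(batch: list[str], leak: bool = False) -> list[int]:
    """Stand-in for real alignment work on one batch of reads."""
    results = [len(read) for read in batch]
    if leak:
        _retained.append(list(results))  # reference survives after the batch
    return results


def batches(reads: list[str], size: int):
    """Yield consecutive fixed-size batches of reads."""
    for start in range(0, len(reads), size):
        yield reads[start:start + size]


def memory_after_each_batch(reads: list[str], size: int, leak: bool = False) -> list[int]:
    """Bytes currently traced by tracemalloc after each batch is processed."""
    tracemalloc.start()
    usage = []
    try:
        for batch in batches(reads, size):
            align_batch(batch, leak=leak)
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
    finally:
        tracemalloc.stop()
    return usage


if __name__ == "__main__":
    reads = ["ACGT" * 25] * 10_000
    ok = memory_after_each_batch(reads, 500)
    bad = memory_after_each_batch(reads, 500, leak=True)
    print("growth without leak:", ok[-1] - ok[0], "bytes")
    print("growth with leak:   ", bad[-1] - bad[0], "bytes")
```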
