Memory consumption is too high #3
Open
Labels
enhancement (New feature or request)
Description
The program should consume a reasonable amount of memory for a typical desktop.
In theory, the program should need only slightly more memory than BWA: the extra cost is the lc-hash auxiliary data structure, which takes 128MB when the hash length is 12.
Current status: the program requires more than 200GB of memory to align 400GB of human genome reads.
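The 128MB figure is consistent with a direct-addressed table that has one slot per possible 12-mer over {A, C, G, T} and 8 bytes per slot. That layout is an assumption inferred from the numbers, not something confirmed by the code; the function name and the `bytes_per_entry` parameter below are hypothetical. A minimal sketch of the arithmetic:

```python
def lc_hash_table_bytes(hash_length: int, bytes_per_entry: int = 8) -> int:
    """Size in bytes of a table with one entry per DNA k-mer.

    Assumption: 4 symbols (A, C, G, T) and a fixed-width entry;
    the 8-byte default is a guess that matches the 128MB figure.
    """
    return (4 ** hash_length) * bytes_per_entry


if __name__ == "__main__":
    for k in (11, 12, 13):
        mib = lc_hash_table_bytes(k) / 2**20
        print(f"hash length {k}: {mib:.0f} MiB")
```

Under this assumption each extra base of hash length multiplies the table size by four, so the hash length is a direct memory knob.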
Possible reasons for this issue:
- The auxiliary data structures take too much memory. Instead of storing the whole suffix array, we should store a compressed suffix array and compute the missing entries using LF mapping. However, this may slow down alignment, and we do not yet know how large the impact on running time would be.
- The human genome sequences are packed into a ".cat" file by removing everything that is not sequence data. The current code (v0.9) also packs the reverse complement into the ".cat" file. Removing the reverse complement from the ".cat" file may save 15GB of memory and disk space, but it would degrade performance by 50%.
- The reads are processed in "batches" so that memory consumption stays constant. However, there may be memory leaks in the code that cause memory consumption to grow during alignment.
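To make the suffix array trade-off in the first item concrete, here is a minimal sketch of the standard sampled-suffix-array technique: keep only every `sample_rate`-th suffix array entry and recover the rest by walking LF mapping. This is not the project's code. All names are hypothetical, the construction is a naive sort suitable only for tiny inputs, and the per-row `rank` list stands in for the checkpointed occurrence counts a real FM-index would use.

```python
def build_index(text: str, sample_rate: int = 4):
    """Build BWT, C table, per-row ranks and a sampled suffix array."""
    t = text + "$"  # sentinel; '$' sorts before A, C, G, T in ASCII
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive, demo only
    bwt = "".join(t[i - 1] for i in sa)  # i == 0 wraps around to '$'

    # C[c] = number of symbols in t that are strictly smaller than c
    counts = {}
    for ch in t:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # rank[i] = occurrences of bwt[i] in bwt[:i]
    seen, rank = {}, []
    for ch in bwt:
        rank.append(seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1

    # Keep only rows whose text position is a multiple of sample_rate
    sampled = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return bwt, C, rank, sampled, sa  # full sa returned only for checking


def locate(row: int, bwt: str, C: dict, rank: list, sampled: dict) -> int:
    """Recover SA[row] using at most sample_rate - 1 LF steps."""
    steps = 0
    while row not in sampled:
        row = C[bwt[row]] + rank[row]  # LF: move to the suffix one position earlier
        steps += 1
    return sampled[row] + steps


if __name__ == "__main__":
    bwt, C, rank, sampled, sa = build_index("ACGTTGCAACGT", sample_rate=4)
    recovered = [locate(r, bwt, C, rank, sampled) for r in range(len(sa))]
    print(recovered == sa, f"stored {len(sampled)} of {len(sa)} entries")
```

Each lookup costs up to `sample_rate - 1` extra LF steps, so a larger `sample_rate` saves memory at the price of slower position lookups. This is the running-time impact the first item says still needs to be measured.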