A high-performance, multi-threaded SRA downloader that outsmarts NCBI's API to guarantee pristine, full-fidelity Phred scores.
TurboSRA is a production-ready bash utility engineered to fix the two biggest bottlenecks in bioinformatics data acquisition: slow, single-threaded SRA downloads and unpredictable data degradation. By intelligently routing through the European Nucleotide Archive (ENA), TurboSRA guarantees that you receive the original, high-fidelity datasets required for sensitive variant calling, at more than double the speed of standard tools.
- Smart API Routing: Automatically prioritizes ENA servers to bypass NCBI's `.sralite` formats, ensuring base quality scores are never stripped or averaged.
- Parallel Downloading: Utilizes `aria2c` to open 16 simultaneous HTTP connections per accession, maxing out your bandwidth.
- Multi-Threaded Compression: Leverages `pigz` to compress output FASTQ files across all available CPU cores.
- Cross-Platform Auto-Scaling: Natively detects and utilizes maximum CPU cores on Linux, WSL, and macOS (Apple Silicon).
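The cross-platform core detection can be sketched roughly as follows. This is a minimal illustration, not TurboSRA's actual code: `detect_threads` is a hypothetical helper name, and the fallback order is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: pick a thread count on Linux, WSL, or macOS.
# detect_threads is a hypothetical helper, not a function taken from TurboSRA.
detect_threads() {
    local n=""
    if command -v nproc >/dev/null 2>&1; then
        n=$(nproc 2>/dev/null) || n=""                 # GNU coreutils (Linux, WSL)
    elif [ "$(uname -s)" = "Darwin" ]; then
        n=$(sysctl -n hw.ncpu 2>/dev/null) || n=""     # macOS
    else
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=""
    fi
    # Fall back to 1 if detection failed or returned something non-numeric.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

THREADS=$(detect_threads)
echo "Using ${THREADS} threads"
```

Falling back to a single thread keeps the script usable on systems where none of the probes are available.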
The standard NCBI sra-tools (prefetch + fasterq-dump) processes data sequentially using a single thread. Worse, during heavy server loads, NCBI silently serves .sralite formats, permanently destroying the original Phred quality scores to save bandwidth.
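One way to look up ENA-hosted files for a run is the ENA Portal API `filereport` endpoint. The sketch below only illustrates that lookup; the helper names `ena_report_url` and `extract_field` are hypothetical, and which report field TurboSRA actually requests is not stated here, so `fastq_ftp` is used as an assumed default.

```bash
#!/usr/bin/env bash
# Sketch: look up download locations for a run via the ENA Portal API.
# ena_report_url and extract_field are hypothetical helpers.

# Build the filereport URL for one run accession and one field
# (fastq_ftp is an assumed default).
ena_report_url() {
    local acc="$1" field="${2:-fastq_ftp}"
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=%s&format=tsv\n' \
        "$acc" "$field"
}

# Read a TSV report on stdin and print the values of the named column,
# one per line (multiple files in one cell are separated by ';').
extract_field() {
    local field="$1"
    awk -F'\t' -v want="$field" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) col = i; next }
        col && $col != "" {
            n = split($col, parts, ";")
            for (j = 1; j <= n; j++) print parts[j]
        }
    '
}

# Usage (needs network access, so it is left commented out):
# curl -fsSL "$(ena_report_url SRR1274307)" | extract_field fastq_ftp
```

Selecting the column by header name rather than by position keeps the parser working if the report returns extra columns.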
Test parameters: 10-core Apple Silicon (M-series), 1 Gbps connection, single accession (SRR1274307).
| Metric | Standard prefetch | TurboSRA v1.5 |
|---|---|---|
| Total Time | 9.88 seconds | 4.63 seconds (2.1x Faster) |
| Download Engine | Single HTTP connection | aria2c (16 parallel connections) |
| Extraction & Compression | Sequential & Single-threaded | Piped fasterq-dump & Multi-threaded pigz |
| Data Integrity | (.sralite risk) | Guaranteed Full Phred Scores |
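The two parallel stages in the table can be sketched as below. This is an assumption-laden illustration rather than TurboSRA's implementation: the helper names and the `DRY_RUN` switch are hypothetical, the URL is a placeholder, and the sketch compresses a finished file instead of piping `fasterq-dump` output into `pigz`. The `aria2c` options `-x` (connections per server) and `-s` (number of splits) and the `pigz` option `-p` (threads) are standard.

```bash
#!/usr/bin/env bash
# Sketch of the download and compression stages. Helper names are hypothetical.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Download one URL over up to 16 connections (-x), split into 16 segments (-s),
# resuming partial files (-c) into the given directory (-d).
download_parallel() {
    local url="$1" outdir="$2"
    run aria2c -x 16 -s 16 -c -d "$outdir" "$url"
}

# Compress a FASTQ file in place using the given number of threads.
compress_parallel() {
    local fastq="$1" threads="$2"
    run pigz -p "$threads" "$fastq"
}

# Placeholder URL and paths, for illustration only.
download_parallel "https://example.org/SRR1274307.sra" ./out
compress_parallel ./out/SRR1274307.fastq 8
```

Set `DRY_RUN=0` to execute the commands instead of printing them.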
If you request an accession using standard tools, you risk receiving simplified quality scores (a solid wall of ?), making the data useless for downstream variant callers like GATK or Snippy.
Output from standard prefetch (SRA Lite Fallback):
@SRR1274307.1 1 length=25
ATGGCTCACTGCAGCCTTGACTTTC
+SRR1274307.1 1 length=25
????????????????????????? <-- Quality scores destroyed
Output from TurboSRA (Strict ENA Mode):
@SRR1274307.1 1 length=25
ATGGCTCACTGCAGCCTTGACTTTC
+SRR1274307.1 1 length=25
AAA?ABBBDEEDDDDEGGGFGGIIH <-- Original Phred scores preserved
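A quick way to spot the flattened-quality case shown above is to check whether every quality string is a single repeated character. The following is a minimal sketch under stated assumptions: `has_flat_quality` is a hypothetical helper, it expects plain four-line FASTQ records on stdin, and it only samples the first records (1000 by default).

```bash
#!/usr/bin/env bash
# Sketch: succeed (exit 0) when every sampled quality line consists of one
# repeated character, as in the SRA Lite example. Hypothetical helper;
# assumes unwrapped four-line FASTQ records on stdin.
has_flat_quality() {
    local max_records="${1:-1000}"
    awk -v max="$max_records" '
        NR % 4 == 0 {
            records++
            first = substr($0, 1, 1)
            for (i = 2; i <= length($0); i++)
                if (substr($0, i, 1) != first) { varied = 1; exit }
            if (records >= max) exit
        }
        END { exit (records > 0 && !varied) ? 0 : 1 }
    '
}

# Usage:
# has_flat_quality < reads.fastq && echo "Warning: quality scores look flattened" >&2
```

For compressed files, the same check can read from a decompression pipe, for example `pigz -dc reads.fastq.gz | has_flat_quality`.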
Installation & Dependencies
TurboSRA requires standard POSIX utilities alongside a few core bioinformatics tools.
1. Install Dependencies
For macOS (Apple Silicon / Intel):
# Install SRA-Tools via Conda/Mamba
mamba install -c conda-forge -c bioconda sra-tools -y
# Install system utilities via Homebrew
brew install aria2 pigz curl
For Linux / WSL (Ubuntu/Debian):
sudo apt update
sudo apt install sra-toolkit aria2 pigz curl -y
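After installing, it can help to confirm that the tools are on your `PATH`. The check below is a small sketch; `missing_tools` is a hypothetical helper, and the tool list is an assumption based on the dependencies named in this section.

```bash
#!/usr/bin/env bash
# Sketch: report which required tools are missing from PATH.
# missing_tools is a hypothetical helper; the tool list is an assumption.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools aria2c pigz curl fasterq-dump | tr '\n' ' ')
if [ -n "$missing" ]; then
    echo "Missing dependencies: $missing" >&2
else
    echo "All dependencies found"
fi
```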
2. Download TurboSRA
git clone https://github.com/hemant-goyal/TurboSRA.git
cd TurboSRA
chmod +x turbo_srav1.5.sh
Usage
Provide a plain text file containing one SRA accession per line.
./turbo_srav1.5.sh -i accessions.txt [OPTIONS]
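Because the input must hold exactly one accession per line, a pre-flight check of the list can catch stray whitespace or malformed IDs. This is a sketch only: `read_accessions` is a hypothetical helper, and accepting the SRR, ERR, and DRR prefixes is an assumption about what counts as a valid run accession.

```bash
#!/usr/bin/env bash
# Sketch: read an accession list, skipping blank lines and rejecting
# malformed IDs. read_accessions is a hypothetical helper; the accepted
# prefixes (SRR, ERR, DRR) are an assumption.
read_accessions() {
    local file="$1" line
    while IFS= read -r line || [ -n "$line" ]; do
        # Strip carriage returns and stray whitespace.
        line=$(printf '%s' "$line" | tr -d '[:space:]')
        if [ -z "$line" ]; then
            continue
        fi
        if printf '%s\n' "$line" | grep -Eq '^[SED]RR[0-9]+$'; then
            printf '%s\n' "$line"
        else
            echo "Skipping invalid accession: $line" >&2
        fi
    done < "$file"
}

# Usage:
# read_accessions accessions.txt > accessions.clean.txt
```

The `|| [ -n "$line" ]` clause makes sure a final line without a trailing newline is still processed.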
Flags [OPTIONS]
| Flag | Description | Default |
|---|---|---|
| -i | (Required) Input file containing list of SRR accessions | None |
| -o | Output directory | ./ (Current directory) |
| -c | Enable multi-threaded FASTQ compression (.fastq.gz) | False |
| -l | Allow "LITE" mode. Skips ENA strict routing and allows NCBI's .sralite format for maximum speed (Warning: destroys quality scores) | False (Strict FULL mode) |
| -k | Keep the intermediate .sra cache files | False (Auto-cleans) |
| -t | Manually specify the number of CPU threads | Auto-detected |
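The flags in the table map naturally onto `getopts`. The sketch below shows one possible way to parse them; the variable names and error handling are illustrative assumptions, not TurboSRA's own code.

```bash
#!/usr/bin/env bash
# Sketch: parse the flags from the table with getopts.
# Variable names and defaults are illustrative, not TurboSRA's own code.
parse_args() {
    INPUT="" OUTDIR="./" COMPRESS=0 LITE=0 KEEP=0 THREADS=""
    local opt
    OPTIND=1
    while getopts ":i:o:clkt:" opt; do
        case "$opt" in
            i) INPUT="$OPTARG" ;;
            o) OUTDIR="$OPTARG" ;;
            c) COMPRESS=1 ;;
            l) LITE=1 ;;
            k) KEEP=1 ;;
            t) THREADS="$OPTARG" ;;
            :) echo "Option -$OPTARG requires an argument" >&2; return 2 ;;
            \?) echo "Unknown option: -$OPTARG" >&2; return 2 ;;
        esac
    done
    if [ -z "$INPUT" ]; then
        echo "Usage: $0 -i accessions.txt [-o DIR] [-c] [-l] [-k] [-t N]" >&2
        return 2
    fi
}

parse_args -i accessions.txt -c -t 8 &&
    echo "input=$INPUT outdir=$OUTDIR compress=$COMPRESS lite=$LITE keep=$KEEP threads=${THREADS:-auto}"
# prints: input=accessions.txt outdir=./ compress=1 lite=0 keep=0 threads=8
```

Leaving `THREADS` empty when `-t` is absent lets a later step substitute the auto-detected core count.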