🚀 TurboSRA (v1.5)

A high-performance, multi-threaded SRA downloader that routes around NCBI's SRA Lite fallback to deliver full-fidelity Phred scores.

🧬 Stop letting NCBI silently destroy your quality scores.

TurboSRA is a production-ready bash utility built to fix two of the biggest problems in bioinformatics data acquisition: slow, single-connection SRA downloads and unpredictable loss of quality scores. By routing requests through the European Nucleotide Archive (ENA) first, TurboSRA retrieves the original, full-fidelity datasets required for sensitive variant calling, and in our benchmark it ran more than twice as fast as the standard tools.

✨ Key Features

Smart API Routing: Prioritizes ENA servers to bypass NCBI's `.sralite` format, so base quality scores are never stripped or flattened to a constant value.
Parallel Downloading: Uses `aria2c` to open 16 simultaneous HTTP connections per accession, saturating your bandwidth.
Multi-Threaded Compression: Uses `pigz` to compress output FASTQ files across all available CPU cores.
Cross-Platform Auto-Scaling: Automatically detects and uses all available CPU cores on Linux, WSL, and macOS (Apple Silicon).
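Here is a minimal sketch of how ENA-first routing can work, assuming the ENA Portal API `filereport` endpoint is used to look up a run's FASTQ files. The function names (`ena_filereport_url`, `parse_fastq_urls`, `ena_fastq_urls`) are hypothetical, and the sketch also assumes ENA's file host `ftp.sra.ebi.ac.uk` is reachable over HTTPS; TurboSRA's actual request may differ.

```bash
#!/usr/bin/env bash
# Sketch: look up a run's FASTQ files on ENA before touching NCBI.
# Assumes the ENA Portal API "filereport" endpoint; not TurboSRA's exact code.

# Build the filereport query URL for one run accession.
ena_filereport_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp&format=tsv\n' "$1"
}

# Read a filereport TSV on stdin; print one HTTPS URL per FASTQ file.
# The fastq_ftp column is found by header name; paired-end runs list
# several paths in that column separated by ';'.
parse_fastq_urls() {
  awk -F'\t' '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
    col && $col != "" {
      n = split($col, files, ";")
      for (j = 1; j <= n; j++) print "https://" files[j]
    }'
}

# Query ENA and print the URLs. Returns non-zero when the request fails or
# ENA lists no FASTQ files for the run.
ena_fastq_urls() {
  local urls
  urls=$(curl -fsS "$(ena_filereport_url "$1")" | parse_fastq_urls)
  [ -n "$urls" ] || return 1
  printf '%s\n' "$urls"
}
```

For example, `ena_fastq_urls SRR1274307` prints one URL per FASTQ file. An empty result is the point where a strict mode would stop rather than fall back to NCBI.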

📊 The Benchmark: Why use TurboSRA?

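The standard NCBI sra-tools workflow (`prefetch` + `fasterq-dump`) downloads each accession over a single connection and runs download, extraction, and compression as separate sequential steps, with compression usually left to single-threaded `gzip`. Worse, under heavy server load NCBI can silently serve the `.sralite` format, which replaces the original per-base Phred quality scores with a single constant value to save bandwidth.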

Test setup: 10-core Apple Silicon (M-series) Mac, 1 Gbps connection, single accession (SRR1274307).

⏱️ The Speed Showdown

| Metric | Standard `prefetch` | TurboSRA `v1.5` |
|---|---|---|
| Total time | 9.88 s | 4.63 s (2.1× faster) |
| Download engine | Single HTTP connection | `aria2c` (16 parallel connections) |
| Extraction & compression | Sequential & single-threaded | Piped `fasterq-dump` & multi-threaded `pigz` |
| Data integrity | ⚠️ Unpredictable (`.sralite` risk) | 🛡️ Guaranteed full Phred scores |
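The download and compression rows of the table can be sketched as below. This assumes `aria2c`, `fasterq-dump`, and `pigz` are called with roughly these options; the helper names (`build_aria2c_cmd`, `download_url`, `extract_sra`, `compress_fastqs`) are hypothetical, and unlike the piped design described in the table, this version writes FASTQ files first and then compresses them.

```bash
#!/usr/bin/env bash
# Sketch of the download/extract/compress steps (assumed, not TurboSRA's code).

# Build the aria2c command for one URL into the global array ARIA2C_CMD.
#   -x 16 / -s 16       : up to 16 connections and 16 segments per file
#                         (16 is the maximum aria2c allows for -x)
#   --min-split-size=1M : aria2c won't split ranges smaller than twice this
#                         value (default 20M), so lowering it lets smaller
#                         files use parallel connections too
#   -c                  : resume a partial download instead of restarting
#   -d                  : output directory
build_aria2c_cmd() {
  local url="$1" outdir="$2"
  ARIA2C_CMD=(aria2c -x 16 -s 16 --min-split-size=1M -c -d "$outdir" "$url")
}

# Download one URL into a directory with the command built above.
download_url() {
  build_aria2c_cmd "$1" "$2"
  "${ARIA2C_CMD[@]}"
}

# NCBI path only: convert a downloaded .sra file to FASTQ with multiple threads.
extract_sra() {
  local sra="$1" outdir="$2" threads="$3"
  fasterq-dump --split-3 --threads "$threads" --outdir "$outdir" "$sra"
}

# Compress FASTQ files in place (file.fastq -> file.fastq.gz) on the requested
# number of threads; falls back to single-threaded gzip if pigz is missing.
compress_fastqs() {
  local threads="$1"; shift
  if command -v pigz >/dev/null 2>&1; then
    pigz -p "$threads" "$@"
  else
    gzip "$@"
  fi
}
```

For example, run `download_url "$url" out/` for each ENA URL, or `extract_sra out/SRR1274307.sra out/ 10 && compress_fastqs 10 out/*.fastq` when only an NCBI `.sra` file is available (paths here are illustrative).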

🔍 The Data Integrity Proof

If you request an accession with the standard tools, you risk receiving simplified quality scores (a solid wall of `?` characters), which makes the data useless for downstream variant callers such as GATK or Snippy.

Output from standard `prefetch` + `fasterq-dump` (SRA Lite fallback):

```text
@SRR1274307.1 1 length=25
ATGGCTCACTGCAGCCTTGACTTTC
+SRR1274307.1 1 length=25
?????????????????????????  <-- Quality scores destroyed
```

Output from TurboSRA (strict ENA mode):

```text
@SRR1274307.1 1 length=25
ATGGCTCACTGCAGCCTTGACTTTC
+SRR1274307.1 1 length=25
AAA?ABBBDEEDDDDEGGGFGGIIH  <-- Original Phred scores preserved
```
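You can check a FASTQ file for this pattern yourself. The sketch below is a heuristic of our own, not part of TurboSRA: it assumes SRA Lite data shows up as quality strings made of a single repeated character (like the `?` line in the first example) and flags a file only when every sampled read looks like that. The function name `check_quality_fidelity` is hypothetical.

```bash
#!/usr/bin/env bash
# Heuristic SRA Lite check (sketch; not part of TurboSRA).
# SRA Lite replaces per-base qualities with a constant value, so each quality
# line collapses to one repeated character ('?' is Q30 in Phred+33).
# Usage: check_quality_fidelity FILE [MAX_READS]
# Prints LITE if every sampled quality string is constant, FULL if any varies,
# EMPTY if no records were read. Assumes 4-line FASTQ records (no wrapping).
check_quality_fidelity() {
  local fastq="$1" max_reads="${2:-10000}"
  if [[ "$fastq" == *.gz ]]; then gzip -dc "$fastq"; else cat "$fastq"; fi |
    awk -v max="$max_reads" '
      NR % 4 == 0 {                      # every 4th line is a quality string
        n++
        first = substr($0, 1, 1)
        for (i = 2; i <= length($0); i++)
          if (substr($0, i, 1) != first) { varied++; break }
        if (n >= max + 0) exit           # stop after MAX_READS records
      }
      END {
        if (n == 0) print "EMPTY"
        else if (varied > 0) print "FULL"
        else print "LITE"
      }'
}
```

Sampling thousands of reads (the default here is 10,000) avoids false alarms from the occasional read whose qualities happen to be uniform.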

⚙️ Installation & Dependencies

TurboSRA requires bash and standard POSIX utilities, plus a few core tools: sra-tools, aria2, pigz, and curl.

1. Install Dependencies

For macOS (Apple Silicon / Intel):

```bash
# Install SRA-Tools via Conda/Mamba
mamba install -c conda-forge -c bioconda sra-tools -y
# Install system utilities via Homebrew
brew install aria2 pigz curl
```

For Linux / WSL (Ubuntu/Debian):

```bash
sudo apt update
sudo apt install sra-toolkit aria2 pigz curl -y
```
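After installing, a quick way to confirm that everything is on your `PATH` is a `command -v` check. The function name `check_dependencies` and the exact tool list are assumptions based on the install commands above, not something TurboSRA documents.

```bash
#!/usr/bin/env bash
# Sketch: report any required tool that is not on PATH (hypothetical helper).
# Returns 0 if all tools are found, 1 otherwise.
check_dependencies() {
  local missing=() tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    printf 'Missing required tools: %s\n' "${missing[*]}" >&2
    return 1
  fi
}
```

For example: `check_dependencies prefetch fasterq-dump aria2c pigz curl || exit 1`.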

2. Download TurboSRA

```bash
git clone https://github.com/hemant-goyal/TurboSRA.git
cd TurboSRA
chmod +x turbo_srav1.5.sh
```

🚀 Usage

Provide a plain-text file containing one SRA accession per line and nothing else.

```bash
./turbo_srav1.5.sh -i accessions.txt [OPTIONS]
```
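Because each line is treated as one accession, it can help to validate the file before a long run. This hypothetical `read_accessions` helper assumes run accessions follow the `SRR`/`ERR`/`DRR` + digits pattern; it strips whitespace and Windows line endings, skips blank lines, prints valid accessions, and reports anything else to stderr.

```bash
#!/usr/bin/env bash
# Sketch: clean and validate an accession list (hypothetical helper).
# Prints valid run accessions to stdout; returns 1 if any line was invalid.
read_accessions() {
  local file="$1" line status=0 lineno=0
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    line="${line//[[:space:]]/}"   # drop spaces, tabs and Windows CRs
    [ -z "$line" ] && continue     # skip blank lines
    if [[ "$line" =~ ^[SED]RR[0-9]+$ ]]; then
      printf '%s\n' "$line"
    else
      printf 'Line %d: not a run accession: %s\n' "$lineno" "$line" >&2
      status=1
    fi
  done < "$file"
  return "$status"
}
```

For example: `read_accessions accessions.txt > checked.txt` before passing `checked.txt` to `-i`.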

Flags [OPTIONS]

| Flag | Description | Default |
|---|---|---|
| `-i` | (Required) Input file containing a list of SRR accessions | None |
| `-o` | Output directory | `./` (current directory) |
| `-c` | Enable multi-threaded FASTQ compression (`.fastq.gz`) | False |
| `-l` | Allow "LITE" mode: skips strict ENA routing and accepts NCBI's `.sralite` format for maximum speed (warning: destroys quality scores) | False (strict FULL mode) |
| `-k` | Keep the intermediate `.sra` cache files | False (auto-cleans) |
| `-t` | Manually specify the number of CPU threads | Auto-detected |
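For readers adapting the script, the flags in the table map naturally onto bash's built-in `getopts`. This is a sketch under assumptions: the function and variable names are hypothetical, and `-t` falls back to `nproc` on Linux/WSL or `sysctl -n hw.ncpu` on macOS, which is one common way to detect cores but not necessarily how TurboSRA does it.

```bash
#!/usr/bin/env bash
# Sketch of flag parsing for the options table (assumed, not TurboSRA's code).

# Logical CPU count: nproc on Linux/WSL, sysctl on macOS, 1 if neither works.
detect_threads() {
  nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1
}

# Parse options into global variables; returns 1 on invalid input.
parse_args() {
  INPUT="" OUTDIR="./" COMPRESS=0 ALLOW_LITE=0 KEEP_SRA=0 THREADS=""
  local opt
  OPTIND=1                               # reset so the function can be re-run
  while getopts ":i:o:clkt:" opt; do
    case "$opt" in
      i) INPUT="$OPTARG" ;;
      o) OUTDIR="$OPTARG" ;;
      c) COMPRESS=1 ;;
      l) ALLOW_LITE=1 ;;
      k) KEEP_SRA=1 ;;
      t) THREADS="$OPTARG" ;;
      :) printf 'Option -%s requires an argument\n' "$OPTARG" >&2; return 1 ;;
      \?) printf 'Unknown option: -%s\n' "$OPTARG" >&2; return 1 ;;
    esac
  done
  [ -n "$INPUT" ] || { echo "Error: -i <accessions file> is required" >&2; return 1; }
  [ -n "$THREADS" ] || THREADS=$(detect_threads)
  [[ "$THREADS" =~ ^[1-9][0-9]*$ ]] || { echo "Error: -t must be a positive integer" >&2; return 1; }
}
```

In a script, this would be called as `parse_args "$@" || exit 1`.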
