pgr is a command-line toolkit for working with genomes and genome-derived
data: sequences, alignments, variation, phylogenies, and related formats.
Current release: 0.1.0
cargo install --path . --force #--offline
# test
cargo test -- --test-threads=1
After installation, the pgr binary should be available in your PATH:
pgr help
pgr fa --help
pgr fas --help
`pgr` - Practical Genome Refiner
Usage: pgr [COMMAND]
Commands:
ms Hudson's ms simulator tools
axt Manipulate AXT alignment files
chain Manipulate Chain alignment files
chaining Chaining alignment blocks
clust Clustering operations
dist Distance/Similarity metrics
lav Manipulate LAV alignment files
maf Manipulate MAF alignment files
mat Matrix operations
net Manipulate Net alignment files
psl Manipulate PSL alignment files
pl Run integrated pipelines
2bit Manage 2bit files
fa Manipulate FASTA files
fas Manipulate block FA files
fq Manipulate FASTQ files
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
Subcommand groups:
* Simulation:
    * ms - Hudson's ms simulator tools: to-dna
* Sequences:
    * 2bit - 2bit query and extraction
    * fa - FASTA operations: info, records, transform, indexing
    * fas - Block FA operations: info, subset, transform, file, variation
    * fq - FASTQ interleaving and conversion
* Genome alignments:
    * chaining - Chaining alignments: psl
    * chain - Chain operations: sort, filter, transform, to-net
    * net - Net operations: info, subset, transform, convert
    * axt - AXT sorting and conversion
    * lav - Convert to PSL
    * maf - Convert to Block FA
    * psl - PSL statistics, manipulation, and conversion
* Clustering:
    * clust - Algorithms: cc, dbscan, k-medoids, mcl
* Distance:
    * dist - Metrics: hv
* Matrix:
    * mat - Processing: compare, format, subset, to-pair, to-phylip
* Pipelines:
    * pl - Workflows: p2m, trf, ir, rept, ucsc
This repository contains many subcommands and end-to-end workflows. Extended and curated examples are collected in:
- docs/usage_examples.md
Below are a few quick examples to get started:
# Basic FASTA statistics
pgr fa size tests/fasta/ufasta.fa
# Block FA summary
pgr fas stat tests/fas/example.fas --outgroup
# 2bit range extraction
pgr 2bit range tests/genome/mg1655.2bit NC_000913:1-100
Some subcommands depend on external executables:
- `pgr pl ucsc` requires the UCSC kent-tools suite, including programs such as `faToTwoBit`, `axtChain`, `chainAntiRepeat`, `chainMergeSort`, `chainPreNet`, `chainNet`, `netSyntenic`, `netChainSubset`, `chainStitchId`, `netSplit`, `netToAxt`, `axtSort`, `axtToMaf`, `netFilter`, `netClass`, and `chainSplit`.
- `pgr pl trf` depends on `trf` and `spanr`.
- `pgr pl rept` and `pgr pl ir` depend on `FastK`, `Profex`, and `spanr`.
- `pgr pl p2m` depends on `spanr`.
- `pgr fas refine` depends on an external multiple sequence alignment tool such as `clustalw` (default), `muscle`, or `mafft`.
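Whether these external programs are installed can be checked before a pipeline is started. The following is a minimal sketch, assuming a POSIX shell; `check_tools` is a hypothetical helper written for illustration and is not part of pgr:

```shell
# Hypothetical helper (not part of pgr): report every named executable
# that cannot be found on PATH. Returns 0 if all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command resolves
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For instance, `check_tools trf spanr` returns a non-zero status if either program is missing, so it can guard a call to `pgr pl trf`.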
- `about`: Third-person singular (e.g., "Counts...", "Calculates...").
- `after_help`: Uses raw string `r###"..."###`.
    - Description: Detailed explanation.
    - Notes: Bullet points starting with `*`.
        - Standard note for `fa`/`fas`: `* Supports both plain text and gzipped (.gz) files`
        - Standard note for `fa`/`fas`: `* Reads from stdin if input file is 'stdin'`
        - Standard note for `twobit`: `* 2bit files are binary and require random access (seeking)`
        - Standard note for `twobit`: `* Does not support stdin or gzipped inputs`
    - Examples: Numbered list (`1.`, `2.`) with code blocks indented by 3 spaces.
- Arguments:
    - Input: `infiles` (multiple) or `infile` (single).
        - Help: `Input [FASTA|block FA|2bit] file(s) to process.`
    - Output: `outfile` (`-o`, `--outfile`).
        - Help: `Output filename. [stdout] for screen.`
- Terminology:
    - `pgr fa` -> "FASTA"
    - `pgr fas` -> "block FA"
    - `pgr twobit` -> "2bit"
Qiang Wang wang-q@outlook.com
MIT.
Copyright by Qiang Wang.
Written by Qiang Wang wang-q@outlook.com, 2024-