- Overview
- How It Works
- Installation
- Getting Started
- Command Line Interface
- Output Structure
- Citation
- License
- Funding
PyHIV is a Python tool that aligns HIV nucleotide sequences against reference genomes to determine the most similar subtype and optionally split the aligned sequences into gene regions.
It produces:
- Best reference alignment per sequence
- Subtype and reference metadata
- Gene-region-specific FASTA files (optional)
- A final summary table (`final_table.tsv`)
```text
┌────────────────────────────────────────┐
│          User FASTA sequences          │
└────────────────────────────────────────┘
                    │
                    ▼
        Read and preprocess input
                    │
                    ▼
Align sequences against reference genomes
                    │
                    ▼
     Identify best matching reference
                    │
                    ▼
     (Optional) Split by gene region
                    │
                    ▼
  Save results and summary table (.tsv)
```
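The "Identify best matching reference" step is where the subtype call comes from. This README does not state which score PyHIV uses to rank references, so the following is only a minimal sketch that assumes a simple percent-identity criterion over query/reference pairs that are already aligned. `percent_identity` and `best_reference` are hypothetical names for illustration, not part of PyHIV's API.

```python
from typing import Dict, Tuple


def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Hypothetical scoring rule for illustration; PyHIV's actual criterion may differ.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if q == "-" or r == "-":
            continue  # ignore columns that contain a gap in either sequence
        compared += 1
        if q == r:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def best_reference(alignments: Dict[str, Tuple[str, str]]) -> str:
    """Return the reference whose alignment gives the highest identity.

    `alignments` maps a reference name to (aligned_query, aligned_reference).
    """
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))
```

For example, `best_reference({"ref_B": ("ACGT-A", "ACGTTA"), "ref_C": ("ACGT-A", "ACCTTA")})` picks `"ref_B"` (5/5 identical ungapped columns versus 4/5). Note that this sketch counts IUPAC ambiguity codes such as `N` or `R` as mismatches, which a real HIV subtyping tool would likely need to handle differently.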
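Install from PyPI:

```bash
pip install pyhiv-tools
```

Install from source:

```bash
git clone https://github.com/anaapspereira/PyHIV.git
cd PyHIV
pip install -e .
```

Development install (including the optional `dev` extras):

```bash
git clone https://github.com/anaapspereira/PyHIV.git
cd PyHIV
pip install -e ".[dev]"
```

Requirements:

- Python 3.10+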
- pandas
- biopython
- pyfamsa
- click
- matplotlib
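After installing, a quick way to confirm that the interpreter and the dependencies listed above are usable is a small standard-library check such as the one below. It is a sketch, not part of PyHIV. Biopython is imported as `Bio`; the import names of the other packages are assumed to match their pip names.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Import names of the runtime dependencies listed above.
# biopython installs the "Bio" package; the others match their pip names.
REQUIRED_MODULES = ["pandas", "Bio", "pyfamsa", "click", "matplotlib"]


def missing_dependencies(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the modules that cannot be located by the import system."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_version_ok(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """Check the running interpreter against the minimum supported version."""
    return sys.version_info[:2] >= minimum
```

Calling `python_version_ok()` and `missing_dependencies()` after `pip install pyhiv-tools` should give `True` and an empty list if the environment is complete.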
The easiest way to use PyHIV is through the command line:
```bash
# Install PyHIV
pip install pyhiv-tools

# Run analysis on your sequences
pyhiv run /path/to/your/fasta/files

# Check results
ls PyHIV_results/
```

PyHIV can also be called from Python:

```python
from pyhiv import PyHIV

PyHIV(
    fastas_dir="path/to/fasta/files",
    subtyping=True,
    splitting=True,
    output_dir="results_folder",
    n_jobs=4,
    reporting=True
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `fastas_dir` | `str` | Required | Directory containing user FASTA files. |
| `subtyping` | `bool` | `True` | Aligns against subtype reference genomes. If `False`, aligns only to HXB2. |
| `splitting` | `bool` | `True` | Splits aligned sequences into gene regions. |
| `output_dir` | `str` | `"PyHIV_results"` | Output directory for results. |
| `n_jobs` | `int` | `None` | Number of parallel jobs for alignment. |
| `reporting` | `bool` | `True` | Generates a PDF report with sequence visualizations. |
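A single `PyHIV(...)` call processes one directory of FASTA files. For several batches, one option is to loop over input directories and give each its own `output_dir`, similar to the CLI example `pyhiv run data/sequences/ -j 8 -o results/batch1/`. The helper below is a hypothetical sketch that passes only the keyword arguments documented in the table above; the `runner` parameter exists only so the loop can be exercised without PyHIV installed, and creating each output folder up front assumes PyHIV accepts an existing, empty `output_dir`.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def run_batches(
    input_dirs: Iterable[str],
    results_root: str = "PyHIV_results",
    n_jobs: Optional[int] = None,
    runner: Optional[Callable[..., object]] = None,
) -> List[Path]:
    """Run PyHIV once per input directory (hypothetical helper, not part of PyHIV).

    Each batch is written to <results_root>/<name of the input directory>.
    """
    if runner is None:
        from pyhiv import PyHIV  # imported lazily; only needed for real runs
        runner = PyHIV
    output_dirs = []
    for input_dir in input_dirs:
        out_dir = Path(results_root) / Path(input_dir).name
        out_dir.mkdir(parents=True, exist_ok=True)
        runner(
            fastas_dir=str(input_dir),
            subtyping=True,
            splitting=True,
            output_dir=str(out_dir),
            n_jobs=n_jobs,
            reporting=True,
        )
        output_dirs.append(out_dir)
    return output_dirs
```

For example, `run_batches(["data/batch1", "data/batch2"], results_root="results", n_jobs=4)` would write to `results/batch1` and `results/batch2`.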
After running PyHIV, your output directory (default: PyHIV_results/) will contain:
```text
PyHIV_results/
│
├── best_alignment_<sequence>.fasta    # Alignment to best reference
├── final_table.tsv                    # Summary of results
├── PyHIV_report_all_sequences.pdf     # PDF report (if reporting=True)
│
├── gag/
│   ├── <sequence>_gag.fasta
│   └── ...
├── pol/
│   ├── <sequence>_pol.fasta
│   └── ...
└── env/
    ├── <sequence>_env.fasta
    └── ...
```
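Because each gene region gets its own subfolder, the per-region files can be gathered with `pathlib` alone. This sketch assumes the `<region>/<sequence>_<region>.fasta` naming shown in the tree above and treats every subfolder of the output directory as a candidate region; `collect_region_files` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, List


def collect_region_files(results_dir: str = "PyHIV_results") -> Dict[str, List[Path]]:
    """Map each gene-region folder (e.g. gag, pol, env) to its FASTA files.

    Assumes files follow the <sequence>_<region>.fasta pattern; folders
    without matching files are skipped.
    """
    regions: Dict[str, List[Path]] = {}
    for region_dir in sorted(p for p in Path(results_dir).iterdir() if p.is_dir()):
        files = sorted(region_dir.glob(f"*_{region_dir.name}.fasta"))
        if files:
            regions[region_dir.name] = files
    return regions
```

Each returned path can then be read with Biopython, for example `SeqIO.parse(path, "fasta")` from `Bio.SeqIO`.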
The summary table `final_table.tsv` contains the following columns:

| Column | Description |
|---|---|
| Sequence | Input sequence name |
| Reference | Best matching reference accession |
| Subtype | Predicted HIV-1 subtype |
| Most Matching Gene Region | Region with highest similarity |
| Present Gene Regions | All detected gene regions with valid alignments |
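For a quick overview of a run, the summary table can be tallied by subtype. This is a minimal sketch using only the standard library, assuming `final_table.tsv` is tab-separated with a header row that includes the `Subtype` column listed above (pandas' `read_csv(path, sep="\t")` would work equally well); `count_subtypes` is a hypothetical helper.

```python
import csv
from collections import Counter


def count_subtypes(table_path: str) -> Counter:
    """Tally how many sequences were assigned to each subtype.

    Assumes a tab-separated file whose header row contains a "Subtype" column.
    """
    with open(table_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["Subtype"] for row in reader)


# Example (path assumes the default output directory):
# for subtype, n in count_subtypes("PyHIV_results/final_table.tsv").most_common():
#     print(subtype, n)
```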
PyHIV provides a comprehensive command-line interface for HIV-1 sequence analysis.
```bash
# Basic usage: process all FASTA files in a directory
pyhiv run sequences/

# With custom output directory
pyhiv run sequences/ -o my_results/

# Parallel processing with 8 jobs
pyhiv run sequences/ -j 8

# Validate input files before processing
pyhiv validate sequences/
```

| Option | Description |
|---|---|
| `--subtyping / --no-subtyping` | Enable/disable HIV-1 subtyping (default: enabled) |
| `--splitting / --no-splitting` | Enable/disable gene region splitting (default: enabled) |
| `-o, --output-dir PATH` | Output directory (default: `PyHIV_results`) |
| `-j, --n-jobs INTEGER` | Number of parallel jobs (default: all CPUs) |
| `-v, --verbose` | Detailed output |
| `-q, --quiet` | Suppress non-error output |
Full analysis with subtyping and splitting:

```bash
pyhiv run data/sequences/
```

Alignment only (no subtyping or splitting):

```bash
pyhiv run data/sequences/ --no-subtyping --no-splitting
```

Subtyping without gene splitting:

```bash
pyhiv run data/sequences/ --no-splitting
```

Parallel processing for large datasets:

```bash
pyhiv run data/sequences/ -j 8 -o results/batch1/
```

Validation before processing:

```bash
pyhiv validate data/sequences/
```

Getting help:

```bash
pyhiv --help            # Show all commands
pyhiv run --help        # Show options for run command
pyhiv validate --help   # Show validation options
pyhiv --version         # Show version
```

For comprehensive CLI documentation, see CLI_README.md.
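When `pyhiv` is driven from another Python script or workflow, it can help to build the argument list in one place and run it with `subprocess`. The sketch below uses only the flags documented in the options table; `build_run_command` and `run_pyhiv` are hypothetical helpers, and the `pyhiv` executable is assumed to be on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_run_command(
    input_dir: str,
    output_dir: Optional[str] = None,
    n_jobs: Optional[int] = None,
    subtyping: bool = True,
    splitting: bool = True,
) -> List[str]:
    """Assemble a `pyhiv run` command line from the documented options."""
    cmd = ["pyhiv", "run", input_dir]
    if not subtyping:
        cmd.append("--no-subtyping")
    if not splitting:
        cmd.append("--no-splitting")
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if n_jobs is not None:
        cmd += ["-j", str(n_jobs)]
    return cmd


def run_pyhiv(input_dir: str, **options) -> None:
    """Run `pyhiv run`; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_run_command(input_dir, **options), check=True)
```

For example, `run_pyhiv("data/sequences/", output_dir="results/batch1/", n_jobs=8)` is equivalent to the parallel-processing example above. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.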
If you use PyHIV in your research, please cite:
```bibtex
@software{pyhiv2024,
  title={PyHIV: A Python Package for Local HIV-1 Sequence Alignment, Subtyping and Gene Splitting},
  author={Santos-Pereira, Ana},
  year={2024},
  url={https://github.com/anaapspereira/PyHIV},
  license={MIT}
}
```

Note: A manuscript is in preparation; in the meantime, please cite this repository.
Please report bugs and request features through GitHub Issues.
- Full Documentation: https://pyhiv.readthedocs.io/
- CLI Reference: CLI_README.md
- API Reference: Available in the documentation
This project is licensed under the MIT License. See the LICENSE file for details.
- AIV-Tropism project funded by 2CA-Braga / ICVS.