🧬 PyHIV: A Python Package for Local HIV‑1 Sequence Alignment, Subtyping and Gene Splitting


📖 Overview

PyHIV is a Python tool that aligns HIV nucleotide sequences against reference genomes to determine the most similar subtype and optionally split the aligned sequences into gene regions.

It produces:

  • Best reference alignment per sequence
  • Subtype and reference metadata
  • Gene-region–specific FASTA files (optional)
  • A final summary table (final_table.tsv)

⚙️ How It Works

┌─────────────────────────────────────────────┐
│  User FASTA sequences                       │
└─────────────────────────────────────────────┘
                │
                ▼
       Read and preprocess input
                │
                ▼
 Align sequences against reference genomes
                │
                ▼
    Identify best matching reference
                │
                ▼
     (Optional) Split by gene region
                │
                ▼
  Save results and summary table (.tsv)
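
The "identify best matching reference" step can be illustrated with a toy scoring rule. The sketch below assumes the query has already been aligned to each reference (equal-length strings, with "-" marking gaps) and ranks references by percent identity over the columns where neither sequence has a gap. This is only an illustration: the scoring PyHIV actually uses is not described in this README and may differ, and the names percent_identity, best_reference, ref_B and ref_C are hypothetical.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent of matching bases over columns where neither sequence has a gap."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q == "-" or r == "-":
            continue  # skip gap columns entirely
        compared += 1
        if q == r:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def best_reference(alignments: dict[str, tuple[str, str]]) -> tuple[str, float]:
    """Return (reference name, identity) for the highest-scoring pairwise alignment.

    `alignments` maps a reference name to (aligned query, aligned reference).
    """
    scores = {name: percent_identity(q, r) for name, (q, r) in alignments.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical toy alignments, not real HIV-1 references.
alignments = {
    "ref_B": ("ACGT-ACGT", "ACGTTACGT"),
    "ref_C": ("ACGT-ACGT", "ACCTTACGA"),
}
print(best_reference(alignments))
```

Percent identity is used here because it is simple to verify by hand; a real pipeline might instead use an alignment score or account for ambiguous bases.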

📦 Installation

From PyPI (Recommended)

pip install pyhiv-tools

From Source

git clone https://github.com/anaapspereira/PyHIV.git
cd PyHIV
pip install -e .

Development Installation

git clone https://github.com/anaapspereira/PyHIV.git
cd PyHIV
pip install -e ".[dev]"

Requirements

  • Python 3.10+
  • pandas
  • biopython
  • pyfamsa
  • click
  • matplotlib

🚀 Getting Started

Quick Start (CLI)

The easiest way to use PyHIV is through the command line:

# Install PyHIV
pip install pyhiv-tools

# Run analysis on your sequences
pyhiv run /path/to/your/fasta/files

# Check results
ls PyHIV_results/

Python API Usage

from pyhiv import PyHIV

PyHIV(
    fastas_dir="path/to/fasta/files",
    subtyping=True,
    splitting=True,
    output_dir="results_folder",
    n_jobs=4,
    reporting=True
)

Parameters:

| Parameter  | Type | Default         | Description                                                               |
|------------|------|-----------------|---------------------------------------------------------------------------|
| fastas_dir | str  | Required        | Directory containing user FASTA files.                                    |
| subtyping  | bool | True            | Aligns against subtype reference genomes. If False, aligns only to HXB2.  |
| splitting  | bool | True            | Splits aligned sequences into gene regions.                               |
| output_dir | str  | "PyHIV_results" | Output directory for results.                                             |
| n_jobs     | int  | None            | Number of parallel jobs for alignment.                                    |
| reporting  | bool | True            | Generates PDF report with sequence visualizations.                        |

📂 Output Structure

After running PyHIV, your output directory (default: PyHIV_results/) will contain:

PyHIV_results/
│
├── best_alignment_<sequence>.fasta     # Alignment to best reference
├── final_table.tsv                     # Summary of results
├── PyHIV_report_all_sequences.pdf      # PDF report (if reporting=True)
│
├── gag/
│   ├── <sequence>_gag.fasta
│   └── ...
├── pol/
│   ├── <sequence>_pol.fasta
│   └── ...
└── env/
    ├── <sequence>_env.fasta
    └── ...
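
To collect the per-gene FASTA files from a finished run, a small helper along these lines may be enough. It is a sketch that assumes the layout shown above, with one subdirectory per gene region and files named <sequence>_<gene>.fasta; the function name list_gene_fastas is hypothetical and not part of PyHIV.

```python
from pathlib import Path


def list_gene_fastas(output_dir: str) -> dict[str, list[str]]:
    """Map each gene-region subdirectory to the sorted FASTA file names inside it.

    Assumes the layout <output_dir>/<gene>/<sequence>_<gene>.fasta.
    """
    root = Path(output_dir)
    found: dict[str, list[str]] = {}
    for gene_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        gene = gene_dir.name
        # Only keep files that follow the <sequence>_<gene>.fasta naming pattern.
        found[gene] = sorted(f.name for f in gene_dir.glob(f"*_{gene}.fasta"))
    return found
```

Calling list_gene_fastas("PyHIV_results") after a run with splitting enabled would return a dictionary keyed by region name, for example "gag", "pol" and "env".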

Final Table Columns:

| Column                    | Description                                     |
|---------------------------|-------------------------------------------------|
| Sequence                  | Input sequence name                             |
| Reference                 | Best matching reference accession               |
| Subtype                   | Predicted HIV-1 subtype                         |
| Most Matching Gene Region | Region with highest similarity                  |
| Present Gene Regions      | All detected gene regions with valid alignments |
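
The summary table can be post-processed with the standard library. The sketch below counts how many input sequences were assigned to each subtype. It assumes final_table.tsv is tab-separated with a header row whose column names match the table above exactly; count_subtypes is a hypothetical helper, not part of PyHIV.

```python
import csv
from collections import Counter


def count_subtypes(tsv_path: str) -> Counter:
    """Count rows per value of the 'Subtype' column in a tab-separated summary table."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["Subtype"] for row in reader)
```

For example, count_subtypes("PyHIV_results/final_table.tsv").most_common() would list the subtypes from most to least frequent.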

📟 Command Line Interface

PyHIV provides a command-line interface with two subcommands: run, which performs the analysis, and validate, which checks input files before processing.

🚀 Basic Commands

# Basic usage - process all FASTA files in a directory
pyhiv run sequences/

# With custom output directory
pyhiv run sequences/ -o my_results/

# Parallel processing with 8 jobs
pyhiv run sequences/ -j 8

# Validate input files before processing
pyhiv validate sequences/

⚙️ Main Options

| Option                       | Description                                             |
|------------------------------|---------------------------------------------------------|
| --subtyping / --no-subtyping | Enable/disable HIV-1 subtyping (default: enabled)       |
| --splitting / --no-splitting | Enable/disable gene region splitting (default: enabled) |
| -o, --output-dir PATH        | Output directory (default: PyHIV_results)               |
| -j, --n-jobs INTEGER         | Number of parallel jobs (default: all CPUs)             |
| -v, --verbose                | Detailed output                                         |
| -q, --quiet                  | Suppress non-error output                               |

💼 Common Use Cases

Full analysis with subtyping and splitting:

pyhiv run data/sequences/

Alignment only (no subtyping or splitting):

pyhiv run data/sequences/ --no-subtyping --no-splitting

Subtyping without gene splitting:

pyhiv run data/sequences/ --no-splitting

Parallel processing for large datasets:

pyhiv run data/sequences/ -j 8 -o results/batch1/

Validation before processing:

pyhiv validate data/sequences/

🆘 Getting Help

pyhiv --help            # Show all commands
pyhiv run --help        # Show options for run command
pyhiv validate --help   # Show validation options
pyhiv --version         # Show version

For comprehensive CLI documentation, see CLI_README.md.


🗂️ Citation

If you use PyHIV in your research, please cite:

@software{pyhiv2024,
  title={PyHIV: A Python Package for Local HIV-1 Sequence Alignment, Subtyping and Gene Splitting},
  author={Santos-Pereira, Ana},
  year={2024},
  url={https://github.com/anaapspereira/PyHIV},
  license={MIT}
}

Note: A manuscript is in preparation. Until it is published, please cite this repository.


🤝 Contributing

Reporting Issues

Please report bugs and request features through GitHub Issues.


📚 Documentation


🧾 License

This project is licensed under the MIT License; see the LICENSE file for details.


💰 Funding

  • AIV-Tropism project funded by 2CA-Braga / ICVS.
