Itsbosire/Phylo_Genomics

Comparative Genomics and Phylogenetic Analysis of Puccinia graminis f. sp. tritici in Wheat

Understanding the Genetic Basis of Pathogenicity

Project Report can be found here: Project Report

Presentation can be found here: Presentation

🧬 Overview

This project explores the genetic diversity and evolutionary relationships of Puccinia graminis f. sp. tritici, the fungal pathogen responsible for wheat stem rust. Through comparative genomics and phylogenetic inference, the study aims to identify genomic factors that contribute to its pathogenicity and adaptation. The results improve our understanding of how the disease evolves and could support future strategies for disease management and resistance breeding in wheat.


Prerequisites

  • Linux operating system
  • Bash shell
  • Tools installed:
    • BUSCO
    • QUAST
    • RagTag
    • RepeatMasker
    • Augustus
    • EggNOG-mapper
    • OrthoFinder
    • MAFFT
    • FastTree
    • MUMmer
    • Snippy

Installation

To set up the environment and install the necessary tools, follow these steps:

Step 1: Install Conda or Mamba

If Conda or Mamba is not already installed on your system, follow the instructions below:

  • Install Conda: Visit Miniconda or Anaconda to download and install Conda.

  • Install Mamba (optional but faster alternative to Conda): After installing Conda, you can install Mamba using:

  conda install -n base -c conda-forge mamba

Step 2: Create a Conda Environment

Create a new environment for the pipeline and install the required tools:

conda create -n NGS-pipeline -c conda-forge -c bioconda busco quast ragtag repeatmasker augustus eggnog-mapper orthofinder mafft fasttree mummer snippy

Step 3: Activate the Environment

Activate the newly created environment:

conda activate NGS-pipeline

Step 4: Verify Installation

Ensure all tools are installed and accessible:

busco --version
quast --version
ragtag.py --version
RepeatMasker -v
augustus --version
emapper.py --version
orthofinder -h          # the help banner includes the version
mafft --version
FastTree -help          # the usage text includes the version
nucmer --version        # nucmer is part of MUMmer
snippy --version
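
Checking each tool by hand is easy to get wrong, so a small helper that reports every missing executable at once can be useful. The sketch below is not part of the repository; the function name `check_tools` and the array `PIPELINE_TOOLS` are hypothetical, and the executable names are the ones the Bioconda packages are assumed to install.

```shell
#!/usr/bin/env bash
# Report which of the given executables are missing from PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%-8s %s\n' "ok" "$tool"
        else
            printf '%-8s %s\n' "MISSING" "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names as assumed for the Bioconda packages; adjust if yours differ.
PIPELINE_TOOLS=(busco quast ragtag.py RepeatMasker augustus emapper.py
                orthofinder mafft FastTree nucmer snippy)

# Usage: check_tools "${PIPELINE_TOOLS[@]}"
```

Run the usage line inside the activated environment; a non-zero exit status means at least one tool still needs to be installed.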

Step 5: Run the Pipeline

Follow the methodology outlined in the Methodology section to execute the pipeline scripts.

🎯 Objectives

  • To identify the genetic diversity among different strains of P. graminis f. sp. tritici.
  • To infer evolutionary relationships between these strains using phylogenetic analysis.

🧪 Methodology

1. Data Acquisition

  • NCBI Genome Database: Downloaded the genome assemblies analyzed in this project.

2. Quality Assessment

  • BUSCO: Evaluated genome completeness.
  • QUAST: Provided assembly statistics and quality reports.
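
The quality assessment step could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's `busco.sh` or `quast.sh`: the function names, the `data/*.fna` layout, the thread count, and the `basidiomycota_odb10` lineage are all assumptions, since the source does not name the lineage dataset that was used.

```shell
#!/usr/bin/env bash
# Executables can be overridden, e.g. BUSCO=echo for a dry run.
BUSCO=${BUSCO:-busco}
QUAST=${QUAST:-quast}
# Assumed BUSCO lineage for a rust fungus; the source does not name one.
LINEAGE=${LINEAGE:-basidiomycota_odb10}
THREADS=${THREADS:-8}

# Assess completeness of one assembly; results go to qc_reports/busco/<name>/.
run_busco() {
    local genome=$1 name
    name=$(basename "${genome%.*}")
    "$BUSCO" -i "$genome" -m genome -l "$LINEAGE" \
        -o "$name" --out_path qc_reports/busco -c "$THREADS"
}

# Compute assembly statistics for all given assemblies in one report.
run_quast() {
    "$QUAST" -o qc_reports/quast -t "$THREADS" "$@"
}

# Usage (hypothetical file layout):
#   for g in data/*.fna; do run_busco "$g"; done
#   run_quast data/*.fna
```

Passing all assemblies to QUAST in a single call puts their statistics side by side in one report, which makes strains easier to compare.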

3. Genome Processing

  • RagTag: Scaffolded draft genomes using reference-based alignment.
  • RepeatMasker: Identified and masked repetitive elements.
  • Augustus: Predicted genes within the masked genome assemblies.
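
The three processing steps chain together: the scaffolded FASTA feeds RepeatMasker, and the masked FASTA feeds Augustus. The following is a minimal sketch of that chain, not the repository's scripts. The function names and output paths are hypothetical, `-species fungi` is an assumed repeat library choice, and `ustilago_maydis` is only a placeholder for the Augustus species model, which the source does not specify.

```shell
#!/usr/bin/env bash
RAGTAG=${RAGTAG:-ragtag.py}
REPEATMASKER=${REPEATMASKER:-RepeatMasker}
AUGUSTUS=${AUGUSTUS:-augustus}
THREADS=${THREADS:-8}
# Assumption: placeholder species model; the source does not say which was used.
AUGUSTUS_SPECIES=${AUGUSTUS_SPECIES:-ustilago_maydis}

# Order and orient draft contigs against a reference assembly.
# Writes scaffolds/<name>/ragtag.scaffold.fasta
scaffold_genome() {
    local reference=$1 draft=$2 name=$3
    "$RAGTAG" scaffold "$reference" "$draft" -o "scaffolds/$name" -t "$THREADS"
}

# Soft-mask repeats (lowercase bases) and also write a GFF of repeat hits.
# Writes repeatmasker/<name>/<input file name>.masked
mask_repeats() {
    local genome=$1 name=$2
    "$REPEATMASKER" -species fungi -xsmall -gff -pa "$THREADS" \
        -dir "repeatmasker/$name" "$genome"
}

# Predict genes on a soft-masked assembly and save them as GFF3.
predict_genes() {
    local masked=$1 name=$2
    mkdir -p gene_predictions
    "$AUGUSTUS" --species="$AUGUSTUS_SPECIES" --softmasking=1 --gff3=on \
        "$masked" > "gene_predictions/$name.gff3"
}

# Usage (hypothetical strain name and paths):
#   scaffold_genome data/reference.fna data/strainA.fna strainA
#   mask_repeats scaffolds/strainA/ragtag.scaffold.fasta strainA
#   predict_genes repeatmasker/strainA/ragtag.scaffold.fasta.masked strainA
```

Soft-masking with `-xsmall` is chosen here because Augustus can take lowercase repeats into account through `--softmasking=1`, whereas hard-masking would replace those bases with N.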

4. Functional Annotation

  • EggNOG-mapper: Annotated predicted genes based on orthologous group assignment and functional domains.
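
As a rough illustration of this step, the sketch below annotates a protein FASTA and then counts how many proteins received an annotation. It assumes the eggNOG databases have already been downloaded and that predicted proteins are available as a FASTA file; the function names and paths are hypothetical and not taken from the repository's `eggnog.sh`.

```shell
#!/usr/bin/env bash
EMAPPER=${EMAPPER:-emapper.py}
THREADS=${THREADS:-8}

# Annotate a protein FASTA; writes annotations/<name>.emapper.annotations.
# The eggNOG databases must be downloaded first (download_eggnog_data.py).
annotate_proteins() {
    local proteins=$1 name=$2
    mkdir -p annotations
    "$EMAPPER" -i "$proteins" --itype proteins -o "$name" \
        --output_dir annotations --cpu "$THREADS"
}

# Count annotated queries: every non-comment line is one annotated protein.
count_annotated() {
    grep -vc '^#' "$1"
}

# Usage (hypothetical paths):
#   annotate_proteins gene_predictions/strainA.faa strainA
#   count_annotated annotations/strainA.emapper.annotations
```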

5. Phylogenetic Analysis

  • OrthoFinder: Identified orthologous gene clusters and single-copy orthologs.
  • MAFFT: Performed multiple sequence alignment of single-copy orthologs.
  • FastTree: Generated a phylogenetic tree from the aligned sequences.
  • iTOL: Visualized and interpreted the resulting phylogenetic tree.
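
The command-line part of the phylogenetic workflow can be sketched as follows (iTOL is a web tool, so it is left out). This is an illustration, not the repository's scripts: the function names and paths are hypothetical, and the source does not describe how the per-orthogroup alignments were combined before tree building, so `build_tree` simply takes one protein alignment file as input.

```shell
#!/usr/bin/env bash
ORTHOFINDER=${ORTHOFINDER:-orthofinder}
MAFFT=${MAFFT:-mafft}
FASTTREE=${FASTTREE:-FastTree}
THREADS=${THREADS:-8}

# Infer orthogroups from a directory holding one protein FASTA per strain.
# Results land in <proteome_dir>/OrthoFinder/Results_<date>/.
find_orthologs() {
    "$ORTHOFINDER" -f "$1" -t "$THREADS"
}

# Align every orthogroup FASTA (*.fa) in a directory with MAFFT.
align_orthogroups() {
    local in_dir=$1 out_dir=$2 fasta
    mkdir -p "$out_dir"
    for fasta in "$in_dir"/*.fa; do
        [ -e "$fasta" ] || continue   # directory had no .fa files
        "$MAFFT" --auto --thread "$THREADS" "$fasta" \
            > "$out_dir/$(basename "$fasta" .fa).aln.fa"
    done
}

# Build an approximate maximum-likelihood tree from a protein alignment.
build_tree() {
    local alignment=$1 tree=$2
    mkdir -p "$(dirname "$tree")"
    "$FASTTREE" "$alignment" > "$tree"
}

# Usage (hypothetical paths; <date> is filled in by OrthoFinder):
#   find_orthologs proteomes
#   align_orthogroups proteomes/OrthoFinder/Results_<date>/Single_Copy_Orthologue_Sequences alignments
#   build_tree alignments/combined.aln.fa phylogeny/tree.nwk
```

The resulting Newick file can then be uploaded to iTOL for visualization.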

6. Comparative Genomics

  • MUMmer: Conducted whole-genome alignment across strains to detect large-scale structural variations.
  • Snippy: Performed variant calling to identify SNPs and INDELs.
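
A minimal sketch of the comparative step is shown below. It assumes that variants are called from assembled genomes rather than reads, which is why Snippy's `--ctg` option is used; the source does not state the input type. The function names and output paths are hypothetical, and `count_variant_types` relies on the TYPE column being the third column of Snippy's `snps.tab`.

```shell
#!/usr/bin/env bash
NUCMER=${NUCMER:-nucmer}
SNIPPY=${SNIPPY:-snippy}
THREADS=${THREADS:-8}

# Align a query assembly to the reference with nucmer (part of MUMmer).
# Writes genome_alignment/<name>.delta, which show-coords or dnadiff can summarize.
align_genomes() {
    local reference=$1 query=$2 name=$3
    mkdir -p genome_alignment
    "$NUCMER" -p "genome_alignment/$name" "$reference" "$query"
}

# Call variants of an assembly against the reference; writes variants/<name>/.
call_variants() {
    local reference=$1 assembly=$2 name=$3
    "$SNIPPY" --cpus "$THREADS" --outdir "variants/$name" \
        --ref "$reference" --ctg "$assembly"
}

# Tally variant types (snp, ins, del, ...) from the TYPE column of snps.tab.
count_variant_types() {
    awk -F'\t' 'NR > 1 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Usage (hypothetical paths):
#   align_genomes data/reference.fna data/strainA.fna strainA
#   call_variants data/reference.fna data/strainA.fna strainA
#   count_variant_types variants/strainA/snps.tab
```

Counting by type gives a quick per-strain summary of SNPs versus insertions and deletions before looking at individual sites.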

Figure: Methodology


📊 Results

Quality Control

Figure: BUSCO results

Phylogenetic Analysis

Figure: Phylogenetic tree

Figure: Evolution

Genome Clustering and Ortholog Analysis

Figure: Clustering

Figure: Venn

Figure: Cluster count

Figure: Pairwise

Comparative Genomics

Whole-Genome Alignment

Figure: Alignment

SNPs and INDELs Distribution

Figure: Snippy results

📚 References

  1. BUSCO: Benchmarking Universal Single-Copy Orthologs. Official Documentation.
  2. QUAST: Quality Assessment Tool for Genome Assemblies. Official Documentation.
  3. RagTag: Reference-guided scaffolding tool. GitHub Repository.
  4. RepeatMasker: A tool for identifying and masking repetitive elements in genomic sequences. Official Website.
  5. Augustus: Gene prediction tool. Official Website.
  6. EggNOG-mapper: Functional annotation tool based on orthologous groups. Official Website.
  7. OrthoFinder: Ortholog identification tool. Official Documentation.
  8. MAFFT: Multiple sequence alignment tool. Official Website.
  9. FastTree: A tool for constructing phylogenetic trees. Official Website.
  10. MUMmer: Whole-genome alignment tool. Official Website.
  11. Snippy: Rapid variant calling and core genome alignment tool. GitHub Repository.
  12. NCBI Genome Database: Source for genome sequences. NCBI Website.
  13. iTOL: Interactive Tree of Life for phylogenetic tree visualization. Official Website.

📁 Project Structure

Genomics/
├── data/                  # Raw genome files
├── code/                  # Bash scripts used in the pipeline
│   ├── data.sh              # Download genomes from NCBI
│   ├── busco.sh             # Run BUSCO quality assessment
│   ├── quast.sh             # Run QUAST for assembly stats
│   ├── ragtag.sh            # Perform genome scaffolding
│   ├── repeatmasker.sh      # Execute RepeatMasker
│   ├── augustus.sh          # Run gene prediction using Augustus
│   ├── eggnog.sh            # Functional annotation with EggNOG-mapper
│   ├── orthofinder.sh       # Ortholog identification
│   ├── mafft.sh             # Multiple sequence alignment
│   ├── fasttree.sh          # Construct phylogenetic tree
│   ├── compare_genomes.sh   # Whole-genome alignment with MUMmer
│   └── run_snippy.sh        # Variant calling using Snippy
├── qc_reports/            # BUSCO and QUAST outputs
├── scaffolds/             # RagTag scaffolded assemblies
├── masked_genomes/        # Masked genome assemblies
├── repeatmasker/          # RepeatMasker output files
├── gene_predictions/      # Augustus GFFs and FASTAs
├── annotations/           # EggNOG-mapper outputs
├── orthofinder_results/   # Orthologous gene clusters
├── alignments/            # MAFFT alignments
├── phylogeny/             # FastTree and iTOL trees
├── genome_alignment/      # MUMmer outputs
├── variants/              # Snippy results (SNPs, INDELs)
└── README.md              # This file

About

This repository provides a workflow for phylogenetic analysis and comparative genomics of Puccinia graminis f. sp. tritici, the fungal pathogen that causes stem rust in wheat, with the aim of understanding its pathogenicity and adaptation.
