Project Report can be found here: Project Report
Presentation can be found here: Presentation
This project explores the genetic diversity and evolutionary relationships of Puccinia graminis f. sp. tritici, the fungal pathogen responsible for wheat stem rust. Through comparative genomics and phylogenetic inference, the study aims to identify genomic factors that contribute to its pathogenicity and adaptation. This work improves our understanding of how the disease evolves and could support future strategies for disease management and resistance breeding in wheat.
- Linux Operating system
- Bash shell
- Tools installed:
  - BUSCO
  - QUAST
  - RagTag
  - RepeatMasker
  - Augustus
  - EggNOG-mapper
  - OrthoFinder
  - MAFFT
  - FastTree
  - MUMmer
  - Snippy
To set up the environment and install the necessary tools, follow these steps:
If Conda or Mamba is not already installed on your system, follow the instructions below:
- Install Conda: Visit Miniconda or Anaconda to download and install Conda.
- Install Mamba (optional but faster alternative to Conda): After installing Conda, you can install Mamba using:
conda install -n base -c conda-forge mamba
Create a new environment for the pipeline and install the required tools:
conda create -n NGS-pipeline -c conda-forge -c bioconda busco quast ragtag repeatmasker augustus eggnog-mapper orthofinder mafft fasttree mummer snippy
Activate the newly created environment:
conda activate NGS-pipeline
Ensure all tools are installed and accessible:
busco --version
quast --version
ragtag.py --version
RepeatMasker --version
augustus --version
emapper.py --version
orthofinder --version
mafft --version
FastTree --version
mummer --version
snippy --version
Follow the methodology outlined in the Methodology section to execute the pipeline scripts.
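As a complement to the individual version checks, the presence of every executable can be tested in one pass. The following is a minimal sketch, not part of the original pipeline: the function name `check_tools` is hypothetical, and the executable names passed to it (for example `ragtag.py`, `RepeatMasker`, `emapper.py`) are assumptions based on how these tools are commonly packaged, so adjust them if your installation differs.

```shell
#!/usr/bin/env bash
# Sketch: report which pipeline executables are on PATH.
# check_tools is a hypothetical helper; executable names are assumptions.

check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found    %s\n' "$tool"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every requested tool was found
    [ "$missing" -eq 0 ]
}

check_tools busco quast ragtag.py RepeatMasker augustus emapper.py \
            orthofinder mafft FastTree mummer snippy \
    || echo "Some tools are missing; is the NGS-pipeline environment active?" >&2
```

Because the check relies only on `command -v`, it does not depend on each tool supporting a particular version flag.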
- To identify the genetic diversity among different strains of P. graminis f. sp. tritici.
- To infer evolutionary relationships between these strains using phylogenetic analysis.
- Genome sequences were downloaded from the NCBI Genome Database.
- BUSCO: Evaluated genome completeness.
- QUAST: Provided assembly statistics and quality reports.
- RagTag: Scaffolded draft genomes using reference-based alignment.
- RepeatMasker: Identified and masked repetitive elements.
- Augustus: Predicted genes within the masked genome assemblies.
- EggNOG-mapper: Annotated predicted genes based on orthologous group assignment and functional domains.
- OrthoFinder: Identified orthologous gene clusters and single-copy orthologs.
- MAFFT: Performed multiple sequence alignment of single-copy orthologs.
- FastTree: Generated a phylogenetic tree from the aligned sequences.
- iTOL: Visualized and interpreted the resulting phylogenetic tree.
- MUMmer: Conducted whole-genome alignment across strains to detect large-scale structural variations.
- Snippy: Performed variant calling to identify SNPs and INDELs.
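The steps above map one-to-one onto the scripts in `code/`, so they can be chained by a small driver. The sketch below is illustrative rather than the project's own runner: the `run_pipeline` function and the `STAGES` variable are hypothetical, and it assumes each script takes no arguments and is meant to be run from the repository root.

```shell
#!/usr/bin/env bash
# Sketch: run the stage scripts in the order the methodology lists them.
# Assumptions: scripts live in one directory, take no arguments,
# and a failing stage should stop the run.

STAGES="data busco quast ragtag repeatmasker augustus eggnog orthofinder mafft fasttree compare_genomes run_snippy"

run_pipeline() {
    local code_dir="$1" stage script
    for stage in $STAGES; do
        script="$code_dir/$stage.sh"
        if [ ! -f "$script" ]; then
            echo "missing stage script: $script" >&2
            return 1
        fi
        echo "== $stage =="
        bash "$script" || { echo "stage failed: $stage" >&2; return 1; }
    done
}

# Run only when a script directory is given, e.g.: bash run_all.sh code
if [ "$#" -gt 0 ]; then
    run_pipeline "$1"
fi
```

Stopping at the first failure matters here because later stages consume earlier outputs: gene prediction needs the masked assemblies, and the alignment and tree steps need the orthologs.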
--
- BUSCO: Benchmarking Universal Single-Copy Orthologs. Official Documentation.
- QUAST: Quality Assessment Tool for Genome Assemblies. Official Documentation.
- RagTag: Reference-guided scaffolding tool. GitHub Repository.
- RepeatMasker: A tool for identifying and masking repetitive elements in genomic sequences. Official Website.
- Augustus: Gene prediction tool. Official Website.
- EggNOG-mapper: Functional annotation tool based on orthologous groups. Official Website.
- OrthoFinder: Ortholog identification tool. Official Documentation.
- MAFFT: Multiple sequence alignment tool. Official Website.
- FastTree: A tool for constructing phylogenetic trees. Official Website.
- MUMmer: Whole-genome alignment tool. Official Website.
- Snippy: Rapid variant calling and core genome alignment tool. GitHub Repository.
- NCBI Genome Database: Source for genome sequences. NCBI Website.
- iTOL: Interactive Tree of Life for phylogenetic tree visualization. Official Website.
Genomics/
├── data/                   # Raw genome files
├── code/                   # Bash scripts used in the pipeline
│   ├── data.sh             # Download genomes from NCBI
│   ├── busco.sh            # Run BUSCO quality assessment
│   ├── quast.sh            # Run QUAST for assembly stats
│   ├── ragtag.sh           # Perform genome scaffolding
│   ├── repeatmasker.sh     # Execute RepeatMasker
│   ├── augustus.sh         # Run gene prediction using Augustus
│   ├── eggnog.sh           # Functional annotation with EggNOG-mapper
│   ├── orthofinder.sh      # Ortholog identification
│   ├── mafft.sh            # Multiple sequence alignment
│   ├── fasttree.sh         # Construct phylogenetic tree
│   ├── compare_genomes.sh  # Whole-genome alignment with MUMmer
│   └── run_snippy.sh       # Variant calling using Snippy
├── qc_reports/             # BUSCO and QUAST outputs
├── scaffolds/              # RagTag scaffolded assemblies
├── masked_genomes/         # Masked genome assemblies
├── repeatmasker/           # RepeatMasker output files
├── gene_predictions/       # Augustus GFFs and FASTAs
├── annotations/            # EggNOG-mapper outputs
├── orthofinder_results/    # Orthologous gene clusters
├── alignments/             # MAFFT alignments
├── phylogeny/              # FastTree and iTOL trees
├── genome_alignment/       # MUMmer outputs
├── variants/               # Snippy results (SNPs, INDELs)
└── README.md               # This file
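If you are starting from an empty checkout, the directory layout above can be created up front. This is a minimal sketch; the function name `make_layout` is hypothetical, and it is only an assumption that the pipeline scripts expect these directories to exist rather than creating them themselves.

```shell
#!/usr/bin/env bash
# Sketch: create the top-level directories from the layout above.
# make_layout is a hypothetical helper; the root defaults to "Genomics".

make_layout() {
    local root="${1:-Genomics}" dir
    for dir in data code qc_reports scaffolds masked_genomes repeatmasker \
               gene_predictions annotations orthofinder_results alignments \
               phylogeny genome_alignment variants; do
        # -p makes the call safe to repeat on an existing tree
        mkdir -p "$root/$dir"
    done
}

# Run only when a root directory is given, e.g.: bash make_layout.sh Genomics
if [ "$#" -gt 0 ]; then
    make_layout "$1"
fi
```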








