MethanoHunt

A pipeline to profile methane cyclers from taxonomic profiling data or functional marker genes.

Overview

MethanoHunt provides four handy workflows:

Profile: Summarizes relative abundance of methane cyclers from taxonomic profiles (e.g. singleM).
Taxonomy: Finds methane cyclers according to GTDB taxonomy.
Gene: A pipeline to detect, classify, and quantify methane cycling marker genes (McrA, PmoA, MmoX) from protein sequences.
Genome: A pipeline to detect and classify methane cyclers from genomes/metagenome-assembled genomes.

Installation

Dependencies

MethanoHunt requires:

Python 3.8+
Snakemake > 9.0
HMMER3
raxml-ng, epa-ng, gappa
Minimap2, Samtools
MicrobeCensus
Python packages: pandas, natsort, plotly, click, pysam
PaPaRa (install independently)

Use conda to install all dependencies except PaPaRa, which requires a separate step.

git clone https://github.com/SilentGene/MethanoHunt.git
cd MethanoHunt  # methanohunt.yaml is in the repository root
conda env create -f methanohunt.yaml
conda activate methanohunt
methanohunt setup  # to install PaPaRa

Profile Workflow

Analyze taxonomic abundance tables.

methanohunt profile -i singleM_results/*.tax.tsv -o methanohunt_results [-db taxonomy_db.tsv --group sample_group.tsv]

*   `-i`: Input taxonomy tables (supports glob patterns).
*   `-o`: Prefix for output files. The workflow generates a TSV result and an HTML report.
*   `-db`: (Optional) Custom database path. If omitted, the default database installed with the pipeline is used.
*   `--group/-g`: (Optional) A TSV file that assigns samples to groups. When provided, the pipeline also generates a grouped visualization. The file should have the following format:

    sample1	group1
    sample2	group1
    sample3	group2
    sample4	group2
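
If you assemble the group file programmatically, a minimal Python sketch such as the one below may help. It assumes the layout shown above: two tab-separated columns (sample, group) with no header row. The function names `write_group_file` and `read_group_file` and the sample and group names are hypothetical and not part of MethanoHunt. Sample names presumably need to match those in the input taxonomy tables.

```python
import csv
import io


def write_group_file(groups, handle):
    """Write sample-to-group pairs as headerless, tab-separated lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample, group in groups.items():
        writer.writerow([sample, group])


def read_group_file(handle):
    """Parse a group file, rejecting rows that do not have exactly two fields."""
    mapping = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(row)}"
            )
        sample, group = (field.strip() for field in row)
        mapping[sample] = group
    return mapping


# Hypothetical sample and group names, mirroring the layout above.
groups = {
    "sample1": "group1",
    "sample2": "group1",
    "sample3": "group2",
    "sample4": "group2",
}
buffer = io.StringIO()
write_group_file(groups, buffer)
parsed = read_group_file(io.StringIO(buffer.getvalue()))
```

In practice you would pass an open file (for example `open("sample_group.tsv", "w", newline="")`) instead of the in-memory buffer.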

Database Format

The TSV database file will be downloaded with the script. It contains the following columns:

GTDB_taxonomy: Taxonomic identifier
Subgroup: Metabolic subgroup
Classification: Functional classification
Exception_taxonomy (optional): Comma-separated taxa to exclude

You can customize the database by editing this TSV file.
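
To make the column layout concrete, here is a hedged Python sketch that parses a database in this format and looks up a lineage. The taxa, subgroup and classification labels are invented for illustration. The matching rule (an entry applies when its `GTDB_taxonomy` value appears among the lineage's ranks and none of its `Exception_taxonomy` values do) is only one plausible reading of the columns, not necessarily how MethanoHunt matches taxa.

```python
import csv
import io

# Hypothetical database content; taxa and labels are invented for illustration.
DB_TEXT = (
    "GTDB_taxonomy\tSubgroup\tClassification\tException_taxonomy\n"
    "f__Examplaceae\tExampleSubgroup\tExampleRole\tg__Excludedus,g__Otherus\n"
    "g__Demogenus\tDemoSubgroup\tDemoRole\t\n"
)


def load_database(handle):
    """Read the TSV and split the optional exception column into a list."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        raw = row.get("Exception_taxonomy") or ""
        row["exceptions"] = [t.strip() for t in raw.split(",") if t.strip()]
        rows.append(row)
    return rows


def lookup(lineage, database):
    """Return (Classification, Subgroup) for the first matching entry, else None.

    Assumed rule: the entry's taxon must be one of the semicolon-separated
    ranks of the lineage, and no exception taxon may be present.
    """
    ranks = [rank.strip() for rank in lineage.split(";")]
    for entry in database:
        if entry["GTDB_taxonomy"] not in ranks:
            continue
        if any(exc in ranks for exc in entry["exceptions"]):
            continue
        return entry["Classification"], entry["Subgroup"]
    return None


database = load_database(io.StringIO(DB_TEXT))
hit = lookup("d__Bacteria;f__Examplaceae;g__Demo1", database)
excluded = lookup("d__Bacteria;f__Examplaceae;g__Excludedus", database)
```

Here `hit` carries the labels of the first row, while `excluded` is `None` because the lineage contains a taxon listed in `Exception_taxonomy`.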

An example workflow from raw reads to MethanoHunt profile mode results

Suppose you have 10 paired-end metagenomic samples in FASTQ format with filenames like sample1_R1.fastq.gz/sample1_R2.fastq.gz, sample2_R1.fastq.gz/sample2_R2.fastq.gz, and so on.

Step 1: Run singleM to generate taxonomic profiles

You don't need to trim or quality-check the reads before running singleM.

cd /my/raw_reads/  # change to the directory with your FASTQ files
SAMPLES=$(ls *_R1.fastq.gz | sed 's/_R1\.fastq\.gz$//')  # get sample names
conda activate singlem  # activate your singleM conda environment

mkdir -p ../singleM_results  # create singleM output directory

# Run singleM for each sample
for SAMPLE in $SAMPLES; do
    singlem pipe -1 "${SAMPLE}_R1.fastq.gz" -2 "${SAMPLE}_R2.fastq.gz" --threads 4 \
    --taxonomic-profile ../singleM_results/"$SAMPLE"_singlem.tax.tsv \
    --taxonomic-profile-krona ../singleM_results/"$SAMPLE"_singlem.tax-krona.html \
    --otu-table ../singleM_results/"$SAMPLE"_singlem.otu.tsv
done

Step 2: Run MethanoHunt on the generated taxonomic profiles

conda activate methanohunt  # switch back to the MethanoHunt environment
methanohunt profile -i ../singleM_results/*_singlem.tax.tsv -o methanohunt_results  # run from /my/raw_reads/

Notes

Taxonomy-based classifications may have false positives. Verification with functional gene analysis is recommended.

Taxonomy Workflow

Annotate taxonomic profiles with methane cycler information according to GTDB taxonomy.

Example:

methanohunt taxonomy -i MAG_classification.tsv -c Taxonomy -o methanohunt_annotate_results.tsv

*   `-i`: Input TSV file that includes a column with the taxonomic classification. The table must contain a header row.
*   `-c`: Name of the column that holds the taxonomic classification.
*   `-o`: Output TSV file.
*   `-db`: (Optional) Custom database path. If omitted, the default database installed with the pipeline is used.

Output

This workflow writes a TSV file that contains the input table plus two additional columns:

MethanoHunt_classification: methane cycling role
MethanoHunt_subgroup: methane cycling subgroup
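
Because the annotations are appended as ordinary columns, the output can be summarized with a few lines of Python. The sketch below counts rows per `MethanoHunt_classification`. The table content is invented, and the assumption that unannotated rows have an empty value in that column is ours, so check your own output before relying on it.

```python
import csv
import io
from collections import Counter

# Hypothetical annotated output; only the two MethanoHunt_* column names
# come from the description above.
ANNOTATED = (
    "MAG\tTaxonomy\tMethanoHunt_classification\tMethanoHunt_subgroup\n"
    "bin1\td__Archaea;g__Demo1\tExampleRole\tExampleSubgroup\n"
    "bin2\td__Bacteria;g__Demo2\t\t\n"
    "bin3\td__Archaea;g__Demo3\tExampleRole\tOtherSubgroup\n"
)


def summarize(handle):
    """Count rows per classification, skipping rows without an annotation."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        role = (row.get("MethanoHunt_classification") or "").strip()
        if role:  # assumption: unannotated rows are left empty
            counts[role] += 1
    return counts


counts = summarize(io.StringIO(ANNOTATED))
```

For a real run, replace the in-memory table with `open("methanohunt_annotate_results.tsv", newline="")`.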

Gene Workflow

Analyze functional genes from protein sequences.

Example:

Classify genes and build phylogenetic trees with reference sequences:

methanohunt gene --prot assembly_genes.faa -o my_results --tree

With abundance calculation (requires nucleotide genes and reads):

methanohunt gene \
    --prot assembly_genes.faa \
    --nucl assembly_genes.ffn \
    -1 sample_R1.fq.gz -2 sample_R2.fq.gz \
    -o my_results \
    --marker McrA,PmoA,MmoX \
    --tree \
    --threads 8

*   `--prot`: Input protein FASTA (.faa).
*   `--nucl`: (Optional) Nucleotide FASTA of the same genes (.ffn); required for abundance calculation.
*   `-1`, `-2`: (Optional) Reads for abundance quantification. Multiple files are supported, and you can rely on shell expansion of glob patterns (e.g. `2015*_1.fq.gz`).
*   `-o`: Output directory.
*   `--marker`: (Optional) Marker genes to analyze, comma-separated. Default is McrA,PmoA,MmoX.
*   `--tree`: (Optional) Build phylogenetic trees together with reference markers using fasttree. Default is False.
*   `-db`: (Optional) Database folder. Default is the database folder included in the package.

Gene Module Output

Important results:

methanohunt_gene_classification.tsv: Detected genes and their functional classification.
MethanoHunt_report.html: Abundance visualization (RPKG based).
RPKG/: RPKG abundance tables (classified, subtype, combined).
classified_sequences/: Detected sequences and their functional classification in fasta format.
tree/: Phylogenetic trees of detected sequences (if --tree is specified).

Other results:

bam/: Mapping results and reference.
microbecensus/: Genome equivalent estimation results.
hmm/, hits/, placement/, classification/: Intermediate results.
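
For readers unfamiliar with RPKG (reads per kilobase per genome equivalent), the sketch below shows the commonly used definition: mapped reads divided by gene length in kilobases, then divided by the number of genome equivalents (the quantity MicrobeCensus estimates). The function name `rpkg` and the example numbers are hypothetical. How MethanoHunt counts mapped reads (for example, reads versus read pairs, or any mapping filters) is not specified here, so treat this as an illustration of the unit rather than the pipeline's exact calculation.

```python
def rpkg(mapped_reads, gene_length_bp, genome_equivalents):
    """Reads per kilobase of gene per genome equivalent."""
    if gene_length_bp <= 0 or genome_equivalents <= 0:
        raise ValueError("gene length and genome equivalents must be positive")
    reads_per_kb = mapped_reads / (gene_length_bp / 1000)
    return reads_per_kb / genome_equivalents


# Hypothetical numbers: 300 reads on a 1,500 bp gene, 50 genome equivalents.
value = rpkg(mapped_reads=300, gene_length_bp=1500, genome_equivalents=50.0)
print(value)  # 300 / 1.5 kb = 200 reads per kb; 200 / 50 = 4.0
```

Normalizing by genome equivalents makes values comparable between samples with different sequencing depth and average genome size.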

Here is an example of the output: [MethanoHunt_gene_report.html](docs/MethanoHunt_gene_report.html).

Screenshot of the interactive chart:

Genes that are able to be classified in this module

| Homologue | KO | Classification | Description | Refs | Note |
|---|---|---|---|---|---|
| McrA | K00399 | Methanogen | Anaerobic methanogenesis | Chadwick et al. (2022) | |
| McrA | K00399 | ANME | Anaerobic methane oxidation | Chadwick et al. (2022) | |
| McrA | K00399 | ANKA | Anaerobic alkane oxidation | Chadwick et al. (2022) | |
| PmoA/AmoA | K10944 | PmoA | Aerobic methane oxidation (particulate) | Leu et al. (2025) | |
| PmoA/AmoA | K10944 | AmoA | Aerobic ammonia oxidation | Leu et al. (2025) | |
| PmoA/AmoA | K10944 | HMO | Aerobic hydrocarbon oxidation | Leu et al. (2025) | |
| MmoX | K16157 | MmoX | Aerobic methane oxidation (soluble) | KEGG K16157 | |
| MmoX | K16157 | PrmA | Aerobic propane oxidation | KEGG K16157 | |
| MmoX | K16157 | MimA | Aerobic alkane oxidation | KEGG K16157 | |
| PhnJ | K06163 | PhnJ | Aerobic methanogenesis | Boden et al. (2024) | To be implemented |

Genome Workflow

Detect and classify methane cyclers from genomes/metagenome-assembled genomes.

Example:

methanohunt genome --genome_dir test_genomes --suffix fa \
            -p db/kofam/profiles \
            -k db/kofam/ko_list \
            --taxonomy GenomeGTDB_sample.tsv --col GTDB \
            --threads 8

*   `--genome_dir`: Directory containing the input genomes or protein sequences.
*   `--suffix`: Suffix of the input genomes (e.g., fa, fna, faa).
*   `-p`: Path to the KOFAM profiles directory.
*   `-k`: Path to the KOFAM KO list file.
*   `--taxonomy`: Path to the taxonomic classification file.
*   `--col`: Column name of the taxonomic classification in the input tsv file.
*   `--threads`: Number of threads to use.
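
When the genome workflow is launched from a script (for example, once per genome directory), it can be convenient to build the argument list in Python. The sketch below only assembles and prints the command using the options listed above; the helper name `genome_command` is hypothetical, and the paths are the ones from the example.

```python
import shlex


def genome_command(genome_dir, suffix, profiles, ko_list, taxonomy, column, threads=8):
    """Assemble the argument list for `methanohunt genome` (not executed here)."""
    return [
        "methanohunt", "genome",
        "--genome_dir", genome_dir,
        "--suffix", suffix,
        "-p", profiles,
        "-k", ko_list,
        "--taxonomy", taxonomy,
        "--col", column,
        "--threads", str(threads),
    ]


cmd = genome_command(
    genome_dir="test_genomes",
    suffix="fa",
    profiles="db/kofam/profiles",
    ko_list="db/kofam/ko_list",
    taxonomy="GenomeGTDB_sample.tsv",
    column="GTDB",
)
command_line = shlex.join(cmd)  # shell-safe string, useful for logging
print(command_line)
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)` inside the activated `methanohunt` environment.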

Output

Database

The required database files for both taxonomy and gene modules are included in the package.

...🧙‍♂️🧬
