NetInfer

License: MIT Snakemake Conda

A Snakemake-based pipeline for inferring microbiome co-abundance networks, integrating multiple inference tools in one workflow.

Features

Outputs an interactive HTML dashboard.

An example is provided in the example directory, where you can see the input file format and the expected output.

Quick Start

Prerequisites

Installation

# Clone the repository
git clone https://github.com/SilentGene/netinfer.git
cd netinfer

# Create and activate conda environment
conda env create -f environment.yaml  # creates 'netinfer' environment and installs dependencies
conda activate netinfer

# Install FlashWeave by running the provided Julia script
julia workflow/scripts/install_flashweave.jl

Basic Usage

  1. Prepare your data:

    • Required: abundance table abundance_table.tsv
    • (Optional) taxonomy table taxonomy.tsv for additional annotations
    • (Optional) sample metadata table metadata.tsv for FlashWeave and additional annotations
  2. Run the pipeline:

# Simple run with all methods enabled and using default settings
netinfer --input abundance_table.tsv --output results_dir --threads 6

# Include taxonomy to add more information to the output and produce an additional file with associations between different phyla
netinfer --input abundance_table.tsv --output results_dir --threads 6 --taxonomy taxonomy.tsv

# I don't have a separate taxonomy file, but let's try to infer from feature IDs
netinfer --input abundance_table.tsv --output results_dir --threads 6 --infer-taxonomy

# Only use my favorite methods, and use my own suffix for output files
netinfer --input abundance_table.tsv --output results_dir --threads 6 --methods flashweave,fastspar,spearman --suffix samples007

# Skip visualization
netinfer --input abundance_table.tsv --output results_dir --threads 6 --no-visual

# I'm an expert: specify every detail with my own config file
netinfer --input abundance_table.tsv --output results_dir --threads 6 --config my_config.yaml
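When several runs need to be scripted, the commands above can be assembled programmatically. The following is a minimal sketch, not part of NetInfer itself: the helper name `build_netinfer_command` is hypothetical, and it only uses flags that appear in the usage examples above.

```python
import shlex

def build_netinfer_command(abundance, outdir, threads=1, taxonomy=None,
                           methods=None, suffix=None, visual=True):
    """Assemble a netinfer command line as a list of arguments (hypothetical helper)."""
    cmd = ["netinfer", "--input", abundance, "--output", outdir,
           "--threads", str(threads)]
    if taxonomy:
        cmd += ["--taxonomy", taxonomy]
    if methods:
        # --methods takes a comma-separated list
        cmd += ["--methods", ",".join(methods)]
    if suffix:
        cmd += ["--suffix", suffix]
    if not visual:
        cmd.append("--no-visual")
    return cmd

cmd = build_netinfer_command("abundance_table.tsv", "results_dir", threads=6,
                             methods=["flashweave", "fastspar", "spearman"],
                             suffix="samples007")
print(shlex.join(cmd))
# To launch the run for real: subprocess.run(cmd, check=True)
```

Building the command as a list (rather than a single string) avoids shell-quoting problems when file names contain spaces.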

Output

The pipeline generates the following (the paths below assume the output directory is named results):

  1. Filtered and processed input data (results/preprocessed_data/)
  2. Individual network files for each method (results/subtool_outputs/)
  3. Network statistics with modularity detection (results/final_results/network_stats.txt)
  4. Merged network edges with consensus scores and community assignment (results/final_results/merged_edges.tsv)
  5. Merged network nodes with Zi-Pi analysis (results/final_results/merged_nodes.tsv)
  6. Interactive HTML dashboard (results/final_results/netinfer_dashboard.html)

Input File Formats

Abundance Table

Regardless of the column names, the first column is always treated as Feature IDs and the remaining columns as Sample IDs.

  • Format: TSV/CSV/BIOM
  • Rows: Features (OTUs/ASVs)
  • Columns: Samples
  • Values: Raw counts or relative abundances (Raw counts are recommended)

Example:

Feature        Sample1  Sample2  Sample3
OTU1           100      150      80
OTU2           50       60       40
...
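To make the layout concrete, here is a small sketch of how such a table could be read in Python. It is an illustration only and not NetInfer's own loader; the function name `read_abundance_table` is hypothetical, and it handles the TSV/CSV case only (not BIOM).

```python
import csv
import io

def read_abundance_table(handle, delimiter="\t"):
    """Return (sample_ids, {feature_id: [values]}) from a delimited table."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    sample_ids = header[1:]  # the name of the first column is ignored
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature_id, values = row[0], row[1:]
        if len(values) != len(sample_ids):
            raise ValueError(
                f"{feature_id}: expected {len(sample_ids)} values, got {len(values)}")
        table[feature_id] = [float(v) for v in values]
    return sample_ids, table

example = ("Feature\tSample1\tSample2\tSample3\n"
           "OTU1\t100\t150\t80\n"
           "OTU2\t50\t60\t40\n")
samples, counts = read_abundance_table(io.StringIO(example))
print(samples)         # ['Sample1', 'Sample2', 'Sample3']
print(counts["OTU1"])  # [100.0, 150.0, 80.0]
```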

Taxonomy Table (Optional)

Regardless of the column names, the first column is always treated as Feature IDs and the second column as Taxonomy strings.

  • Format: TSV/CSV
  • Required columns:
    • Feature ID (matching abundance table)
    • Taxonomy string

Example:

Feature  Taxonomy
OTU1     d__Bacteria;p__Firmicutes;c__Clostridia
OTU2     d__Bacteria;p__Bacteroidetes;c__Bacteroidia
...
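The taxonomy strings in the example use rank prefixes such as d__, p__ and c__ separated by semicolons. A minimal sketch of splitting such a string into ranks, and of checking whether two features belong to different phyla, might look like this. The function names are hypothetical and this is not NetInfer's internal parser.

```python
def parse_taxonomy(taxonomy):
    """Split 'd__Bacteria;p__Firmicutes' into {'d': 'Bacteria', 'p': 'Firmicutes'}."""
    ranks = {}
    for field in taxonomy.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if sep and name:  # ignore empty or malformed fields
            ranks[prefix] = name
    return ranks

def different_phyla(tax_a, tax_b):
    """True only when both phyla are known and they differ."""
    phylum_a = parse_taxonomy(tax_a).get("p")
    phylum_b = parse_taxonomy(tax_b).get("p")
    return phylum_a is not None and phylum_b is not None and phylum_a != phylum_b

otu1 = "d__Bacteria;p__Firmicutes;c__Clostridia"
otu2 = "d__Bacteria;p__Bacteroidetes;c__Bacteroidia"
print(parse_taxonomy(otu1))
print(different_phyla(otu1, otu2))
```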

Metadata Table (Optional)

  • Format: TSV/CSV
  • Rows: Samples (matching abundance table)
  • Columns: Metadata variables
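Because metadata rows must match the samples in the abundance table, a quick consistency check before running the pipeline can save time. The sketch below is a hypothetical helper, not a NetInfer function; it assumes you have already collected the sample IDs from both files.

```python
def check_sample_ids(abundance_samples, metadata_samples):
    """Report sample IDs that appear in only one of the two tables."""
    abundance_set = set(abundance_samples)
    metadata_set = set(metadata_samples)
    missing_in_metadata = sorted(abundance_set - metadata_set)
    extra_in_metadata = sorted(metadata_set - abundance_set)
    return missing_in_metadata, extra_in_metadata

missing, extra = check_sample_ids(
    ["Sample1", "Sample2", "Sample3"],
    ["Sample1", "Sample3", "Sample4"],
)
print("Missing from metadata:", missing)
print("Not in abundance table:", extra)
```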

Configuration

See netinfer/config/config.yaml for all available parameters, their default values, and their descriptions.

Method Configurations

Default:

Trusted methods

An association must be detected by at least one of these trusted methods to be retained in the final aggregated table.

Method                    Default Parameters
FlashWeave (HE / normal)  P-value ≤ 0.001; Weight ≥ 0.4
SPIEC-EASI                Weight ≥ 0.5; Method: MB (default) or GLasso
Other methods:

Method    Default Parameters
FastSpar  P-value ≤ 0.05; Iterations: 1000; Correlation ≥ 0.2; Absolute correlation ≥ 0.3
propR     Correlation ≥ 0.5
Spearman  FDR ≤ 0.05; Correlation ≥ 0.7
Pearson   FDR ≤ 0.05; Correlation ≥ 0.7; CLR transformation implemented
Jaccard   Similarity ≥ 0.3

Do I need to transform compositional data?

In most cases, no.

  • FlashWeave, SPIEC-EASI, FastSpar, and propR are designed specifically for compositional data. They handle compositionality internally (for example, with a centered log-ratio (CLR) transformation or another compositional method), so you generally do not need to apply an additional transformation yourself.
  • Spearman correlation is rank-based and is relatively insensitive to compositionality.
  • Jaccard is based on presence/absence and estimates co-occurrence probabilities without relying on abundance values, so no compositional transformation is required.
  • Pearson correlation assumes absolute abundances and can produce spurious correlations on compositional data. NetInfer therefore applies a CLR transformation internally before computing Pearson correlations.
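To illustrate the ideas in the list above, here is a minimal pure-Python sketch of a CLR transformation followed by Pearson correlation, plus a presence/absence Jaccard similarity. It is not NetInfer's implementation: the pseudocount of 1 used to avoid log(0) is an assumption, and all function names are hypothetical.

```python
import math

def clr(values, pseudocount=1.0):
    """Centered log-ratio of one sample: log values minus their mean log.
    The pseudocount (an assumption here) avoids taking log(0)."""
    logs = [math.log(v + pseudocount) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def clr_by_sample(table):
    """table: feature rows of per-sample counts. CLR is applied within each sample (column)."""
    transformed_columns = [clr(column) for column in zip(*table)]
    return [list(row) for row in zip(*transformed_columns)]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (both must have non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def jaccard(x, y):
    """Jaccard similarity on presence/absence: shared samples / samples with either feature."""
    present_x = {i for i, v in enumerate(x) if v > 0}
    present_y = {i for i, v in enumerate(y) if v > 0}
    union = present_x | present_y
    return len(present_x & present_y) / len(union) if union else 0.0

counts = [[100, 150, 80], [50, 60, 40], [10, 5, 20]]  # rows: features, columns: samples
clr_rows = clr_by_sample(counts)
print(pearson(clr_rows[0], clr_rows[1]))
print(jaccard([1, 0, 2, 0], [3, 1, 0, 0]))
```

Note that CLR is computed within each sample across features, while the correlation is then computed between features across samples.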

Troubleshooting

If you see an error like:

LockException:
Error: Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files...

it usually means a previous Snakemake run was interrupted and left a lock behind. You can safely unlock the directory and continue by re-running your original command with the unlock argument:

netinfer <original_args> --snake_args="--unlock"

All flags

$ netinfer --help
usage: netinfer [-h] [--input INPUT] [--output OUTPUT] [--taxonomy TAXONOMY] 
                [--infer-taxonomy] [--metadata METADATA] [--methods METHODS] 
                [--config CONFIG] [--threads THREADS] [--no-visual] [--suffix SUFFIX]
                [--snake-args SNAKE_ARGS]

NetInfer: Microbiome Network Inference Pipeline

options:
  -h, --help            show this help message and exit
  --input INPUT         Input abundance table file (TSV/CSV/BIOM format). Overrides config file if specified.
  --output OUTPUT       Output directory for results. Overrides config file if specified.
  --taxonomy TAXONOMY   Taxonomy mapping file (optional)
  --infer-taxonomy      Infer taxonomy from feature IDs in the abundance table (looks for 'd__' or 'p__')
  --metadata METADATA   Sample metadata file (optional)
  --methods METHODS     Comma-separated list of methods to use (default: all). Available methods: flashweave, flashweaveHE, fastspar,
                        pearson, spearman, spieceasi, propr, jaccard
  --config CONFIG       Path to a base config YAML file. CLI arguments will override settings in this file. The final merged config will be
                        saved to the output directory.
  --threads THREADS     Number of threads to use (default: 1)
  --no-visual           Skip visualization generation
  --suffix SUFFIX       Suffix to append to output files (default: none)
  --snake-args, --snake_args SNAKE_ARGS
                        Additional Snakemake command-line arguments as a single string, e.g. 
                        --snake-args "--unlock --rerun-incomplete --dry-run"
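The --infer-taxonomy flag is described above as looking for 'd__' or 'p__' in feature IDs. A rough sketch of that idea is shown below. The exact behaviour of NetInfer may differ; the function name and the choice to return everything from the first marker onward are assumptions for illustration.

```python
import re

def infer_taxonomy_from_id(feature_id):
    """Return the part of the ID starting at the first 'd__' or 'p__' marker, or None."""
    match = re.search(r"d__|p__", feature_id)
    return feature_id[match.start():] if match else None

print(infer_taxonomy_from_id("ASV12|d__Bacteria;p__Firmicutes"))
print(infer_taxonomy_from_id("OTU1"))
```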

Continue from interrupted runs

As Snakemake supports resuming from interrupted runs, you can simply re-run the same command to continue from where it left off.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

NetInfer has not been published in a peer-reviewed journal yet. If you use NetInfer in your research, please cite:

@software{netinfer,
  author = {Heyu Lin},
  title = {NetInfer: A Snakemake Pipeline for Microbiome Network Inference},
  year = {2025},
  url = {https://github.com/SilentGene/NetInfer}
}

Future plans

  • Metadata support. If a metadata file is provided with sample groups specified, the pipeline will generate an additional boxplot for taxon abundance in each group.
  • Radar plot for abundance profiles.
  • PCA plot for samples according to taxon abundance.
  • Allow users to choose the trusted methods through a CLI argument (e.g. --trust flashweave,flashweaveHE,spieceasi [default]); any remaining methods given by --methods would then be treated as other methods. For now, this can be achieved by editing the config file.

Contact

Heyu Lin - heyu.lin🌀qut.edu.au

...🧙‍♂️🧬
