A Snakemake-based pipeline for inferring microbiome co-abundance networks, integrating multiple inference tools in one workflow.
- Multiple network inference methods:
  - Direct association methods
  - Correlation-based methods
- Interactive HTML dashboard
- Flexible input formats (TSV, CSV, BIOM)
- Comprehensive output with consensus scores
- Parallel execution support
- Conda environment management
The pipeline produces an interactive HTML dashboard as output.
An example is provided in the example directory, which shows the input file formats and the expected output.
# Clone the repository
git clone https://github.com/SilentGene/netinfer.git
cd netinfer
# Create and activate conda environment
conda env create -f environment.yaml # creates 'netinfer' environment and installs dependencies
conda activate netinfer
# Install FlashWeave by running the provided Julia script
julia workflow/scripts/install_flashweave.jl
- Prepare your data:
  - Required: abundance table abundance_table.tsv
  - (Optional) taxonomy table taxonomy.tsv for additional annotations
  - (Optional) sample metadata table metadata.tsv for FlashWeave and additional annotations
- Run the pipeline:
# Simple run with all methods enabled and using default settings
netinfer --input abundance_table.tsv --output results_dir --threads 6
# Include taxonomy to add more information to the output and produce an additional file with associations between different phyla
netinfer --input abundance_table.tsv --output results_dir --threads 6 --taxonomy taxonomy.tsv
# I don't have a separate taxonomy file, but let's try to infer from feature IDs
netinfer --input abundance_table.tsv --output results_dir --threads 6 --infer-taxonomy
# Only use my favorite methods, and use my own suffix for output files
netinfer --input abundance_table.tsv --output results_dir --threads 6 --methods flashweave,fastspar,spearman --suffix samples007
# Skip visualization
netinfer --input abundance_table.tsv --output results_dir --threads 6 --no-visual
# I'm an expert: specify every detail with my own config file
netinfer --input abundance_table.tsv --output results_dir --threads 6 --config my_config.yaml

The pipeline generates:
- Filtered and processed input data (results/preprocessed_data/)
- Individual network files for each method (results/subtool_outputs/)
- Network statistics with modularity detection (results/final_results/network_stats.txt)
- Merged network edges with consensus scores and community assignment (results/final_results/merged_edges.tsv)
- Merged network nodes with Zi-Pi analysis (results/final_results/merged_nodes.tsv)
- Interactive HTML dashboard (results/final_results/netinfer_dashboard.html)
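For readers unfamiliar with Zi-Pi analysis, the sketch below computes the two quantities as they are commonly defined: the within-module degree z-score (Zi) and the participation coefficient (Pi). It is not NetInfer's own code. The function name `zi_pi`, the toy network, the module labels, and the use of the population standard deviation are all assumptions made for illustration, and the columns actually written to merged_nodes.tsv may differ.

```python
from collections import defaultdict
from math import sqrt

def zi_pi(edges, module_of):
    """Return {node: (Zi, Pi)} for an undirected, unweighted network."""
    # links[node][module] = number of neighbours of `node` that belong to `module`
    links = defaultdict(lambda: defaultdict(int))
    for a, b in edges:
        links[a][module_of[b]] += 1
        links[b][module_of[a]] += 1

    # Within-module degree of every node, grouped by module
    within = defaultdict(dict)
    for node, module in module_of.items():
        within[module][node] = links[node][module]

    result = {}
    for module, degrees in within.items():
        values = list(degrees.values())
        mean = sum(values) / len(values)
        # Population standard deviation (an assumption of this sketch)
        sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        for node, k_within in degrees.items():
            zi = (k_within - mean) / sd if sd > 0 else 0.0
            k_total = sum(links[node].values())
            # Pi = 1 - sum over modules of (links into that module / total degree)^2
            pi = 1.0 - sum((k / k_total) ** 2 for k in links[node].values()) if k_total else 0.0
            result[node] = (zi, pi)
    return result

# Hypothetical toy network: two triangles joined by the edge C-D
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
module_of = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
scores = zi_pi(edges, module_of)
print(scores["C"])  # C links to both modules, so its Pi is above zero
```

Nodes with a high Zi are well connected inside their own module, while nodes with a high Pi spread their links across several modules.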
Regardless of the column names, the first column is always treated as feature IDs and the remaining columns as sample IDs.
- Format: TSV/CSV/BIOM
- Rows: Features (OTUs/ASVs)
- Columns: Samples
- Values: Raw counts or relative abundances (Raw counts are recommended)
Example:
Feature Sample1 Sample2 Sample3
OTU1 100 150 80
OTU2 50 60 40
...
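As a rough illustration of this layout, the sketch below reads a tab-separated abundance table with Python's standard csv module, treating the first column as feature IDs whatever its header says. The helper name `read_abundance` is hypothetical and this is not how NetInfer itself parses input; it only shows the expected shape of the data.

```python
import csv
import io

def read_abundance(handle, delimiter="\t"):
    """Parse a feature-by-sample table: first column = feature IDs, other columns = samples."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]  # the name of the first column is ignored
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature, values = row[0], row[1:]
        table[feature] = dict(zip(samples, map(float, values)))
    return samples, table

# The example table from above, written out with explicit tabs
example = (
    "Feature\tSample1\tSample2\tSample3\n"
    "OTU1\t100\t150\t80\n"
    "OTU2\t50\t60\t40\n"
)
samples, table = read_abundance(io.StringIO(example))
print(samples)
print(table["OTU2"]["Sample3"])
```

For a CSV file, the same helper can be called with `delimiter=","`.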
Regardless of the column names, the first column is always treated as feature IDs and the second column as taxonomy strings.
- Format: TSV/CSV
- Required columns:
- Feature ID (matching abundance table)
- Taxonomy string
Example:
Feature Taxonomy
OTU1 d__Bacteria;p__Firmicutes;c__Clostridia
OTU2 d__Bacteria;p__Bacteroidetes;c__Bacteroidia
...
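To make the taxonomy string format concrete, here is a minimal sketch that splits a string such as the ones above into rank prefixes and names, then checks whether two features belong to different phyla. The function names `parse_taxonomy` and `is_cross_phylum` are hypothetical, and NetInfer's own parsing may handle edge cases differently.

```python
def parse_taxonomy(taxonomy):
    """Split 'd__Bacteria;p__Firmicutes;...' into {'d': 'Bacteria', 'p': 'Firmicutes', ...}."""
    ranks = {}
    for field in taxonomy.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if sep and name:  # ignore empty or malformed fields such as 'p__'
            ranks[prefix] = name
    return ranks

def is_cross_phylum(tax_a, tax_b):
    """True when both features have a phylum assignment and the phyla differ."""
    phylum_a = parse_taxonomy(tax_a).get("p")
    phylum_b = parse_taxonomy(tax_b).get("p")
    return phylum_a is not None and phylum_b is not None and phylum_a != phylum_b

otu1 = "d__Bacteria;p__Firmicutes;c__Clostridia"
otu2 = "d__Bacteria;p__Bacteroidetes;c__Bacteroidia"
print(parse_taxonomy(otu1))
print(is_cross_phylum(otu1, otu2))
```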
- Format: TSV/CSV
- Rows: Samples (matching abundance table)
- Columns: Metadata variables
See netinfer/config/config.yaml for all available parameters, their default values, and their descriptions.
Default:
Trusted methods
An association must be detected by at least one of these trusted methods to be retained in the final aggregated table.
| Method | Default Parameters |
|---|---|
| FlashWeave (HE / normal) | P-value ≤ 0.001; Weight ≥ 0.4 |
| SPIEC-EASI | Weight ≥ 0.5; Method: MB (default) or GLasso |
Other methods:
| Method | Default Parameters |
|---|---|
| FastSpar | P-value ≤ 0.05; Iterations: 1000; Correlation ≥ 0.2; Absolute correlation ≥ 0.3 |
| propR | Correlation ≥ 0.5 |
| Spearman | FDR ≤ 0.05; Correlation ≥ 0.7 |
| Pearson | FDR ≤ 0.05; Correlation ≥ 0.7; CLR transformation implemented |
| Jaccard | Similarity ≥ 0.3 |
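The trusted-method rule can be pictured with the short sketch below: an edge is kept only if at least one trusted method reported it. The data structure, the function name `filter_edges`, and the simple support count stored for each edge are assumptions for illustration; the count is not necessarily how NetInfer computes its consensus score.

```python
# Method names as accepted by --methods; this default trusted set is an assumption
TRUSTED = {"flashweave", "flashweaveHE", "spieceasi"}

def filter_edges(edge_methods, trusted=TRUSTED):
    """Keep edges supported by at least one trusted method.

    edge_methods maps an edge (pair of feature IDs) to the set of methods
    that reported it after each method's own thresholds were applied.
    Returns {edge: number of supporting methods} for the retained edges.
    """
    kept = {}
    for edge, methods in edge_methods.items():
        if methods & trusted:  # non-empty intersection with the trusted set
            kept[edge] = len(methods)
    return kept

# Hypothetical per-edge support
edge_methods = {
    ("OTU1", "OTU2"): {"flashweave", "spearman", "pearson"},
    ("OTU1", "OTU3"): {"spearman", "pearson"},  # no trusted method: dropped
    ("OTU2", "OTU3"): {"spieceasi"},
}
kept = filter_edges(edge_methods)
print(kept)
```

In this toy input the OTU1/OTU3 edge is discarded even though two methods agree on it, because neither of them is in the trusted set.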
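In most cases, you do not need to apply a compositional transformation to your abundance data before running NetInfer.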
- FlashWeave, SPIEC-EASI, FastSpar, and propR are designed specifically for compositional data. They incorporate appropriate handling internally (e.g., CLR (Centered Log-Ratio) transformation or other compositional-data methods), so you generally do not need to apply an additional transformation yourself.
- Spearman correlation is rank-based and is relatively insensitive to compositionality.
- Jaccard is based on presence/absence and estimates co-occurrence probabilities without relying on abundance values, so no compositional transformation is required.
- Pearson correlation assumes absolute values and can produce spurious correlations on compositional data. Therefore, in the NetInfer pipeline, we apply an internal CLR transformation when computing Pearson correlations.
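For reference, the CLR transformation mentioned above takes the logarithm of each value in a sample and subtracts the mean of those logarithms. The sketch below shows this for a single sample; the function name `clr` and the pseudocount of 1 used to avoid log(0) are assumptions, and NetInfer's internal handling of zeros may differ.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts."""
    # The pseudocount avoids log(0); its value is an assumption of this sketch
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]

sample = [100, 50, 0, 25]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
print(round(sum(transformed), 10))  # CLR values of a sample sum to zero, up to rounding
```

Because each value is expressed relative to the geometric mean of its sample, the transformed values no longer depend on the sample's total count, which is what makes Pearson correlation more meaningful on them.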
If you see an error like:
LockException:
Error: Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files...
it usually means a previous Snakemake run was interrupted and left a lock behind. You can safely unlock and continue.
netinfer <original_args> --snake_args="--unlock"

$ netinfer --help
usage: netinfer [-h] [--input INPUT] [--output OUTPUT] [--taxonomy TAXONOMY]
[--infer-taxonomy] [--metadata METADATA] [--methods METHODS]
[--config CONFIG] [--threads THREADS] [--no-visual] [--suffix SUFFIX]
[--snake-args SNAKE_ARGS]
NetInfer: Microbiome Network Inference Pipeline
options:
-h, --help show this help message and exit
--input INPUT Input abundance table file (TSV/CSV/BIOM format). Overrides config file if specified.
--output OUTPUT Output directory for results. Overrides config file if specified.
--taxonomy TAXONOMY Taxonomy mapping file (optional)
--infer-taxonomy Infer taxonomy from feature IDs in the abundance table (looks for 'd__' or 'p__')
--metadata METADATA Sample metadata file (optional)
--methods METHODS Comma-separated list of methods to use (default: all). Available methods: flashweave, flashweaveHE, fastspar,
pearson, spearman, spieceasi, propr, jaccard
--config CONFIG Path to a base config YAML file. CLI arguments will override settings in this file. The final merged config will be
saved to the output directory.
--threads THREADS Number of threads to use (default: 1)
--no-visual Skip visualization generation
--suffix SUFFIX Suffix to append to output files (default: none)
--snake-args, --snake_args SNAKE_ARGS
Additional Snakemake command-line arguments as a single string, e.g.
--snake-args "--unlock --rerun-incomplete --dry-run"

As Snakemake supports resuming from interrupted runs, you can simply re-run the same command to continue from where it left off.
Contributions are welcome! Please feel free to submit a Pull Request.
NetInfer has not been published in a peer-reviewed journal yet. If you use NetInfer in your research, please cite:
@software{netinfer,
author = {Heyu Lin},
title = {NetInfer: A Snakemake Pipeline for Microbiome Network Inference},
year = {2025},
url = {https://github.com/SilentGene/NetInfer}
}

Planned features:
- Metadata support. If a metadata file with sample groups is provided, the pipeline will generate an additional boxplot of taxon abundance in each group.
- Radar plot for abundance profiles.
- PCA plot of samples based on taxon abundance.
- Allow users to specify trusted methods and other methods through CLI arguments (e.g. --trust flashweave,flashweaveHE,spieceasi [default]); methods specified by --methods will then be used as other methods. For now, this can be achieved by editing the config file.
Heyu Lin - heyu.lin🌀qut.edu.au
...🧙♂️🧬
