This repository was archived by the owner on Jan 13, 2026. It is now read-only.
3 changes: 3 additions & 0 deletions .gitignore
@@ -117,6 +117,9 @@ poetry.lock
**/.nextflow
**/*.log*

# Metaflow metadata
**/.metaflow

# notebooks
**/*.ipynb

94 changes: 83 additions & 11 deletions README.md
@@ -3,31 +3,103 @@

## Setup

### Environment Setup

Install the project and workflow environment:

```bash
# project environment with Metaflow
conda env create --name <name> --file envs/env.yaml
```

Update the path in `envs/env.yaml` to point to your cloned repository path:

```yaml
...
- pip:
# swap me for git URL or local path
- /path/to/cloned/repo
...
```

### Configuration

If running on Gemini, ensure that AlphaFold MSAs use your scratch space as a tmp directory (some MSA intermediate files are larger than `/tmp` on compute nodes).
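One way to do this, as a hypothetical sketch: export the standard `TMPDIR` variable before launching MSA jobs. This assumes the MSA step honors `TMPDIR`, which this repo does not explicitly confirm.

```shell
# Route temp files to scratch before submitting MSA jobs
# (assumption: the MSA step respects the standard TMPDIR variable).
export TMPDIR="/scratch/$USER/tmp"   # adjust to your scratch mount
mkdir -p "$TMPDIR"
```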

Configure Metaflow if needed:

```bash
# Optional: configure Metaflow datastore location
export METAFLOW_DATASTORE_SYSROOT_LOCAL=/path/to/datastore
```

## Workflows

This project uses **Metaflow** for workflow orchestration. The workflows have been migrated from the previous Nextflow implementation.

### Running Workflows

#### Command Line Usage

```bash
cd workflows
python linear_epitope_flow.py run --dset-name <dataset> --workflow-type <type>
```

Where `<dataset>` can be:
- `bp3c50id`
- `hv_class`
- `hv_seg`
- `iedb_bp3`
- `in_class`
- `in_seg`

And `<type>` can be:
- `clean`: Data cleaning and preparation
- `msa_focal`: MSA generation for focal proteins
- `msa_peptide`: MSA generation for peptides
- `inference_focal`: AlphaFold3 inference for focal proteins
- `inference_peptide`: AlphaFold3 inference for peptides
- `bepipred`: BepiPred scoring
- `extract_conf`: Confidence extraction
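For example, a single stage can be composed from one dataset and one type value from the lists above. This dry-run sketch just prints the resulting command rather than executing it:

```shell
# Build the documented invocation for one dataset/stage pair (dry run).
DSET=bp3c50id
TYPE=msa_focal
CMD="python linear_epitope_flow.py run --dset-name $DSET --workflow-type $TYPE"
echo "$CMD"
```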

#### SLURM Job Submission

Use the provided shell scripts to submit jobs to SLURM:

```bash
# Clean raw data
sbatch scripts_metaflow/bp3c50id/00_run_clean_raw_data.sh

# Run MSA for focal proteins
sbatch scripts_metaflow/bp3c50id/02_run_msa_focal_protein.sh

# Run inference
sbatch scripts_metaflow/bp3c50id/03_run_inference_focal_protein.sh

# Run BepiPred
sbatch scripts_metaflow/bp3c50id/04_run_bepipred.sh

# Extract confidence
sbatch scripts_metaflow/bp3c50id/06_extract_conf.sh
```

#### Pipeline Execution Order

For a complete analysis, run workflows in this order:

1. **Clean raw data** (`clean`)
2. **Generate MSA** (`msa_focal` or `msa_peptide`)
3. **Run inference** (`inference_focal` or `inference_peptide`)
4. **Calculate BepiPred scores** (`bepipred`)
5. **Extract confidence** (`extract_conf`)
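The five stages above can be sketched as a simple driver loop. This dry run only prints the per-stage commands in pipeline order; swapping `echo` for direct execution (or `sbatch` wrappers) is left to the user, since each stage's resource needs differ:

```shell
# Print the per-stage commands in pipeline order (dry run).
DSET=bp3c50id
for STAGE in clean msa_focal inference_focal bepipred extract_conf; do
  echo "python linear_epitope_flow.py run --dset-name $DSET --workflow-type $STAGE"
done
```

Note that in practice each stage must finish before the next starts (MSA outputs feed inference, and so on), so a real driver would submit these as dependent SLURM jobs rather than a plain loop.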

### Legacy Nextflow Files

The old Nextflow workflows have been replaced with Metaflow. For reference, the original files are still present in the `workflows/` directory but are no longer used:

- `*.nf` files - Original Nextflow workflow definitions
- `scripts/` - Original Nextflow execution scripts
- `nextflow.config` - Nextflow configuration

For current usage, use the Metaflow workflows described above.
8 changes: 8 additions & 0 deletions envs/DEPRECATED_nf-core.md
@@ -0,0 +1,8 @@
# DEPRECATED: nf-core environment

This file is deprecated and no longer needed.

The Nextflow/nf-core environment has been replaced with Metaflow.
All dependencies are now included in the main `env.yaml` file.

Use `env.yaml` instead of this file.
4 changes: 4 additions & 0 deletions envs/env.yaml
@@ -29,7 +29,11 @@ dependencies:
# viz
- py3Dmol

# workflow orchestration
- pip

- pip:
# swap me for git URL
- -e /tgen_labs/altin/alphafold3/runs/linear_peptide
- git+https://github.com/ljwoods2/mdaf3.git@main
- metaflow
File renamed without changes.
22 changes: 22 additions & 0 deletions scripts/DEPRECATED_NEXTFLOW.md
@@ -0,0 +1,22 @@
# DEPRECATED: Nextflow Execution Scripts

**⚠️ NOTICE: These Nextflow execution scripts have been replaced with Metaflow scripts.**

This directory contains the original shell scripts that launched Nextflow workflows. They are no longer actively used and have been replaced with Metaflow-based execution scripts.

## Migration Information

- **Old system**: Shell scripts in this directory that run `nextflow run`
- **New system**: Shell scripts in `scripts_metaflow/` that run `python linear_epitope_flow.py run`
- **Migration date**: July 2025

## For Current Usage

Please use the new Metaflow execution scripts instead:

```bash
# Use scripts in scripts_metaflow/ directory
sbatch scripts_metaflow/bp3c50id/00_run_clean_raw_data.sh
```

See the main README.md for complete usage instructions.
18 changes: 18 additions & 0 deletions scripts_metaflow/bp3c50id/00_run_clean_raw_data.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=clean_raw_data
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH -c 1
#SBATCH --time=1:00:00
#SBATCH --output=tmp/metaflow/bp3c50id/clean_raw_data.%j.log

# Create log directory
mkdir -p tmp/metaflow/bp3c50id

# Run Metaflow workflow for data cleaning
cd workflows
python linear_epitope_flow.py run \
--dset-name bp3c50id \
--workflow-type clean
18 changes: 18 additions & 0 deletions scripts_metaflow/bp3c50id/02_run_msa_focal_protein.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=msa_focal_protein
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH --time=5-00:00:00
#SBATCH -c 16
#SBATCH --output=tmp/metaflow/bp3c50id/focal_protein/msa.%j.log

# Create log directory
mkdir -p tmp/metaflow/bp3c50id/focal_protein

# Run Metaflow workflow for MSA focal protein
cd workflows
python linear_epitope_flow.py run \
--dset-name bp3c50id \
--workflow-type msa_focal
18 changes: 18 additions & 0 deletions scripts_metaflow/bp3c50id/03_run_inference_focal_protein.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=inference_focal_protein
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH --time=5-00:00:00
#SBATCH -c 16
#SBATCH --output=tmp/metaflow/bp3c50id/focal_protein/inference.%j.log

# Create log directory
mkdir -p tmp/metaflow/bp3c50id/focal_protein

# Run Metaflow workflow for inference focal protein
cd workflows
python linear_epitope_flow.py run \
--dset-name bp3c50id \
--workflow-type inference_focal
19 changes: 19 additions & 0 deletions scripts_metaflow/bp3c50id/04_run_bepipred.sh
@@ -0,0 +1,19 @@
#!/bin/bash
#SBATCH --job-name=bepipred
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH -c 8
#SBATCH --time=2-00:00:00
#SBATCH --gres=gpu:1
#SBATCH --output=tmp/metaflow/bp3c50id/focal_protein/bepipred.%j.log

# Create log directory
mkdir -p tmp/metaflow/bp3c50id/focal_protein

# Run Metaflow workflow for BepiPred
cd workflows
python linear_epitope_flow.py run \
--dset-name bp3c50id \
--workflow-type bepipred
18 changes: 18 additions & 0 deletions scripts_metaflow/bp3c50id/06_extract_conf.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=extract_conf
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH -c 1
#SBATCH --time=1:00:00
#SBATCH --output=tmp/metaflow/bp3c50id/focal_protein/extract_conf.%j.log

# Create log directory
mkdir -p tmp/metaflow/bp3c50id/focal_protein

# Run Metaflow workflow for confidence extraction
cd workflows
python linear_epitope_flow.py run \
--dset-name bp3c50id \
--workflow-type extract_conf
18 changes: 18 additions & 0 deletions scripts_metaflow/hv_class/00_run_clean_raw_data.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=clean_raw_data
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH -c 1
#SBATCH --time=1:00:00
#SBATCH --output=tmp/metaflow/hv_class/clean_raw_data.%j.log

# Create log directory
mkdir -p tmp/metaflow/hv_class

# Run Metaflow workflow for data cleaning
cd workflows
python linear_epitope_flow.py run \
--dset-name hv_class \
--workflow-type clean
18 changes: 18 additions & 0 deletions scripts_metaflow/hv_class/02_run_msa_focal_protein.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=msa_focal_protein
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH --time=5-00:00:00
#SBATCH -c 16
#SBATCH --output=tmp/metaflow/hv_class/focal_protein/msa.%j.log

# Create log directory
mkdir -p tmp/metaflow/hv_class/focal_protein

# Run Metaflow workflow for MSA focal protein
cd workflows
python linear_epitope_flow.py run \
--dset-name hv_class \
--workflow-type msa_focal
18 changes: 18 additions & 0 deletions scripts_metaflow/hv_class/03_run_inference_focal_protein.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=inference_focal_protein
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH --time=5-00:00:00
#SBATCH -c 16
#SBATCH --output=tmp/metaflow/hv_class/focal_protein/inference.%j.log

# Create log directory
mkdir -p tmp/metaflow/hv_class/focal_protein

# Run Metaflow workflow for inference focal protein
cd workflows
python linear_epitope_flow.py run \
--dset-name hv_class \
--workflow-type inference_focal
19 changes: 19 additions & 0 deletions scripts_metaflow/hv_class/04_run_bepipred.sh
@@ -0,0 +1,19 @@
#!/bin/bash
#SBATCH --job-name=bepipred
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH -c 8
#SBATCH --time=2-00:00:00
#SBATCH --gres=gpu:1
#SBATCH --output=tmp/metaflow/hv_class/focal_protein/bepipred.%j.log

# Create log directory
mkdir -p tmp/metaflow/hv_class/focal_protein

# Run Metaflow workflow for BepiPred
cd workflows
python linear_epitope_flow.py run \
--dset-name hv_class \
--workflow-type bepipred
18 changes: 18 additions & 0 deletions scripts_metaflow/hv_class/06_extract_conf.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=extract_conf
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH -c 1
#SBATCH --time=1:00:00
#SBATCH --output=tmp/metaflow/hv_class/focal_protein/extract_conf.%j.log

# Create log directory
mkdir -p tmp/metaflow/hv_class/focal_protein

# Run Metaflow workflow for confidence extraction
cd workflows
python linear_epitope_flow.py run \
--dset-name hv_class \
--workflow-type extract_conf
18 changes: 18 additions & 0 deletions scripts_metaflow/hv_seg/00_run_clean_raw_data.sh
@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=clean_raw_data
#SBATCH --mail-type=ALL
#SBATCH --mail-user=lwoods@tgen.org
#SBATCH --ntasks=1
#SBATCH --mem=64G
#SBATCH -c 1
#SBATCH --time=1:00:00
#SBATCH --output=tmp/metaflow/hv_seg/clean_raw_data.%j.log

# Create log directory
mkdir -p tmp/metaflow/hv_seg

# Run Metaflow workflow for data cleaning
cd workflows
python linear_epitope_flow.py run \
--dset-name hv_seg \
--workflow-type clean