Merged
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/bug_report.yml
@@ -2,7 +2,6 @@ name: Bug report
description: Report something that is broken or incorrect; the more information you include, the easier it will be to help.
labels: bug
body:

- type: textarea
id: description
attributes:
25 changes: 25 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,25 @@
name: CI on pull request

on: pull_request

jobs:
ci:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Python
# This is the version of the action for setting up Python, not the Python version.
uses: actions/setup-python@v5
with:
python-version: '3.14'
cache: 'pip'
- name: Install dependencies
run: |
python -m pip install pre-commit

- name: Install pre-commit hooks
run: pre-commit install

- name: Run pre-commit hooks for linting and other checks
run: pre-commit run --all-files
40 changes: 32 additions & 8 deletions .pre-commit-config.yaml
@@ -1,13 +1,37 @@
# This is the configuration for pre-commit, a local framework for managing pre-commit hooks
# Check out the docs at: https://pre-commit.com/

default_stages: [pre-commit]
repos:
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.1.0"
- repo: https://github.com/rbubley/mirrors-prettier
rev: "v3.8.1" # Use the sha / tag you want to point at
hooks:
- id: prettier
additional_dependencies:
- prettier@3.2.5

- repo: https://github.com/editorconfig-checker/editorconfig-checker.python
rev: "3.0.3"
- prettier@2.1.2
- "@prettier/plugin-xml@0.12.0"
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: check-case-conflict
- id: check-docstring-first
- id: check-executables-have-shebangs
- id: check-toml
- id: check-json
exclude: |
(?x)^(
assets/adaptivecard.json|
assets/slackreport.json
)$
- id: detect-private-key
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.4
hooks:
- id: editorconfig-checker
alias: ec
# Run the linter.
- id: ruff-check
args: [--fix]
# Run the formatter.
- id: ruff-format
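The `exclude` value on the `check-json` hook above uses a `(?x)` verbose-mode regular expression, which is why it can be split across lines: whitespace and newlines inside the pattern are ignored. A quick sketch of how that pattern behaves, checked with Python's `re` module (illustrative only, not part of the PR):

```python
import re

# The verbose (?x) exclude pattern from the check-json hook above.
# Indentation and line breaks inside the pattern are insignificant.
pattern = re.compile(
    r"""(?x)^(
      assets/adaptivecard.json|
      assets/slackreport.json
    )$"""
)

assert pattern.match("assets/adaptivecard.json")
assert pattern.match("assets/slackreport.json")
assert pattern.match("assets/other.json") is None
```

Note that the unescaped `.` matches any character, so a file like `assets/adaptivecardXjson` would also be excluded; pre-commit configs commonly accept this looseness since such filenames don't occur in practice.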
4 changes: 1 addition & 3 deletions .readthedocs.yaml
@@ -22,6 +22,4 @@ conda:

# Build documentation in the "docs/" directory with Sphinx
sphinx:
configuration: docs/conf.py


configuration: docs/conf.py
375 changes: 188 additions & 187 deletions CHANGELOG.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion LICENSE
@@ -671,4 +671,4 @@ into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.
<https://www.gnu.org/licenses/why-not-lgpl.html>.
38 changes: 24 additions & 14 deletions README.md
@@ -8,13 +8,15 @@

DRAM v2 (Distilled and Refined Annotation of Metabolism Version 2) is a tool for annotating metagenomic and genomic assembled data (e.g. scaffolds or contigs) or called genes (e.g. nucleotide or amino acid format). DRAM annotates MAGs using [KEGG](https://www.kegg.jp/) (if provided by the user), [UniRef90](https://www.uniprot.org/), [PFAM](https://pfam.xfam.org/), [dbCAN](http://bcb.unl.edu/dbCAN2/), [RefSeq viral](https://www.ncbi.nlm.nih.gov/genome/viruses/), [VOGDB](http://vogdb.org/) and the [MEROPS](https://www.ebi.ac.uk/merops/) peptidase database as well as custom user databases.

DRAM is run in four stages:
1) Gene Calling Prodigal - genes are called on user provided scaffolds or contigs
2) Gene Annotation - genes are annotated with a set of user defined databases
3) Distillation - annotations are curated into functional categories
4) Product Generation - interactive visualizations of DRAM output are generated
DRAM is run in four stages:

1. Gene Calling with Prodigal - genes are called on user-provided scaffolds or contigs
2. Gene Annotation - genes are annotated with a set of user-defined databases
3. Distillation - annotations are curated into functional categories
4. Product Generation - interactive visualizations of DRAM output are generated

For more detail on DRAM and how DRAM v2 works please see our DRAM products:

- [DRAM version 1 publication](https://academic.oup.com/nar/article/48/16/8883/5884738)
- [DRAM in KBase publication](https://pubmed.ncbi.nlm.nih.gov/36857575/)
- [DRAM webinar](https://www.youtube.com/watch?v=-Ky2fz2vw2s)
@@ -24,44 +26,51 @@ For more detail on DRAM and how DRAM v2 works, please see our DRAM products:
- [Docs](https://dramit.readthedocs.io/en/latest)
- [Installation Guide](https://dramit.readthedocs.io/en/latest/installation.html)
- [Usage Examples](https://dramit.readthedocs.io/en/latest/usage.html)
- [Parameter API]([#command-line-options](https://dramit.readthedocs.io/en/latest/params_doc.html))
- [Rules API]([#nextflow-tips-and-tricks](https://dramit.readthedocs.io/en/latest/rules_parser.html))
- [Parameter API](https://dramit.readthedocs.io/en/latest/params_doc.html)
- [Rules API](https://dramit.readthedocs.io/en/latest/rules_parser.html)

## Example Usage

The DRAM apps Call, Annotate, and Distill can all be run at once, or each app can be run individually. Here are some common usage examples:

1) **Rename fasta headers based on input sample file names:**
1. **Rename fasta headers based on input sample file names:**

```bash
nextflow run WrightonLabCSU/DRAM --rename --input_fasta <path/to/fasta/directory/>
```

2) **Call genes using input fastas (use --rename to rename FASTA headers):**
2. **Call genes using input fastas (use --rename to rename FASTA headers):**

```bash
nextflow run WrightonLabCSU/DRAM --call --rename --input_fasta <path/to/fasta/directory/>
```

3) **Annotate called genes using input called genes and the KOFAM database:**
3. **Annotate called genes using input called genes and the KOFAM database:**

```bash
nextflow run WrightonLabCSU/DRAM --annotate --input_genes <path/to/called/genes/directory> --use_kofam
```

4) **Annotate called genes using input fasta files and the KOFAM database:**
4. **Annotate called genes using input fasta files and the KOFAM database:**

```bash
nextflow run WrightonLabCSU/DRAM --annotate --input_fasta <path/to/called/genes/directory> --use_kofam
```

5) **Merge various existing annotations files together (Must be generated using DRAM):**
5. **Merge various existing annotation files together (must be generated using DRAM):**

```bash
nextflow run WrightonLabCSU/DRAM --merge_annotations <path/to/directory/with/multiple/annotation/TSV/files>
```

6) **Distill using input annotations:**
6. **Distill using input annotations:**

```bash
nextflow run WrightonLabCSU/DRAM --distill_<topic|ecosystem|custom> --annotations <path/to/annotations.tsv>
```

7) **Complete workflow example:**
7. **Complete workflow example:**

```bash
nextflow run -bg WrightonLabCSU/DRAM \
--input_fasta [DIRECTORY of fasta files] \
@@ -98,6 +107,7 @@ params {
```

You can also use a custom config file:

```bash
nextflow run DRAM -c /path/to/custom_config.config
```
2 changes: 1 addition & 1 deletion assets/amg_database.20220928.tsv
@@ -277,4 +277,4 @@ K01599 EC:4.1.1.37 PF01208 Uroporphyrinogen decarboxylase (URO-D) Roux et al.
PF13385 Concanavalin A-like lectin; extracellular arabinase Emerson et al. 2018 FALSE
K01779 EC:5.1.1.13 racD; aspartate Racmase Trubl et al. 2018 FALSE
PF01786 Plastoquinol terminal oxidase (PTOX) Sullivan et al. 2010; Ignacio-Espinoza et al. 2012; Roux et al. 2016 FALSE
PF01077; PF03460 rdsrA; reverse-acting dissimilatory sulfite reductase (alpha subunit) Anantharaman et al. 2014 FALSE
PF01077; PF03460 rdsrA; reverse-acting dissimilatory sulfite reductase (alpha subunit) Anantharaman et al. 2014 FALSE
64 changes: 39 additions & 25 deletions assets/internal/generate_sql_database.py
@@ -2,15 +2,17 @@
import sqlite3
import argparse


def insert_data(conn, table_name, data):
placeholders = ', '.join(['?'] * len(data[0]))
placeholders = ", ".join(["?"] * len(data[0]))
query = f"INSERT OR REPLACE INTO {table_name} VALUES ({placeholders})"
conn.executemany(query, data)
conn.commit()


def process_dbcan(db_dir):
description_file = os.path.join(db_dir, 'dbcan.fam-activities.tsv')
ec_file = os.path.join(db_dir, 'dbcan.fam.subfam.ec.tsv')
description_file = os.path.join(db_dir, "dbcan.fam-activities.tsv")
ec_file = os.path.join(db_dir, "dbcan.fam.subfam.ec.tsv")

descriptions = {}
ecs = {}
@@ -19,68 +21,80 @@ def process_dbcan(db_dir):
# Process descriptions
with open(description_file) as f:
for line in f:
if line.startswith('#') or not line.strip():
if line.startswith("#") or not line.strip():
continue
parts = line.strip().split('\t')
parts = line.strip().split("\t")
if len(parts) >= 2:
descriptions[parts[0]] = ' '.join(parts[1:])
descriptions[parts[0]] = " ".join(parts[1:])
elif len(parts) == 1:
descriptions[parts[0]] = "No description available"
else:
skipped_lines.append(f"Skipped line in description file: {line.strip()} (expected at least 2 columns, found {len(parts)})")
skipped_lines.append(
f"Skipped line in description file: {line.strip()} (expected at least 2 columns, found {len(parts)})"
)

# Process EC numbers
with open(ec_file) as f:
for line in f:
parts = line.strip().split('\t')
parts = line.strip().split("\t")
if len(parts) > 2:
ecs[parts[0]] = ecs.get(parts[0], set())
ecs[parts[0]].add(parts[2])

data = []
for entry in descriptions:
ec = ','.join(ecs.get(entry, []))
ec = ",".join(ecs.get(entry, []))
data.append((entry, descriptions[entry], ec))

return data, skipped_lines


def main():
parser = argparse.ArgumentParser(description="Generate descriptions database for DRAM.")
parser.add_argument('--db_dir', required=True, help="Directory containing the database subdirectories.")
parser.add_argument('--output_db', required=True, help="Path to the output SQLite database.")
parser.add_argument('--log', required=True, help="Path to the log file.")
parser = argparse.ArgumentParser(
description="Generate descriptions database for DRAM."
)
parser.add_argument(
"--db_dir",
required=True,
help="Directory containing the database subdirectories.",
)
parser.add_argument(
"--output_db", required=True, help="Path to the output SQLite database."
)
parser.add_argument("--log", required=True, help="Path to the log file.")

args = parser.parse_args()

log_entries = []
db_dir = args.db_dir
output_db = args.output_db

conn = sqlite3.connect(output_db)
log_entries.append(f"Opened database {output_db}")

dbcan_dir = os.path.join(db_dir, 'dbcan')
dbcan_dir = os.path.join(db_dir, "dbcan")
if os.path.exists(dbcan_dir):
conn.execute("""
CREATE TABLE IF NOT EXISTS dbcan_description (
id VARCHAR(30) NOT NULL,
description VARCHAR(1000),
ec VARCHAR(1000),
id VARCHAR(30) NOT NULL,
description VARCHAR(1000),
ec VARCHAR(1000),
PRIMARY KEY (id)
);
""")
log_entries.append("Processing dbcan_description from " + dbcan_dir)
data, skipped_lines = process_dbcan(dbcan_dir)
insert_data(conn, 'dbcan_description', data)
insert_data(conn, "dbcan_description", data)
log_entries.append(f"Inserted {len(data)} records into dbcan_description")
log_entries.extend(skipped_lines)
with open(args.log, 'w') as log_file:

with open(args.log, "w") as log_file:
for entry in log_entries:
log_file.write(entry + '\n')
log_file.write(entry + "\n")

conn.close()
log_entries.append("Closed database connection")


if __name__ == "__main__":
main()
main()
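The `insert_data` helper in the script above relies on SQLite's `INSERT OR REPLACE` to upsert rows keyed on the table's primary key. A minimal sketch of that behavior against an in-memory database (the `demo_description` table and its rows are hypothetical, not from the DRAM databases):

```python
import sqlite3

def insert_data(conn, table_name, data):
    # Same upsert pattern as the script above: one "?" placeholder per
    # column, INSERT OR REPLACE so re-runs update rather than duplicate.
    placeholders = ", ".join(["?"] * len(data[0]))
    query = f"INSERT OR REPLACE INTO {table_name} VALUES ({placeholders})"
    conn.executemany(query, data)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_description (id TEXT PRIMARY KEY, description TEXT, ec TEXT)"
)
insert_data(conn, "demo_description", [("GH5", "glycoside hydrolase", "3.2.1.4")])
# Re-inserting the same primary key replaces the row instead of duplicating it.
insert_data(conn, "demo_description", [("GH5", "updated description", "3.2.1.4")])
rows = conn.execute("SELECT * FROM demo_description").fetchall()
conn.close()
```

Because the script uses `CREATE TABLE IF NOT EXISTS` together with this upsert, regenerating the descriptions database is idempotent: rerunning it refreshes existing entries in place.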