Merged
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/bug_report.yml
@@ -2,7 +2,6 @@ name: Bug report
description: Report something that is broken or incorrect; the more information you include, the easier it will be to help.
labels: bug
body:

- type: textarea
id: description
attributes:
25 changes: 25 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,25 @@
name: CI on pull request

on: pull_request

jobs:
ci:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Python
# This is the version of the action for setting up Python, not the Python version.
uses: actions/setup-python@v5
with:
python-version: '3.14'
cache: 'pip'
- name: Install dependencies
run: |
python -m pip install pre-commit

- name: Install pre-commit hooks
run: pre-commit install

- name: Run pre-commit hooks for linting and other checks
run: pre-commit run --all-files
40 changes: 32 additions & 8 deletions .pre-commit-config.yaml
@@ -1,13 +1,37 @@
# This is the configuration for pre-commit, a local framework for managing pre-commit hooks
# Check out the docs at: https://pre-commit.com/

default_stages: [pre-commit]
repos:
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.1.0"
- repo: https://github.com/rbubley/mirrors-prettier
rev: "v3.8.1" # Use the sha / tag you want to point at
hooks:
- id: prettier
additional_dependencies:
- prettier@3.2.5

- repo: https://github.com/editorconfig-checker/editorconfig-checker.python
rev: "3.0.3"
- prettier@2.1.2
- "@prettier/plugin-xml@0.12.0"
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: check-case-conflict
- id: check-docstring-first
- id: check-executables-have-shebangs
- id: check-toml
- id: check-json
exclude: |
(?x)^(
assets/adaptivecard.json|
assets/slackreport.json
)$
- id: detect-private-key
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.4
hooks:
- id: editorconfig-checker
alias: ec
# Run the linter.
- id: ruff-check
args: [--fix]
# Run the formatter.
- id: ruff-format
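The `exclude` value on the `check-json` hook above uses a `(?x)` verbose-mode regular expression, which is why it can be split across lines: whitespace and newlines inside the pattern are ignored. A quick sketch of how that pattern behaves, checked with Python's `re` module (illustrative only, not part of the PR):

```python
import re

# The verbose (?x) exclude pattern from the check-json hook above.
# Indentation and line breaks inside the pattern are insignificant.
pattern = re.compile(
    r"""(?x)^(
      assets/adaptivecard.json|
      assets/slackreport.json
    )$"""
)

assert pattern.match("assets/adaptivecard.json")
assert pattern.match("assets/slackreport.json")
assert pattern.match("assets/other.json") is None
```

Note that the unescaped `.` matches any character, so a file like `assets/adaptivecardXjson` would also be excluded; pre-commit configs commonly accept this looseness since such filenames don't occur in practice.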
4 changes: 1 addition & 3 deletions .readthedocs.yaml
@@ -22,6 +22,4 @@ conda:

# Build documentation in the "docs/" directory with Sphinx
sphinx:
configuration: docs/conf.py


configuration: docs/conf.py
375 changes: 188 additions & 187 deletions CHANGELOG.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion LICENSE
@@ -671,4 +671,4 @@ into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.
<https://www.gnu.org/licenses/why-not-lgpl.html>.
38 changes: 24 additions & 14 deletions README.md
@@ -8,13 +8,15 @@

DRAM v2 (Distilled and Refined Annotation of Metabolism Version 2) is a tool for annotating metagenomic and genomic assembled data (e.g. scaffolds or contigs) or called genes (e.g. nucleotide or amino acid format). DRAM annotates MAGs using [KEGG](https://www.kegg.jp/) (if provided by the user), [UniRef90](https://www.uniprot.org/), [PFAM](https://pfam.xfam.org/), [dbCAN](http://bcb.unl.edu/dbCAN2/), [RefSeq viral](https://www.ncbi.nlm.nih.gov/genome/viruses/), [VOGDB](http://vogdb.org/) and the [MEROPS](https://www.ebi.ac.uk/merops/) peptidase database as well as custom user databases.

DRAM is run in four stages:
1) Gene Calling Prodigal - genes are called on user provided scaffolds or contigs
2) Gene Annotation - genes are annotated with a set of user defined databases
3) Distillation - annotations are curated into functional categories
4) Product Generation - interactive visualizations of DRAM output are generated
DRAM is run in four stages:

1. Gene Calling with Prodigal - genes are called on user-provided scaffolds or contigs
2. Gene Annotation - genes are annotated with a set of user-defined databases
3. Distillation - annotations are curated into functional categories
4. Product Generation - interactive visualizations of DRAM output are generated

For more detail on DRAM and how DRAM v2 works please see our DRAM products:

- [DRAM version 1 publication](https://academic.oup.com/nar/article/48/16/8883/5884738)
- [DRAM in KBase publication](https://pubmed.ncbi.nlm.nih.gov/36857575/)
- [DRAM webinar](https://www.youtube.com/watch?v=-Ky2fz2vw2s)
@@ -24,44 +26,51 @@ For more detail on DRAM and how DRAM v2 works, please see our DRAM products:
- [Docs](https://dramit.readthedocs.io/en/latest)
- [Installation Guide](https://dramit.readthedocs.io/en/latest/installation.html)
- [Usage Examples](https://dramit.readthedocs.io/en/latest/usage.html)
- [Parameter API]([#command-line-options](https://dramit.readthedocs.io/en/latest/params_doc.html))
- [Rules API]([#nextflow-tips-and-tricks](https://dramit.readthedocs.io/en/latest/rules_parser.html))
- [Parameter API](https://dramit.readthedocs.io/en/latest/params_doc.html)
- [Rules API](https://dramit.readthedocs.io/en/latest/rules_parser.html)

## Example Usage

The DRAM apps Call, Annotate, and Distill can all be run at once, or each app can be run individually. Here are some common usage examples:

1) **Rename fasta headers based on input sample file names:**
1. **Rename fasta headers based on input sample file names:**

```bash
nextflow run WrightonLabCSU/DRAM --rename --input_fasta <path/to/fasta/directory/>
```

2) **Call genes using input fastas (use --rename to rename FASTA headers):**
2. **Call genes using input fastas (use --rename to rename FASTA headers):**

```bash
nextflow run WrightonLabCSU/DRAM --call --rename --input_fasta <path/to/fasta/directory/>
```

3) **Annotate called genes using input called genes and the KOFAM database:**
3. **Annotate called genes using input called genes and the KOFAM database:**

```bash
nextflow run WrightonLabCSU/DRAM --annotate --input_genes <path/to/called/genes/directory> --use_kofam
```

4) **Annotate called genes using input fasta files and the KOFAM database:**
4. **Annotate called genes using input fasta files and the KOFAM database:**

```bash
nextflow run WrightonLabCSU/DRAM --annotate --input_fasta <path/to/called/genes/directory> --use_kofam
```

5) **Merge various existing annotations files together (Must be generated using DRAM):**
5. **Merge various existing annotation files together (must be generated using DRAM):**

```bash
nextflow run WrightonLabCSU/DRAM --merge_annotations <path/to/directory/with/multiple/annotation/TSV/files>
```

6) **Distill using input annotations:**
6. **Distill using input annotations:**

```bash
nextflow run WrightonLabCSU/DRAM --distill_<topic|ecosystem|custom> --annotations <path/to/annotations.tsv>
```

7) **Complete workflow example:**
7. **Complete workflow example:**

```bash
nextflow run -bg WrightonLabCSU/DRAM \
--input_fasta [DIRECTORY of fasta files] \
@@ -98,6 +107,7 @@ params {
```

You can also use a custom config file:

```bash
nextflow run DRAM -c /path/to/custom_config.config
```
2 changes: 1 addition & 1 deletion assets/amg_database.20220928.tsv
@@ -277,4 +277,4 @@ K01599 EC:4.1.1.37 PF01208 Uroporphyrinogen decarboxylase (URO-D) Roux et al.
PF13385 Concanavalin A-like lectin; extracellular arabinase Emerson et al. 2018 FALSE
K01779 EC:5.1.1.13 racD; aspartate Racmase Trubl et al. 2018 FALSE
PF01786 Plastoquinol terminal oxidase (PTOX) Sullivan et al. 2010; Ignacio-Espinoza et al. 2012; Roux et al. 2016 FALSE
PF01077; PF03460 rdsrA; reverse-acting dissimilatory sulfite reductase (alpha subunit) Anantharaman et al. 2014 FALSE
PF01077; PF03460 rdsrA; reverse-acting dissimilatory sulfite reductase (alpha subunit) Anantharaman et al. 2014 FALSE
64 changes: 39 additions & 25 deletions assets/internal/generate_sql_database.py
@@ -2,15 +2,17 @@
import sqlite3
import argparse


def insert_data(conn, table_name, data):
placeholders = ', '.join(['?'] * len(data[0]))
placeholders = ", ".join(["?"] * len(data[0]))
query = f"INSERT OR REPLACE INTO {table_name} VALUES ({placeholders})"
conn.executemany(query, data)
conn.commit()


def process_dbcan(db_dir):
description_file = os.path.join(db_dir, 'dbcan.fam-activities.tsv')
ec_file = os.path.join(db_dir, 'dbcan.fam.subfam.ec.tsv')
description_file = os.path.join(db_dir, "dbcan.fam-activities.tsv")
ec_file = os.path.join(db_dir, "dbcan.fam.subfam.ec.tsv")

descriptions = {}
ecs = {}
@@ -19,68 +21,80 @@ def process_dbcan(db_dir):
# Process descriptions
with open(description_file) as f:
for line in f:
if line.startswith('#') or not line.strip():
if line.startswith("#") or not line.strip():
continue
parts = line.strip().split('\t')
parts = line.strip().split("\t")
if len(parts) >= 2:
descriptions[parts[0]] = ' '.join(parts[1:])
descriptions[parts[0]] = " ".join(parts[1:])
elif len(parts) == 1:
descriptions[parts[0]] = "No description available"
else:
skipped_lines.append(f"Skipped line in description file: {line.strip()} (expected at least 2 columns, found {len(parts)})")
skipped_lines.append(
f"Skipped line in description file: {line.strip()} (expected at least 2 columns, found {len(parts)})"
)

# Process EC numbers
with open(ec_file) as f:
for line in f:
parts = line.strip().split('\t')
parts = line.strip().split("\t")
if len(parts) > 2:
ecs[parts[0]] = ecs.get(parts[0], set())
ecs[parts[0]].add(parts[2])

data = []
for entry in descriptions:
ec = ','.join(ecs.get(entry, []))
ec = ",".join(ecs.get(entry, []))
data.append((entry, descriptions[entry], ec))

return data, skipped_lines


def main():
parser = argparse.ArgumentParser(description="Generate descriptions database for DRAM.")
parser.add_argument('--db_dir', required=True, help="Directory containing the database subdirectories.")
parser.add_argument('--output_db', required=True, help="Path to the output SQLite database.")
parser.add_argument('--log', required=True, help="Path to the log file.")
parser = argparse.ArgumentParser(
description="Generate descriptions database for DRAM."
)
parser.add_argument(
"--db_dir",
required=True,
help="Directory containing the database subdirectories.",
)
parser.add_argument(
"--output_db", required=True, help="Path to the output SQLite database."
)
parser.add_argument("--log", required=True, help="Path to the log file.")

args = parser.parse_args()

log_entries = []
db_dir = args.db_dir
output_db = args.output_db

conn = sqlite3.connect(output_db)
log_entries.append(f"Opened database {output_db}")

dbcan_dir = os.path.join(db_dir, 'dbcan')
dbcan_dir = os.path.join(db_dir, "dbcan")
if os.path.exists(dbcan_dir):
conn.execute("""
CREATE TABLE IF NOT EXISTS dbcan_description (
id VARCHAR(30) NOT NULL,
description VARCHAR(1000),
ec VARCHAR(1000),
id VARCHAR(30) NOT NULL,
description VARCHAR(1000),
ec VARCHAR(1000),
PRIMARY KEY (id)
);
""")
log_entries.append("Processing dbcan_description from " + dbcan_dir)
data, skipped_lines = process_dbcan(dbcan_dir)
insert_data(conn, 'dbcan_description', data)
insert_data(conn, "dbcan_description", data)
log_entries.append(f"Inserted {len(data)} records into dbcan_description")
log_entries.extend(skipped_lines)
with open(args.log, 'w') as log_file:

with open(args.log, "w") as log_file:
for entry in log_entries:
log_file.write(entry + '\n')
log_file.write(entry + "\n")

conn.close()
log_entries.append("Closed database connection")


if __name__ == "__main__":
main()
main()
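The `insert_data` helper in the script above relies on SQLite's `INSERT OR REPLACE` to upsert rows keyed on the table's primary key. A minimal sketch of that behavior against an in-memory database (the `demo_description` table and its rows are hypothetical, not from the DRAM databases):

```python
import sqlite3

def insert_data(conn, table_name, data):
    # Same upsert pattern as the script above: one "?" placeholder per
    # column, INSERT OR REPLACE so re-runs update rather than duplicate.
    placeholders = ", ".join(["?"] * len(data[0]))
    query = f"INSERT OR REPLACE INTO {table_name} VALUES ({placeholders})"
    conn.executemany(query, data)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_description (id TEXT PRIMARY KEY, description TEXT, ec TEXT)"
)
insert_data(conn, "demo_description", [("GH5", "glycoside hydrolase", "3.2.1.4")])
# Re-inserting the same primary key replaces the row instead of duplicating it.
insert_data(conn, "demo_description", [("GH5", "updated description", "3.2.1.4")])
rows = conn.execute("SELECT * FROM demo_description").fetchall()
conn.close()
```

Because the script uses `CREATE TABLE IF NOT EXISTS` together with this upsert, regenerating the descriptions database is idempotent: rerunning it refreshes existing entries in place.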