Update to a single Snakefile, add single cell example (ALS/FTD model) by imallona · Pull Request #4 · imallona/repeats

imallona · 2026-04-09T06:03:49Z

Remove redundant Snakefiles and switch to a single one.
Add Polymenidou's cluster 12 vs others comparison.

Copilot

Pull request overview

Adds a dataset-specific single-cell Snakemake workflow and report for reanalyzing GSE230647 (TDP-43-HA overexpression) with a cluster-12 vs others comparison, aligning with the repo’s existing modular bulk workflows and shared reference indices.

Changes:

Introduces a dedicated scRNA-seq Snakefile for downloading SRAs, running STARsolo and kallisto|bustools, and rendering an HTML report.
Adds an Rmarkdown report that aggregates STARsolo raw matrices into pseudo-bulk (cluster12 vs other) and runs edgeR + visualizations.
Adds dataset config + sample metadata table; updates README to document the new pipeline.
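The pseudo-bulk step described above can be illustrated with a minimal Python sketch. The actual report is Rmarkdown feeding edgeR. This sketch makes several assumptions: per-cell counts arrive as a mapping from barcode to gene counts, each barcode carries a sample ID and a cluster label, and counts are summed per (sample, group) so that edgeR sees one library per sample and group. The function name `pseudobulk`, the input layout, and the toy counts are hypothetical and are not data from GSE230647.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_sample, cell_cluster, target_cluster="12"):
    """Sum per-cell counts into one pseudo-bulk library per (sample, group).

    cell_counts:  {barcode: {gene: count}}, e.g. parsed from a STARsolo raw matrix
    cell_sample:  {barcode: sample_id}
    cell_cluster: {barcode: cluster_label}
    Cells in target_cluster go to "cluster<target>"; other labelled cells go
    to "other". Barcodes without a cluster label are skipped.
    """
    target_group = f"cluster{target_cluster}"
    libraries = defaultdict(lambda: defaultdict(int))
    for barcode, genes in cell_counts.items():
        cluster = cell_cluster.get(barcode)
        if cluster is None:
            continue  # not part of the clustering, e.g. an empty droplet
        group = target_group if cluster == target_cluster else "other"
        key = (cell_sample[barcode], group)
        for gene, count in genes.items():
            libraries[key][gene] += count
    # Return plain dicts so callers do not get silently-growing defaultdicts.
    return {key: dict(genes) for key, genes in libraries.items()}


# Toy input: two cluster-12 cells and one other cell in s1, one unclustered cell in s2.
counts = {
    "AAAC": {"Tardbp": 3, "Gapdh": 10},
    "AAAG": {"Tardbp": 1},
    "TTTC": {"Gapdh": 4},
    "GGGA": {"Gapdh": 9},
}
samples = {"AAAC": "s1", "AAAG": "s1", "TTTC": "s1", "GGGA": "s2"}
clusters = {"AAAC": "12", "AAAG": "12", "TTTC": "3"}
libs = pseudobulk(counts, samples, clusters)
```

Summing per sample rather than pooling all cells of a group keeps biological replicates separate. That separation is what lets edgeR estimate dispersion between samples instead of treating cells as replicates.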

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.


File	Description
workflow/Snakefile_gse230647_sc	New Snakemake workflow for GSE230647 scRNA-seq (download, align/quantify, metadata, report).
workflow/scripts/gse230647_sc_report.Rmd	New Rmarkdown analysis/report for cluster12 vs other pseudo-bulk edgeR and plots.
workflow/configs/gse230647_sc.yaml	New config defining samples, references, aligners, and resources for the sc pipeline.
workflow/configs/gse230647_sc_sample_metadata.tsv	New TSV describing the included GSM/SRRs and conditions.
README.md	Documents how to run the new single-cell pipeline and what it produces.
TODO.md	Removed legacy TODO notes.
.coveragerc	Minor comment wording update.


workflow/Snakefile_gse230647_sc

workflow/scripts/gse230647_sc_report.Rmd

Copilot

Pull request overview

Copilot reviewed 38 out of 42 changed files in this pull request and generated 6 comments.


workflow/modules/render_report.snmk

workflow/Snakefile

workflow/modules/download_sra.snmk

workflow/scripts/gse126543_bulk_report.Rmd

workflow/scripts/gse230647_sc_report.Rmd


Copilot

Pull request overview

Copilot reviewed 50 out of 55 changed files in this pull request and generated 6 comments.


Copilot · 2026-04-10T13:06:34Z

.github/workflows/tests.yml

  negative-control-run:
    name: Negative control full run
    runs-on: ubuntu-latest
-    if: |
-      github.event_name == 'workflow_dispatch'
+    if: inputs.run_negative_control == true
    steps:


The negative-control-run job uses if: inputs.run_negative_control == true, but the inputs context is only available for workflow_dispatch runs and reusable workflows. On pull_request runs, this can fail workflow validation (unrecognized context) or evaluate in unexpected ways. Prefer gating with github.event_name == 'workflow_dispatch' && github.event.inputs.run_negative_control == 'true' (or similar) so that PR runs don't break CI.
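The suggested condition has two parts. First, it checks the event name, so pull_request runs short-circuit to false. Second, it compares against the string 'true', because values under github.event.inputs are strings. The following is a small Python model of that logic, not GitHub's expression evaluator. The function name and the dict standing in for github.event.inputs are illustrative assumptions.

```python
def run_negative_control(event_name, event_inputs):
    """Python model of the suggested `if:` expression:

        github.event_name == 'workflow_dispatch'
          && github.event.inputs.run_negative_control == 'true'

    `event_inputs` stands in for github.event.inputs, whose values are strings,
    so the check compares against "true" rather than the boolean True.
    """
    return (
        event_name == "workflow_dispatch"
        and event_inputs.get("run_negative_control") == "true"
    )
```

On a pull_request event, the first clause is false and the inputs are never consulted. For this reason, the expression stays safe even when no dispatch inputs exist.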

Copilot · 2026-04-10T13:06:35Z

workflow/schemas/config.schema.yaml

      technology:
        type: string
        enum: [smartseq2, chromium]
+      library_layout:
+        type: string
+        enum: [single, paired]
      chemistry:
        type: string
-        enum: [10xv2, 10xv3, smartseq2]
+        enum: [10xv2, 10xv3]


The config schema section updated here (real_data.library_layout/chemistry) is now inconsistent with the unified pipeline_type Snakefile and the new bulk/sc configs. The schema still requires mode, restricts aligners to [starsolo, kallisto, alevin, bowtie2], and uses a starsolo multimapper enum that does not match the value used in the configs (multi). Please update the schema to validate the new pipeline_type-based configs, including bulk aligners such as star/salmon and the report/dataset sections.
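To make the requested constraints concrete, here is a stdlib-only Python sketch of what the updated schema would need to enforce. The real_data enums come from the diff above. The pipeline_type values ("bulk", "scrnaseq"), the top-level "aligners" list, and the per-type aligner sets are assumptions based on the review and commit messages, not the repo's actual schema. In the YAML schema itself, the per-type rule would presumably be written with JSON Schema if/then conditionals.

```python
# Enums copied from the schema diff above.
REAL_DATA_ENUMS = {
    "technology": {"smartseq2", "chromium"},
    "library_layout": {"single", "paired"},
    "chemistry": {"10xv2", "10xv3"},
}

# Hypothetical per-pipeline_type aligner sets (assumed key names and values).
ALIGNERS = {
    "bulk": {"star", "kallisto", "salmon"},
    "scrnaseq": {"starsolo", "kallisto", "alevin", "bowtie2"},
}


def schema_errors(config):
    """Return a list of human-readable violations (empty list means valid)."""
    errors = []
    real_data = config.get("real_data") or {}
    for key, allowed in REAL_DATA_ENUMS.items():
        value = real_data.get(key)
        if value is not None and value not in allowed:
            errors.append(f"real_data.{key}={value!r} not in {sorted(allowed)}")
    pipeline_type = config.get("pipeline_type")
    if pipeline_type not in ALIGNERS:
        errors.append(f"pipeline_type={pipeline_type!r} not in {sorted(ALIGNERS)}")
        return errors  # aligner checks depend on a known pipeline_type
    for aligner in config.get("aligners", []):
        if aligner not in ALIGNERS[pipeline_type]:
            errors.append(f"aligner {aligner!r} not allowed for pipeline_type={pipeline_type!r}")
    return errors
```

A bulk config listing star and salmon passes here. A single flat aligner enum like the current one would reject it.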

Copilot · 2026-04-10T13:06:35Z

workflow/modules/starsolo.snmk

+The SmartSeq2 path (manifest mode, one FASTQ per cell) is also used as-is for
+single-end bulk RNA-seq, where each sample is treated as one cell.


This docstring says the SmartSeq2 STARsolo path is used for single-end bulk RNA-seq, but the unified workflow/Snakefile routes bulk single-end runs through modules/bulk_single.snmk (plain STAR/kallisto/salmon), not modules/starsolo.snmk. Please update/remove this statement to avoid misleading users about which rules are executed for bulk data.

Copilot · 2026-04-10T13:06:35Z

workflow/modules/kallisto.snmk

+SmartSeq2: kallisto quant per cell (one run per cell FASTQ). Also used as-is
+for single-end bulk RNA-seq, where each sample is treated as one cell.


This docstring states the SmartSeq2 kallisto-per-cell rules are also used for single-end bulk RNA-seq, but bulk single-end runs appear to be implemented in modules/bulk_single.snmk (separate kallisto/salmon rules under a bulk/ output layout). Please correct the docstring so it matches the actual pipeline wiring and outputs.

Copilot · 2026-04-10T13:06:36Z

workflow/modules/alevin.snmk

+SmartSeq2: salmon quant per cell (one run per cell FASTQ). Also used as-is
+for single-end bulk RNA-seq, where each sample is treated as one cell.


This docstring says the SmartSeq2 salmon-per-cell rules are also used for single-end bulk RNA-seq, but bulk single-end runs use modules/bulk_single.snmk (and write salmon outputs under alevin/<feature_set>/bulk/<sample_id>/). Please update this to reflect the actual bulk pipeline implementation to avoid confusion.
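The three docstring comments above raise the same issue. The stale "also used for single-end bulk" wording appears to date from the commit that made bulk single-end and smart-seq2 synonyms, which was later rolled back in favour of "bulk or scRNAseq". The hypothetical routing function below sketches the wiring the reviews describe. The module paths come from the review comments. The pipeline_type values and the choice of kallisto_sc.snmk for Chromium kallisto runs are assumptions.

```python
def module_for(pipeline_type, aligner, technology=None):
    """Return the module a run is routed to (hypothetical sketch)."""
    if pipeline_type == "bulk" and aligner in {"star", "kallisto", "salmon"}:
        # Bulk single-end: plain STAR / kallisto / salmon rules, bulk/ output layout.
        return "modules/bulk_single.snmk"
    if pipeline_type == "scrnaseq":
        if aligner == "starsolo":
            return "modules/starsolo.snmk"
        if aligner == "alevin":
            return "modules/alevin.snmk"
        if aligner == "kallisto":
            # SmartSeq2: kallisto quant per cell; Chromium (assumed): kallisto bus.
            if technology == "smartseq2":
                return "modules/kallisto.snmk"
            return "modules/kallisto_sc.snmk"
    raise ValueError(f"no module for pipeline_type={pipeline_type!r}, aligner={aligner!r}")
```

Under this wiring, no bulk run reaches starsolo.snmk, kallisto.snmk, or alevin.snmk. Those three docstrings can therefore drop the bulk sentence.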

Copilot · 2026-04-10T13:06:36Z

workflow/modules/kallisto_sc.snmk

+        kallisto bus \
+            --index {input.index} \
+            --output-dir {params.outdir} \
+            -x 10xv3 \
+            --threads {threads} \
+            {params.extra_args} \
+            {params.cb_cdna_pairs} 2> {log}


kallisto bus is hard-coded to -x 10xv3, ignoring the config’s real_data.chemistry/geometry. This will produce incorrect results for 10xv2 (and any future chemistries) even though the config/schema/documentation expose a chemistry knob. Please parameterize this flag from config (with a sensible default).
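One way to parameterize the flag is a small helper that reads real_data.chemistry and falls back to the currently hard-coded 10xv3. The sketch below shows this approach. The helper name and the way it is wired into the rule's params are assumptions. The allowed values are the chemistry enum from the schema diff, and both are valid technology strings for kallisto bus -x.

```python
# Chemistries allowed by the schema diff; kallisto bus accepts both as -x values.
SUPPORTED_CHEMISTRIES = {"10xv2", "10xv3"}


def kallisto_technology(config, default="10xv3"):
    """Pick the kallisto bus -x value from real_data.chemistry.

    Falls back to `default` (the currently hard-coded 10xv3) when the key is
    missing or empty, and fails early on anything the schema does not allow.
    """
    chemistry = (config.get("real_data") or {}).get("chemistry") or default
    if chemistry not in SUPPORTED_CHEMISTRIES:
        raise ValueError(f"unsupported chemistry for kallisto bus: {chemistry!r}")
    return chemistry


# In the kallisto_sc rule this could be wired in as (sketch):
#     params:
#         technology=kallisto_technology(config),
#     shell:
#         "kallisto bus --index {input.index} --output-dir {params.outdir} "
#         "-x {params.technology} --threads {threads} ..."
```

Raising at parse time means a misconfigured chemistry stops the DAG before any alignment job starts. Without this check, the run would produce silently wrong barcode/UMI parsing.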

Draft Polymenidou's cluster12 vs others

e082471

imallona requested a review from Copilot April 9, 2026 06:03

Copilot started reviewing on behalf of imallona April 9, 2026 06:04

Copilot AI reviewed Apr 9, 2026


imallona added 4 commits April 9, 2026 09:05

Address code review, update report and metadata (reps and weeks)

f4a2a41

Fix data retrieval for sc

27e7892

Minor track mem limitations

195bd73

Major restructuring, switching to a single Snakefile, no matter the run

5257b37

imallona requested a review from Copilot April 10, 2026 09:12

imallona changed the title ~~Add single cell example (ALS/FTD model)~~ Update to a single Snakefile, add single cell example (ALS/FTD model) Apr 10, 2026

Copilot started reviewing on behalf of imallona April 10, 2026 09:13

Copilot AI reviewed Apr 10, 2026


imallona added 6 commits April 10, 2026 12:17

Fix alevin flag, given sparse/dense generate different files

9500a19

Define a Python env for Python-only envs, so as not to rely on the system's Python

9c90e90

Major config renaming: bulk single-end and smart-seq2 are now synonyms. Add tests.

b1f50c1

Rollback renaming, go for something simpler: bulk or scRNAseq

16ca9cc

Add report knitting guards

b938e42

Add explicit bulk single read mode

add967b

imallona requested a review from Copilot April 10, 2026 13:00

Copilot started reviewing on behalf of imallona April 10, 2026 13:00

Copilot AI reviewed Apr 10, 2026


