Skip to content

Update to a single Snakefile, add single cell example (ALS/FTD model)#4

Open
imallona wants to merge 11 commits intomasterfrom
dev
Open

Update to a single Snakefile, add single cell example (ALS/FTD model)#4
imallona wants to merge 11 commits intomasterfrom
dev

Conversation

@imallona
Copy link
Copy Markdown
Owner

@imallona imallona commented Apr 9, 2026

  • remove redundant snakefiles and switch to a single one
  • add Polymenidou's cluster 12 vs others.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a dataset-specific single-cell Snakemake workflow and report for reanalyzing GSE230647 (TDP-43-HA overexpression) with a cluster-12 vs others comparison, aligning with the repo’s existing modular bulk workflows and shared reference indices.

Changes:

  • Introduces a dedicated scRNA-seq Snakefile for downloading SRAs, running STARsolo and kallisto|bustools, and rendering an HTML report.
  • Adds an Rmarkdown report that aggregates STARsolo raw matrices into pseudo-bulk (cluster12 vs other) and runs edgeR + visualizations.
  • Adds dataset config + sample metadata table; updates README to document the new pipeline.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
workflow/Snakefile_gse230647_sc New Snakemake workflow for GSE230647 scRNA-seq (download, align/quantify, metadata, report).
workflow/scripts/gse230647_sc_report.Rmd New Rmarkdown analysis/report for cluster12 vs other pseudo-bulk edgeR and plots.
workflow/configs/gse230647_sc.yaml New config defining samples, references, aligners, and resources for the sc pipeline.
workflow/configs/gse230647_sc_sample_metadata.tsv New TSV describing the included GSM/SRRs and conditions.
README.md Documents how to run the new single-cell pipeline and what it produces.
TODO.md Removed legacy TODO notes.
.coveragerc Minor comment wording update.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@imallona imallona requested a review from Copilot April 10, 2026 09:12
@imallona imallona changed the title Add single cell example (ALS/FTD model) Update to a single Snakefile, add single cell example (ALS/FTD model) Apr 10, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 42 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 50 out of 55 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 121 to 125
negative-control-run:
name: Negative control full run
runs-on: ubuntu-latest
if: |
github.event_name == 'workflow_dispatch'
if: inputs.run_negative_control == true
steps:
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The negative-control-run job uses if: inputs.run_negative_control == true, but the inputs context is only available for workflow_dispatch/reusable workflows. On pull_request runs this can fail workflow validation (unrecognized context) or always evaluate unexpectedly. Prefer gating with github.event_name == 'workflow_dispatch' && github.event.inputs.run_negative_control == 'true' (or similar) so PR runs don't break CI.

Copilot uses AI. Check for mistakes.
Comment on lines 95 to +103
technology:
type: string
enum: [smartseq2, chromium]
library_layout:
type: string
enum: [single, paired]
chemistry:
type: string
enum: [10xv2, 10xv3, smartseq2]
enum: [10xv2, 10xv3]
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config schema section updated here (real_data.library_layout/chemistry) is now inconsistent with the unified pipeline_type Snakefile and the new bulk/sc configs: the schema still requires mode, restricts aligners to [starsolo,kallisto,alevin,bowtie2], and uses a different starsolo multimapper enum than configs (multi). Please update the schema to validate the new pipeline_type-based configs (including bulk aligners like star/salmon and the report/dataset sections).

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +6
The SmartSeq2 path (manifest mode, one FASTQ per cell) is also used as-is for
single-end bulk RNA-seq, where each sample is treated as one cell.
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring says the SmartSeq2 STARsolo path is used for single-end bulk RNA-seq, but the unified workflow/Snakefile routes bulk single-end runs through modules/bulk_single.snmk (plain STAR/kallisto/salmon), not modules/starsolo.snmk. Please update/remove this statement to avoid misleading users about which rules are executed for bulk data.

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +6
SmartSeq2: kallisto quant per cell (one run per cell FASTQ). Also used as-is
for single-end bulk RNA-seq, where each sample is treated as one cell.
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring states the SmartSeq2 kallisto-per-cell rules are also used for single-end bulk RNA-seq, but bulk single-end runs appear to be implemented in modules/bulk_single.snmk (separate kallisto/salmon rules under a bulk/ output layout). Please correct the docstring so it matches the actual pipeline wiring and outputs.

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +6
SmartSeq2: salmon quant per cell (one run per cell FASTQ). Also used as-is
for single-end bulk RNA-seq, where each sample is treated as one cell.
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring says the SmartSeq2 salmon-per-cell rules are also used for single-end bulk RNA-seq, but bulk single-end runs use modules/bulk_single.snmk (and write salmon outputs under alevin/<feature_set>/bulk/<sample_id>/). Please update this to reflect the actual bulk pipeline implementation to avoid confusion.

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +52
kallisto bus \
--index {input.index} \
--output-dir {params.outdir} \
-x 10xv3 \
--threads {threads} \
{params.extra_args} \
{params.cb_cdna_pairs} 2> {log}
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kallisto bus is hard-coded to -x 10xv3, ignoring the config’s real_data.chemistry/geometry. This will produce incorrect results for 10xv2 (and any future chemistries) even though the config/schema/documentation expose a chemistry knob. Please parameterize this flag from config (with a sensible default).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants