Conversation
There was a problem hiding this comment.
Pull request overview
Adds a dataset-specific single-cell Snakemake workflow and report for reanalyzing GSE230647 (TDP-43-HA overexpression) with a cluster-12 vs others comparison, aligning with the repo’s existing modular bulk workflows and shared reference indices.
Changes:
- Introduces a dedicated scRNA-seq Snakefile for downloading SRAs, running STARsolo and kallisto|bustools, and rendering an HTML report.
- Adds an Rmarkdown report that aggregates STARsolo raw matrices into pseudo-bulk (cluster12 vs other) and runs edgeR + visualizations.
- Adds dataset config + sample metadata table; updates README to document the new pipeline.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| workflow/Snakefile_gse230647_sc | New Snakemake workflow for GSE230647 scRNA-seq (download, align/quantify, metadata, report). |
| workflow/scripts/gse230647_sc_report.Rmd | New Rmarkdown analysis/report for cluster12 vs other pseudo-bulk edgeR and plots. |
| workflow/configs/gse230647_sc.yaml | New config defining samples, references, aligners, and resources for the sc pipeline. |
| workflow/configs/gse230647_sc_sample_metadata.tsv | New TSV describing the included GSM/SRRs and conditions. |
| README.md | Documents how to run the new single-cell pipeline and what it produces. |
| TODO.md | Removed legacy TODO notes. |
| .coveragerc | Minor comment wording update. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 38 out of 42 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 50 out of 55 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| negative-control-run: | ||
| name: Negative control full run | ||
| runs-on: ubuntu-latest | ||
| if: | | ||
| github.event_name == 'workflow_dispatch' | ||
| if: inputs.run_negative_control == true | ||
| steps: |
There was a problem hiding this comment.
The negative-control-run job uses if: inputs.run_negative_control == true, but the inputs context is only available for workflow_dispatch/reusable workflows. On pull_request runs this can fail workflow validation (unrecognized context) or always evaluate unexpectedly. Prefer gating with github.event_name == 'workflow_dispatch' && github.event.inputs.run_negative_control == 'true' (or similar) so PR runs don't break CI.
| technology: | ||
| type: string | ||
| enum: [smartseq2, chromium] | ||
| library_layout: | ||
| type: string | ||
| enum: [single, paired] | ||
| chemistry: | ||
| type: string | ||
| enum: [10xv2, 10xv3, smartseq2] | ||
| enum: [10xv2, 10xv3] |
There was a problem hiding this comment.
The config schema section updated here (real_data.library_layout/chemistry) is now inconsistent with the unified pipeline_type Snakefile and the new bulk/sc configs: the schema still requires mode, restricts aligners to [starsolo,kallisto,alevin,bowtie2], and uses a different starsolo multimapper enum than configs (multi). Please update the schema to validate the new pipeline_type-based configs (including bulk aligners like star/salmon and the report/dataset sections).
| The SmartSeq2 path (manifest mode, one FASTQ per cell) is also used as-is for | ||
| single-end bulk RNA-seq, where each sample is treated as one cell. |
There was a problem hiding this comment.
This docstring says the SmartSeq2 STARsolo path is used for single-end bulk RNA-seq, but the unified workflow/Snakefile routes bulk single-end runs through modules/bulk_single.snmk (plain STAR/kallisto/salmon), not modules/starsolo.snmk. Please update/remove this statement to avoid misleading users about which rules are executed for bulk data.
| SmartSeq2: kallisto quant per cell (one run per cell FASTQ). Also used as-is | ||
| for single-end bulk RNA-seq, where each sample is treated as one cell. |
There was a problem hiding this comment.
This docstring states the SmartSeq2 kallisto-per-cell rules are also used for single-end bulk RNA-seq, but bulk single-end runs appear to be implemented in modules/bulk_single.snmk (separate kallisto/salmon rules under a bulk/ output layout). Please correct the docstring so it matches the actual pipeline wiring and outputs.
| SmartSeq2: salmon quant per cell (one run per cell FASTQ). Also used as-is | ||
| for single-end bulk RNA-seq, where each sample is treated as one cell. |
There was a problem hiding this comment.
This docstring says the SmartSeq2 salmon-per-cell rules are also used for single-end bulk RNA-seq, but bulk single-end runs use modules/bulk_single.snmk (and write salmon outputs under alevin/<feature_set>/bulk/<sample_id>/). Please update this to reflect the actual bulk pipeline implementation to avoid confusion.
| kallisto bus \ | ||
| --index {input.index} \ | ||
| --output-dir {params.outdir} \ | ||
| -x 10xv3 \ | ||
| --threads {threads} \ | ||
| {params.extra_args} \ | ||
| {params.cb_cdna_pairs} 2> {log} |
There was a problem hiding this comment.
kallisto bus is hard-coded to -x 10xv3, ignoring the config’s real_data.chemistry/geometry. This will produce incorrect results for 10xv2 (and any future chemistries) even though the config/schema/documentation expose a chemistry knob. Please parameterize this flag from config (with a sensible default).
Uh oh!
There was an error while loading. Please reload this page.