Code for generating the plots in the associated bioRxiv paper, including the tusco-novel and tusco-selector modules, lives under src/.
Project structure (reorganized for clarity and reproducibility):
- data/
  - raw/: External inputs and large downloads (reference, lrgasp, expression, nih, spike-ins)
  - processed/: Derived, versioned outputs (e.g., processed/tusco/{hsa,mmu})
- figs/
  - figure-0N/ and supp-fig-0N/: Each with code/, plots/, tables/
- src/: Reusable Python code (tusco_selector, tusco_novel_simulator)
- R/: Shared R helpers (e.g., R/paths.R)
- envs/: Conda environments (e.g., envs/tusco_selector.yml)
- config/: Central configuration (e.g., config/project.yml)
- tools/: Third-party tools vendored locally (optional)
- licenses/: Local license files (optional)
- workflows/: Pipelines and SLURM jobs (optional)
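
As a quick sanity check on a fresh checkout, the layout above can be verified with a few lines of Python. This is only a sketch and not part of the repository: the function name `missing_dirs` is hypothetical, and the required/optional split simply follows the "(optional)" labels in the list (note that data/ only exists once the dataset has been downloaded).

```python
from pathlib import Path

# Directory names copied from the layout list; entries marked "(optional)"
# there are kept in a separate list. The split itself is an assumption.
REQUIRED_DIRS = ["data/raw", "data/processed", "figs", "src", "R", "envs", "config"]
OPTIONAL_DIRS = ["tools", "licenses", "workflows"]


def missing_dirs(repo_root, names=REQUIRED_DIRS):
    """Return the entries of `names` that are not directories under repo_root."""
    root = Path(repo_root)
    return [name for name in names if not (root / name).is_dir()]
```

Calling `missing_dirs(".")` from the repository root returns the required directories that are absent; passing `OPTIONAL_DIRS` as the second argument checks the optional ones instead.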
Notes
- Figure and analysis scripts load inputs from the paths defined inside each script (for example figs/figure-05/analysis/generate_kidney_missing_gene_stats.R:8 sets project_root explicitly), falling back on the resolver in scripts/figure_utils.R; they do not regenerate data under data/processed/ by default, and instead emit results alongside each figure in its plots/ and tables/ subdirectories.
- Run Python modules from repo root with export PYTHONPATH=src.
- Download the Tusco dataset archive from https://tusco-paper-data.s3.eu-north-1.amazonaws.com/data.zip and extract it into the repository root so a data/ directory is available (the helper script below will do this automatically if missing).
- Run ./scripts/run_all_figs.sh from the repository root to regenerate the figure outputs; it downloads the dataset on-demand and executes every figure script.
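
For anyone who prefers to fetch the dataset by hand, the download-and-extract step described above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the repository's helper script: the function name `ensure_data` is hypothetical, and the code assumes the archive unpacks to a top-level data/ folder, as the notes imply.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# URL as given in the notes above.
DATA_URL = "https://tusco-paper-data.s3.eu-north-1.amazonaws.com/data.zip"


def ensure_data(repo_root=".", url=DATA_URL):
    """Download and unpack the archive unless repo_root/data already exists."""
    root = Path(repo_root)
    data_dir = root / "data"
    if data_dir.is_dir():
        return data_dir  # already present, nothing to download

    archive = root / "data.zip"
    # Stream the archive to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)

    # Assumption: the zip contains a top-level data/ folder.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root)
    archive.unlink()  # remove the downloaded zip once extracted
    return data_dir
```

Running `ensure_data()` from the repository root is a no-op when data/ is already there, so it is safe to call before any figure script.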