
ConesaLab/documenting_tusco


tusco-paper

Code for generating the plots in the associated bioRxiv paper. Figure-specific scripts live under figs/, while the reusable tusco-novel and tusco-selector modules live under src/.

Project structure (reorganized for clarity and reproducibility):

  • data/
    • raw/: External inputs and large downloads (reference, lrgasp, expression, nih, spike-ins)
    • processed/: Derived, versioned outputs (e.g., processed/tusco/{hsa,mmu})
  • figs/
    • figure-0N/ and supp-fig-0N/: Each with code/, plots/, tables/
  • src/: Reusable Python code (tusco_selector, tusco_novel_simulator)
  • R/: Shared R helpers (e.g., R/paths.R)
  • envs/: Conda environments (e.g., envs/tusco_selector.yml)
  • config/: Central configuration (e.g., config/project.yml)
  • tools/: Third-party tools vendored locally (optional)
  • licenses/: Local license files (optional)
  • workflows/: Pipelines and SLURM jobs (optional)
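As an illustration of how the layout above can be addressed from code, here is a minimal Python sketch. It assumes that the presence of config/project.yml marks the repository root. The helper names find_repo_root, figure_dirs and processed_tusco are hypothetical and do not come from src/ or from R/paths.R.

```python
from pathlib import Path

# Assumption: config/project.yml (the central configuration listed above)
# marks the repository root. All helpers below are hypothetical.
MARKER = Path("config") / "project.yml"


def find_repo_root(start: Path) -> Path:
    """Walk upward from `start` until a directory containing MARKER is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / MARKER).is_file():
            return candidate
    raise FileNotFoundError(f"no {MARKER} found above {start}")


def figure_dirs(root: Path, number: int, supplementary: bool = False) -> dict[str, Path]:
    """Return the code/, plots/ and tables/ paths of figs/figure-0N or figs/supp-fig-0N."""
    prefix = "supp-fig" if supplementary else "figure"
    base = root / "figs" / f"{prefix}-{number:02d}"  # 5 -> "figure-05"
    return {name: base / name for name in ("code", "plots", "tables")}


def processed_tusco(root: Path, species: str) -> Path:
    """Return data/processed/tusco/<species> for the two species in the layout."""
    if species not in {"hsa", "mmu"}:
        raise ValueError(f"unsupported species: {species!r}")
    return root / "data" / "processed" / "tusco" / species
```

For example, figure_dirs(find_repo_root(Path.cwd()), 5)["plots"] points at figs/figure-05/plots under the detected root. Searching upward for a marker keeps scripts working no matter which subdirectory they are launched from.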

Notes

  • Figure and analysis scripts load their inputs from paths defined inside each script (for example, figs/figure-05/analysis/generate_kidney_missing_gene_stats.R:8 sets project_root explicitly) and fall back on the resolver in scripts/figure_utils.R. By default they do not regenerate data under data/processed/; instead, they write their results next to each figure, in its plots/ and tables/ subdirectories.
  • Run Python modules from the repository root after setting export PYTHONPATH=src.
  • Download the Tusco dataset archive from https://tusco-paper-data.s3.eu-north-1.amazonaws.com/data.zip and extract it into the repository root so that a data/ directory is available. The helper script below does this automatically if data/ is missing.
  • Run ./scripts/run_all_figs.sh from the repository root to regenerate the figure outputs. It downloads the dataset on demand and then executes every figure script.
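The data and environment steps in these notes can also be sketched in Python. This is not how scripts/run_all_figs.sh works internally. The sketch assumes that data.zip contains a top-level data/ directory, which is what "extract it into the repository root so that a data/ directory is available" implies. The function names ensure_data and python_env are hypothetical.

```python
import os
import urllib.request
import zipfile
from pathlib import Path

# Archive URL from the notes above.
DATA_URL = "https://tusco-paper-data.s3.eu-north-1.amazonaws.com/data.zip"


def ensure_data(repo_root: Path, url: str = DATA_URL) -> Path:
    """Download and unpack data.zip into repo_root unless data/ already exists."""
    data_dir = repo_root / "data"
    if data_dir.is_dir():
        return data_dir  # already present: skip the download
    archive = repo_root / "data.zip"
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(repo_root)  # assumption: entries are rooted at data/
    archive.unlink()  # keep only the extracted tree
    if not data_dir.is_dir():
        raise RuntimeError("archive did not contain a top-level data/ directory")
    return data_dir


def python_env(repo_root: Path) -> dict[str, str]:
    """Copy of os.environ with <repo_root>/src placed first on PYTHONPATH."""
    env = dict(os.environ)
    src = str(repo_root / "src")
    existing = env.get("PYTHONPATH")
    # Unlike a plain `export PYTHONPATH=src`, any existing entries are kept after src.
    env["PYTHONPATH"] = os.pathsep.join([src, existing]) if existing else src
    return env
```

To launch a module under src/, pass the mapping returned by python_env to subprocess.run as env=, with cwd set to the repository root. The module entry points and their arguments are not documented here, so check src/ before calling them.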