feat: scaffolding, caching, EGFR by tristan-f-r · Pull Request #65 · Reed-CompBio/spras-benchmarking

tristan-f-r · 2026-03-18T03:20:16Z

We bundle EGFR along with the rest of the caching infrastructure. Notes:

- All motivation for the caching system lives under cache/README.md.
- We removed pra.yaml for now, as the only PRAs are the synthetic data and the ResponseNet data, and soon the DepMap data.
- The CONTRIBUTING.md file is not finalized; it is there only so that this PR does not break "Changes to CONTRIBUTING guide" #57. I may split all contributing material into #57 later.
- directory.py contains unnecessary files from other datasets that were deemed universal.
- I would like to keep the web folder, even though I'm aware no one is currently in a position to review it.


web/public/favicon.svg

ntalluri

I did a light review of the PR; I did not look too hard at the code itself yet. I was mostly gathering ideas about what was happening from the READMEs.

configs/scores.yaml

tools/README.md

.vscode/extensions.json

.devcontainer/devcontainer.json

cache/biomart/README.md

README.md

configs/dmmm.yaml


cache/README.md


ntalluri · 2026-03-25T01:13:16Z

datasets/egfr/README.md

+
+The score data (`egfr-prizes.txt`), gold standard nodes `eight-egfr-reference-all.txt`, and the (now-deprecated) manually edited `iRefIndex`-based interactome are all from [_Synthesizing Signaling Pathways from Temporal Phosphoproteomic Data_](https://doi.org/10.1016/j.celrep.2018.08.085).
+
+We also use the StringDB human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.


Suggested change

-We also use the StringDB human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.
+We also use the StringDB v12 human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.

What is the date of the UniProt mapping file (when was it downloaded)? Please add that date here as well.

Can you also add the identifiers we are mapping from and to for each dataset?

That is, for iRefIndex/PhosphoSite we use UniProt IDs for all the data, and for STRING we use ENSP IDs.
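To make the requested "mapping from / mapping to" documentation concrete, here is a minimal sketch of re-keying UniProt-keyed scores onto ENSP identifiers. It is not taken from this PR: the function names, the placeholder IDs, the in-memory row format, and the choice to keep the maximum score on collisions are all assumptions made for illustration.

```python
from collections import defaultdict


def build_mapping(rows):
    """Group hypothetical (uniprot_id, ensp_id) pairs into a one-to-many lookup."""
    mapping = defaultdict(set)
    for uniprot_id, ensp_id in rows:
        mapping[uniprot_id].add(ensp_id)
    return mapping


def map_prizes(prizes, mapping):
    """Re-key {uniprot_id: prize} onto ENSP IDs.

    Assumption: when several UniProt IDs land on the same ENSP ID,
    the largest prize is kept. Unmapped UniProt IDs are reported, not dropped silently.
    """
    mapped, unmapped = {}, []
    for uniprot_id, prize in prizes.items():
        targets = mapping.get(uniprot_id)
        if not targets:
            unmapped.append(uniprot_id)
            continue
        for ensp_id in targets:
            mapped[ensp_id] = max(prize, mapped.get(ensp_id, prize))
    return mapped, sorted(unmapped)


# Placeholder identifiers only; real rows would come from the UniProt mapping file.
rows = [("U1", "E1"), ("U1", "E2"), ("U2", "E2")]
prizes = {"U1": 1.0, "U2": 3.0, "U3": 2.0}
mapped, unmapped = map_prizes(prizes, build_mapping(rows))
```

Returning the unmapped IDs alongside the result makes it easy to state in the README how many nodes are lost when moving between identifier namespaces.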

datasets/egfr/README.md

cache/README.md

configs/scores.yaml

.vscode/extensions.json

__init__.py

ntalluri · 2026-03-25T16:05:56Z

CONTRIBUTING.md

Can this be added to #57?

README.md

ntalluri · 2026-03-25T16:07:57Z

README.md

+```
+
+`spras` is the cloned submodule of [SPRAS](https://github.com/reed-compbio/spras),
+`configs` is the YAML file used to talk to SPRAS, and `datasets` contains the raw data. `cache` is utility for `datasets` which provides a convenient


Suggested change

-`configs` is the YAML file used to talk to SPRAS, and `datasets` contains the raw data. `cache` is utility for `datasets` which provides a convenient
+`configs` is the YAML file used to set up workflows in SPRAS, and `datasets` contains the raw and processed data. `cache` is utility for `datasets` which provides a convenient

README.md

ntalluri

Updated Review

ntalluri · 2026-03-25T17:55:15Z

datasets/egfr/README.md

Are we doing the trimming in this PR?



chore: drop other datasets

b49439e

tristan-f-r added the enhancement label (New feature or request) Mar 18, 2026

tristan-f-r added 2 commits March 17, 2026 20:36

Merge branch 'main' into egfr-and-infrastructure

2018a13

chore: re-include

136e5ff

tristan-f-r mentioned this pull request Mar 18, 2026

Changes to CONTRIBUTING guide #57

Draft

tristan-f-r added 2 commits March 17, 2026 20:42

chore: drop tools

472468d

not needed just yet

chore: re-add tools

a5de971

This was referenced Mar 18, 2026

dataset: DISEASES #66

Open

dataset: yeast osmotic stress #67

Open

dataset: hiv #68

Open

dataset: muscle skeletal (from ResponseNet) #69

Open

dataset: DepMap #70

Open

tristan-f-r added the dataset label (Mutating datasets in any way) Mar 18, 2026

tristan-f-r mentioned this pull request Mar 18, 2026

dataset: synthetic from PANTHER #71

Draft

tristan-f-r added 3 commits March 18, 2026 05:53

docs: cache

8ddccb4

style: fmt

90cc277

docs: on caching

eb23b8f

tristan-f-r mentioned this pull request Mar 18, 2026

chore: delete [temporarily!] #64

Merged

tristan-f-r changed the title from "feat: initial scaffolding, EGFR" to "feat: scaffolding, caching, EGFR" Mar 18, 2026

ntalluri reviewed Mar 18, 2026


web/public/favicon.svg (outdated, resolved)

ntalluri reviewed Mar 18, 2026


tristan-f-r and others added 2 commits March 18, 2026 16:54

docs: suggestions from review

4b524bc

Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>

docs: more comments, refactor: mv function out of Snakefile

69fda05

ntalluri reviewed Mar 19, 2026


cache/README.md (outdated, resolved)

ntalluri reviewed Mar 19, 2026


cache/README.md (outdated, resolved)

tristan-f-r added 4 commits March 19, 2026 18:58

docs(datasets): mention responsenet and egfr

15c7ecb

docs(datasets): add old synthetic data branch

729a51b

chore: mv to scores instead of dmmm

922be5d

docs: drop expiration docs

f3d6d41

this is only in github actions

tristan-f-r mentioned this pull request Mar 23, 2026

Hook into loguru to warn for outdated datasets #72

Open

tristan-f-r added 4 commits March 23, 2026 22:08

docs: clarify snakemake importing

a802fde

docs: more clarification on data and files

85ba6e9

refactor: move irefindex back to egfr

8139f14

chore: revert web

5a8c02e

tristan-f-r mentioned this pull request Mar 24, 2026

feat: web #73

Open

tristan-f-r requested a review from ntalluri March 24, 2026 23:24

docs: clarify on cache

041f998

ntalluri reviewed Mar 25, 2026


datasets/egfr/README.md


Are we doing the trimming in this PR?

tristan-f-r and others added 6 commits March 25, 2026 12:05

chore: apply suggestions

bd3bcab

Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>

docs: apply suggestions

c08a0b8

chore: apply suggestions

b3cf691

drop extensions

193f75a

to move to web

refactor(egfr): correctly specify output dirs

0581bea

feat(egfr): add target nodes, fmt

320aa87




2 participants
