
fix: replace removed SampleBaseDataset in kg_emb #953

Open
capccode wants to merge 1 commit into sunlabuiuc:master from capccode:fix/kg-emb-sampledataset

@capccode capccode commented Apr 7, 2026

Summary

Closes #952

SampleBaseDataset was deleted in the PyHealth 2.0 dataset refactor (commit 62df8d8), breaking all imports of pyhealth.medcode.pretrained_embeddings. Anyone importing the kg_emb module gets an ImportError.

  • SampleKGDataset now inherits from torch.utils.data.Dataset directly, inlining the three fields (samples, dataset_name, task_name) and __len__ that the deleted parent provided
  • Removed the broken import (from pyhealth.datasets import SampleBaseDataset) and the type annotations that used it from 5 model files and the splitter
  • Added pandarallel to graph optional dependencies (required by umls.py)
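For reviewers who want to see the shape of the first change, here is a minimal sketch of a dataset that subclasses torch.utils.data.Dataset directly and carries the three inlined fields plus __len__. The class name SketchKGDataset, the sample layout, and the __getitem__ body are assumptions for illustration only, not the code in this PR. The fallback base class exists only so the sketch runs where PyTorch is not installed.

```python
from typing import Any, Dict, List

try:
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so the sketch still runs without PyTorch installed.
    class Dataset:  # minimal stand-in, not the real PyTorch class
        pass


class SketchKGDataset(Dataset):
    """Hypothetical stand-in for SampleKGDataset; not the PR's actual code."""

    def __init__(
        self,
        samples: List[Dict[str, Any]],
        dataset_name: str = "",
        task_name: str = "",
    ):
        # The three fields the deleted parent class used to provide.
        self.samples = samples
        self.dataset_name = dataset_name
        self.task_name = task_name

    def __len__(self) -> int:
        # Also formerly inherited from the deleted parent.
        return len(self.samples)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # Assumed behaviour: return the raw sample dict at this index.
        return self.samples[index]


# Made-up triples purely for demonstration.
demo = SketchKGDataset(
    [{"triple": (0, 1, 2)}, {"triple": (2, 0, 1)}],
    dataset_name="demo_kg",
    task_name="link_prediction",
)
print(len(demo), demo[0])
```

Because the class defines both __len__ and __getitem__, it follows the map-style dataset protocol that torch.utils.data.DataLoader expects.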

Files changed (8):

  • kg_emb/datasets/sample_kg_dataset.py — new parent class + inlined fields
  • kg_emb/datasets/splitter.py — removed broken import
  • kg_emb/models/kg_base.py, complex.py, distmult.py, rotate.py, transe.py — removed broken import
  • pyproject.toml — added pandarallel to [project.optional-dependencies] graph

Test plan

  • from pyhealth.medcode.pretrained_embeddings import * no longer crashes
  • All 5 models (TransE, RotatE, DistMult, ComplEx, KGEBaseModel) import and instantiate
  • SampleKGDataset works with torch.utils.data.DataLoader
  • kg_emb.datasets.splitter.split() works
  • lm_emb is unaffected (no SampleBaseDataset references)
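The import checks in the test plan could be scripted roughly as follows. This is a sketch, not the test code used for this PR: check_imports is a hypothetical helper, and only the first module path is taken from the summary above, while the kg_emb and lm_emb submodule paths are assumed. It uses importlib.import_module, which exercises the same module-level imports as the wildcard import but is not identical to it.

```python
import importlib
from typing import Dict, Iterable, Optional


def check_imports(module_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try importing each module; map its name to None on success or to the error text."""
    results: Dict[str, Optional[str]] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError is the failure described in this PR
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# The first path is named in the summary; the two submodule paths are assumptions.
CANDIDATES = [
    "pyhealth.medcode.pretrained_embeddings",
    "pyhealth.medcode.pretrained_embeddings.kg_emb",
    "pyhealth.medcode.pretrained_embeddings.lm_emb",
]

if __name__ == "__main__":
    for name, error in check_imports(CANDIDATES).items():
        print(f"{name}: {'ok' if error is None else error}")
```

Run in an environment with PyHealth installed, each line should read "ok" once the fix is applied; on an unpatched install the affected modules would be expected to report the ImportError instead.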

…was deleted in PyHealth 2 refactor breaking all imports from pretrained_embeddings, add pandarallel to graph optional deps
Collaborator

@jhnwu3 jhnwu3 left a comment

Ah I see, there's an entirely separate SampleKGDataset here. Can you work with @joshuasteier later, after our deadlines, to revamp it towards the graph/ module?

The problem is we still want it all to follow the PyHealth pipeline, i.e., like we do with GraphCare's implementation, since technically graphs are not disjoint from any other modality. We just have a bunch of old technical debt stuck here.

Will hopefully get to your GRASP PR at some point.

@jhnwu3 jhnwu3 requested a review from joshuasteier April 8, 2026 21:40
Development

Successfully merging this pull request may close these issues.

fix: kg_emb broken import in PyHealth 2.0

2 participants