Add ClinicalJargonDataset and ClinicalJargonVerification benchmark task by John-Carson · Pull Request #941 · sunlabuiuc/PyHealth

John-Carson · 2026-04-04T19:34:09Z

PyHealth PR Description

Summary

Adds ClinicalJargonDataset backed by public MedLingo and CASI benchmark assets.
Adds ClinicalJargonVerification, a binary candidate-verification task for public clinical jargon evaluation.
Adds docs, example usage, synthetic test resources, and unit tests.

Contributors

John Carson
johnwc4@illinois.edu

Contribution Type

Dataset + Task

Original Paper

Furong Jia, David Sontag, and Monica Agrawal. "How does my language model understand clinical text?" CHIL 2025.
https://arxiv.org/abs/2505.15024

Implementation Overview

ClinicalJargonDataset downloads and normalizes the public MedLingo and CASI assets into a PyHealth dataset.
ClinicalJargonVerification converts each benchmark item into paired-text binary classification samples over candidate expansions.
The example script demonstrates task configuration ablations through benchmark, casi_variant, and medlingo_distractors.
The tests use only synthetic/demo resources and validate dataset loading, patient parsing, task generation, and sample structure.
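
A minimal sketch of the candidate-verification idea described above, not the PR's actual implementation. The `JargonItem` class, the `to_verification_samples` helper, the field names (`text_a`, `text_b`, `label`), and the demo row are all hypothetical, chosen only to illustrate how one benchmark item could expand into paired-text binary samples.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JargonItem:
    """Hypothetical benchmark row: context, gold expansion, distractors."""
    item_id: str
    context: str
    gold: str
    distractors: List[str]


def to_verification_samples(item: JargonItem, max_distractors: int = 1) -> List[Dict]:
    """Expand one item into (context, candidate) pairs with a binary label."""
    # The gold expansion is the single positive candidate.
    candidates = [(item.gold, 1)]
    # Keep at most `max_distractors` negatives (assumed to mirror the
    # medlingo_distractors setting mentioned in the overview).
    candidates += [(d, 0) for d in item.distractors[:max_distractors]]
    return [
        {
            "item_id": item.item_id,
            "text_a": item.context,
            "text_b": candidate,
            "label": label,
        }
        for candidate, label in candidates
    ]


# Made-up demo row, not taken from MedLingo or CASI.
item = JargonItem(
    item_id="demo-0",
    context="Pt is NPO after midnight.",
    gold="nothing by mouth",
    distractors=["no prior operations", "nasal passage obstruction"],
)
samples = to_verification_samples(item, max_distractors=1)
```

Under these assumptions, each item yields one positive pair plus up to `max_distractors` negative pairs, so the class balance is controlled directly by the distractor count.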

Files To Review

pyhealth/datasets/clinical_jargon.py
pyhealth/datasets/configs/clinical_jargon.yaml
pyhealth/tasks/clinical_jargon_verification.py
examples/clinical_jargon_clinical_jargon_verification_transformers.py
tests/core/test_clinical_jargon.py
docs/api/datasets/pyhealth.datasets.ClinicalJargonDataset.rst
docs/api/tasks/pyhealth.tasks.ClinicalJargonVerification.rst

Validation

python3 -m unittest discover -s 598-DLH/clinical_jargon_project/tests -p 'test_*.py'
PYTHONPATH=598-DLH/PyHealth python3 -m unittest 598-DLH/PyHealth/tests/core/test_clinical_jargon.py
python3 598-DLH/PyHealth/examples/clinical_jargon_clinical_jargon_verification_transformers.py --model-name hf-internal-testing/tiny-random-bert --benchmark medlingo --medlingo-distractors 1 --epochs 1 --batch-size 2
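
For orientation, the flags in the last command could be parsed roughly as follows. This is a sketch under assumptions: the flag names come from the command above, but the defaults, types, and the `build_parser` helper are hypothetical and may differ from the real example script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the validation command."""
    parser = argparse.ArgumentParser(
        description="Clinical jargon verification demo (sketch)"
    )
    # Defaults below are assumptions copied from the validation command.
    parser.add_argument("--model-name", default="hf-internal-testing/tiny-random-bert")
    parser.add_argument("--benchmark", default="medlingo")
    parser.add_argument("--medlingo-distractors", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch-size", type=int, default=2)
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--benchmark", "medlingo", "--medlingo-distractors", "1",
     "--epochs", "1", "--batch-size", "2"]
)
```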

Copilot

Pull request overview

Adds a new public clinical jargon benchmark dataset and an associated binary verification task, plus supporting docs, example usage, and unit tests.

Changes:

Introduces ClinicalJargonDataset with normalized MedLingo + CASI metadata and a YAML dataset config.
Adds ClinicalJargonVerification task that generates paired-text binary samples over candidate expansions.
Adds a runnable Transformers example, Sphinx API docs, synthetic test resources, and unit tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `pyhealth/datasets/clinical_jargon.py` | Implements dataset normalization and (currently automatic) remote asset fetching. |
| `pyhealth/datasets/configs/clinical_jargon.yaml` | Declares the `examples` table schema for the dataset. |
| `pyhealth/datasets/__init__.py` | Exposes `ClinicalJargonDataset` at package import level. |
| `pyhealth/tasks/clinical_jargon_verification.py` | Implements the candidate-verification task and sample generation. |
| `pyhealth/tasks/__init__.py` | Exposes `ClinicalJargonVerification` at package import level. |
| `examples/clinical_jargon_clinical_jargon_verification_transformers.py` | Demonstrates training/evaluating a Transformers model on the task. |
| `tests/core/test_clinical_jargon.py` | Adds unit tests covering dataset/task loading and sample structure. |
| `test-resources/clinical_jargon/clinical_jargon_examples.csv` | Adds synthetic/demo benchmark rows used by tests/examples. |
| `docs/api/datasets/pyhealth.datasets.ClinicalJargonDataset.rst` | Adds Sphinx API stub for the dataset. |
| `docs/api/datasets.rst` | Adds dataset entry to the datasets API index. |
| `docs/api/tasks/pyhealth.tasks.ClinicalJargonVerification.rst` | Adds Sphinx API stub for the task. |
| `docs/api/tasks.rst` | Adds task entry to the tasks API index. |

Add clinical jargon benchmark dataset and verification task

30756f7

Copilot AI review requested due to automatic review settings April 4, 2026 19:34

Copilot started reviewing on behalf of John-Carson April 4, 2026 19:34

Copilot AI reviewed Apr 4, 2026

John-Carson marked this pull request as draft April 4, 2026 19:50

John-Carson added 3 commits April 7, 2026 20:02

Polish clinical jargon docs and example guidance

fe983a3

Address Copilot review feedback for clinical jargon

59ffe38

Prevent split leakage in clinical jargon example

f9540df
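
One plausible reading of the split-leakage commit: because each benchmark item expands into several paired samples, a sample-level random split can place pairs from the same item in both train and test. The sketch below shows a group-aware split that avoids this. It is an assumption about the general technique, not the code in the commit; `split_by_item`, the `item_id` field, and the demo samples are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def split_by_item(
    samples: List[Dict], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Dict], List[Dict]]:
    """Split samples so that all pairs from one item land in the same split."""
    # Shuffle the unique item ids, not the individual samples.
    item_ids = sorted({s["item_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(item_ids)
    n_test = max(1, int(len(item_ids) * test_fraction))
    test_ids = set(item_ids[:n_test])
    train = [s for s in samples if s["item_id"] not in test_ids]
    test = [s for s in samples if s["item_id"] in test_ids]
    return train, test


# Made-up samples: 5 items, each with one positive and one negative pair.
samples = [
    {"item_id": f"demo-{i}", "text_b": f"candidate-{j}", "label": int(j == 0)}
    for i in range(5)
    for j in range(2)
]
train, test = split_by_item(samples)
```

Splitting on item ids rather than samples means a model cannot score well on a test pair simply because it saw the same context with a different candidate during training.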

John-Carson marked this pull request as ready for review April 9, 2026 23:07

John-Carson wants to merge 4 commits into sunlabuiuc:master from John-Carson:cs598-clinical-jargon

2 participants