
Add PTB-XL dataset and MI classification task #950

Draft
zaidalkhatib wants to merge 2 commits into sunlabuiuc:master from zaidalkhatib:ptbxl-mi-task

zaidalkhatib commented Apr 6, 2026

Contributors: Zaid Alkhatib (zaida3@illinois.edu), Anila Narapusetty (anilan2@illinois.edu)

Contribution Type: Dataset + Task

Paper: Data Augmentation for Electrocardiograms
Paper Link: https://arxiv.org/abs/2204.04360

Overview

This PR adds a PTB-XL dataset integration and a binary myocardial infarction (MI) classification task to PyHealth as a partial reproduction of the paper Data Augmentation for Electrocardiograms.

The contribution covers only the dataset and task portion of the paper's pipeline; it does not reproduce the full model and training pipeline.

What was implemented

Dataset

  • Added PTBXLDataset in pyhealth/datasets/ptbxl.py
  • Uses BaseDataset
  • Loads PTB-XL metadata from ptbxl_database.csv
  • Converts ECG records into PyHealth event format
  • Supports dev=True for fast iteration
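As a rough illustration of the metadata step, the sketch below groups rows of ptbxl_database.csv by patient using only the Python standard library. It is not the PTBXLDataset code and does not use the BaseDataset API; the helper name load_ptbxl_records, the record dictionary layout, and the dev_patients cap are hypothetical. The column names (ecg_id, patient_id, recording_date, scp_codes, filename_lr) follow the public PTB-XL metadata file, and the sample rows are synthetic.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def load_ptbxl_records(
    csv_file: TextIO, dev: bool = False, dev_patients: int = 100
) -> Dict[str, List[dict]]:
    """Group PTB-XL metadata rows into per-patient lists of ECG records.

    Hypothetical helper for illustration; not the PR's PTBXLDataset code.
    """
    patients: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # patient_id is stored as a float string (e.g. "1.0"); normalize it
        pid = str(int(float(row["patient_id"])))
        # dev mode (hypothetical cap): once reached, skip rows from unseen patients
        if dev and pid not in patients and len(patients) >= dev_patients:
            continue
        patients[pid].append(
            {
                "ecg_id": int(row["ecg_id"]),
                "recording_date": row["recording_date"],
                "scp_codes": row["scp_codes"],  # raw string, parsed by the task
                "signal_file": row["filename_lr"],  # 100 Hz WFDB record path
            }
        )
    return dict(patients)


# Synthetic rows for illustration only (not real PTB-XL data).
SAMPLE_CSV = """ecg_id,patient_id,recording_date,scp_codes,filename_lr
1,1.0,2000-01-01 08:00:00,"{'NORM': 100.0, 'SR': 0.0}",records100/00000/00001_lr
2,2.0,2000-01-02 08:00:00,"{'IMI': 100.0, 'SR': 0.0}",records100/00000/00002_lr
3,1.0,2000-01-03 08:00:00,"{'NORM': 100.0}",records100/00000/00003_lr
"""

if __name__ == "__main__":
    full = load_ptbxl_records(io.StringIO(SAMPLE_CSV))
    print({pid: [r["ecg_id"] for r in recs] for pid, recs in full.items()})
    # {'1': [1, 3], '2': [2]}
    dev = load_ptbxl_records(io.StringIO(SAMPLE_CSV), dev=True, dev_patients=1)
    print(list(dev))  # ['1']
```

Capping by patient rather than by row keeps every ECG of an admitted patient together, which matters if later splits are done per patient.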

Task

  • Added PTBXLMIClassificationTask in pyhealth/tasks/ptbxl_mi_classification.py
  • Binary classification task:
    • 1 = MI
    • 0 = non-MI
  • Extracts labels from the scp_codes field
  • Uses PyHealth task schemas for timeseries input and binary labels
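The MI label can be sketched as follows, assuming scp_codes holds a Python dict literal that maps SCP statement codes to likelihoods, as in the public ptbxl_database.csv. The MI_CODES set is an assumption: it lists the codes assigned to the MI diagnostic class in PTB-XL's scp_statements.csv, but the PR does not state which codes or likelihood threshold it uses, so mi_label, parse_scp_codes, and the min_likelihood parameter are hypothetical.

```python
import ast
from typing import Iterable, Mapping

# ASSUMPTION: codes whose diagnostic_class is "MI" in PTB-XL's
# scp_statements.csv. In practice, derive this set from that file
# instead of hard-coding it.
MI_CODES = frozenset({
    "IMI", "ASMI", "ILMI", "AMI", "ALMI", "INJAS", "LMI",
    "INJAL", "IPLMI", "IPMI", "INJIN", "INJLA", "PMI", "INJIL",
})


def parse_scp_codes(raw: str) -> Mapping[str, float]:
    """Parse the scp_codes column, which stores a dict literal as a string."""
    codes = ast.literal_eval(raw)  # safe: only evaluates Python literals
    if not isinstance(codes, dict):
        raise ValueError(f"unexpected scp_codes value: {raw!r}")
    return codes


def mi_label(
    raw_scp_codes: str,
    mi_codes: Iterable[str] = MI_CODES,
    min_likelihood: float = 0.0,
) -> int:
    """Return 1 if any MI code has likelihood >= min_likelihood, else 0."""
    mi = set(mi_codes)
    codes = parse_scp_codes(raw_scp_codes)
    return int(any(
        code in mi and float(likelihood) >= min_likelihood
        for code, likelihood in codes.items()
    ))


if __name__ == "__main__":
    print(mi_label("{'IMI': 35.0, 'SR': 0.0}"))               # 1
    print(mi_label("{'NORM': 100.0, 'SR': 0.0}"))             # 0
    print(mi_label("{'IMI': 15.0}", min_likelihood=50.0))     # 0
```

Parsing with ast.literal_eval rather than eval avoids executing arbitrary code read from the CSV. With the default threshold of 0.0, any listed MI code counts as positive regardless of its likelihood value.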

Tests

  • Added synthetic unit tests:
    • tests/core/test_ptbxl_dataset.py
    • tests/core/test_ptbxl_mi_classification.py
  • Tests run quickly and do not require external downloads
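One way to keep such tests download-free is to write a tiny synthetic ptbxl_database.csv into a temporary directory and point the loader at it. The sketch below is hypothetical: write_synthetic_ptbxl is not one of the PR's test helpers, and it writes only a small subset of the real columns.

```python
import csv
import tempfile
from pathlib import Path

# Subset of the real ptbxl_database.csv header (an assumption about which
# columns a loader under test needs).
FIELDS = ["ecg_id", "patient_id", "recording_date", "scp_codes", "filename_lr"]


def write_synthetic_ptbxl(root: Path, n_records: int = 4) -> Path:
    """Write a fake ptbxl_database.csv under root; no download required."""
    path = root / "ptbxl_database.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for i in range(1, n_records + 1):
            # Alternate MI / non-MI rows so both label values are exercised.
            codes = {"IMI": 100.0} if i % 2 else {"NORM": 100.0}
            writer.writerow({
                "ecg_id": i,
                "patient_id": float(i),  # real file stores ids as floats
                "recording_date": "2000-01-01 00:00:00",
                "scp_codes": repr(codes),  # dict literal, like the real column
                "filename_lr": f"records100/00000/{i:05d}_lr",
            })
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_synthetic_ptbxl(Path(tmp)).read_text())
```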

Docs

  • Added dataset API doc:
    • docs/api/datasets/pyhealth.datasets.ptbxl.rst
  • Added task API doc:
    • docs/api/tasks/pyhealth.tasks.ptbxl_mi_classification.rst
  • Updated:
    • docs/api/datasets.rst
    • docs/api/tasks.rst

Example

  • Added:
    • examples/ptbxl_mi_classification_cnn.py

Files to Review

Core implementation

  • pyhealth/datasets/ptbxl.py
  • pyhealth/tasks/ptbxl_mi_classification.py

Registration

  • pyhealth/datasets/__init__.py
  • pyhealth/tasks/__init__.py

Tests

  • tests/core/test_ptbxl_dataset.py
  • tests/core/test_ptbxl_mi_classification.py

Documentation

  • docs/api/datasets/pyhealth.datasets.ptbxl.rst
  • docs/api/tasks/pyhealth.tasks.ptbxl_mi_classification.rst
  • docs/api/datasets.rst
  • docs/api/tasks.rst

Example

  • examples/ptbxl_mi_classification_cnn.py

Notes

  • Tests use synthetic data only
  • Example requires a local PTB-XL download to run end-to-end
  • Scope is limited to reproducing the dataset and task components of the paper above
