Add MedFlamingo Multimodal VQA Pipeline #954
Open
zarmeen2 wants to merge 18 commits into sunlabuiuc:master from
Conversation
Adding MedFlamingo Model Scaffold
add MedFlamingo to models.rst
- Fix MedFlamingo.generate() to pass inputs_embeds so xattn visual conditioning is actually applied (was passing raw input_ids)
- Fix MedFlamingo.__init__() to initialise self._fc = None when no dataset is supplied (prevents AttributeError in forward())
- VQARADDataset.prepare_metadata(): filter rows whose image file is missing from disk (14 OSF images never existed); logs a warning
- Remove duplicate VQARADDataset import in datasets/__init__.py
- Remove duplicate MedicalVQATask import in tasks/__init__.py
- medical_vqa_task.py: add module docstring, full Google-style class docstring, and __call__ docstring with Args / Returns / Example
- examples/vqarad_medvqa_medflamingo.py: full rewrite with three ablation axes (cross_attn_every_n_layers, num_resampler_tokens, freeze_vision), --ablation CLI flag, helper functions, usage docs
- tests/core/test_medflamingo.py: remove all TODO stubs; add isolated MedicalVQATask unit tests and test_generate_uses_inputs_embeds; fix Patient construction to use Polars DataFrame API

Contributors: Zarmeen Hasan (zarmeen2), Camdyn Zook (camdynz2)
Feat/medflamingo full pipeline
…testing-suggestions Feat/docs example repo clean up and testing suggestions
…testing-suggestions test fixes
Contributors:
Contribution Type: Full pipeline
Overview
This PR implements a full end-to-end Medical Visual Question Answering (VQA) pipeline based on the MedFlamingo architecture. It adds a new dataset loader, task definition, and model that integrate with PyHealth's existing Trainer, SampleDataset, and BaseModel conventions. The model also exposes a standalone generate() interface for free-text, few-shot VQA.

Files Changed

New Components

VQARADDataset — Loads the VQA-RAD dataset from its raw JSON, normalizes field name variants, and writes a flat CSV for PyHealth's base dataset pipeline. Overrides set_task() to auto-inject ImageProcessor(mode="RGB", image_size=224).

MedicalVQATask — Converts patient VQA-RAD events into {image, question, answer} sample dicts. Declares input_schema: {image: "image", question: "text"} and output_schema: {answer: "multiclass"}, framing VQA as closed-set classification for the standard training loop.

MedFlamingo — Integrates a frozen CLIP ViT-L/14 vision encoder and a frozen LLM (default: OPT-6.7B) with trainable gated cross-attention layers (MedFlamingoLayer) inserted every N LLM layers. A PerceiverResampler compresses variable-length CLIP patch tokens to a fixed 64-token sequence before each cross-attention block. Only the cross-attention layers and classification head are trained.

Design Decisions
Zero-initialized gates: Both the attention gate and FFN gate in each MedFlamingoLayer are initialized to zero, so the model starts training as the frozen LLM, preventing unstable early updates (following the original Flamingo paper).

Dual interface: forward() conforms to PyHealth's BaseModel contract for Trainer compatibility; generate() supports open-ended few-shot generation by passing visually conditioned embeddings (inputs_embeds) directly to the LLM.

Testable by design: _init_vision_encoder() and _init_lang_model() are isolated methods overridden in TestableMedFlamingo to swap in CPU-only stubs, so tests run without any model downloads.

Tests
The test suite is split into three focused files.
test_medflamingo.py — Tests the model in isolation using lightweight CPU stubs, covering forward(), generate() (single and few-shot), inputs_embeds usage verification, and gradient flow.

test_medical_vqa_task.py — Tests MedicalVQATask schema, sample emission, and edge cases with dummy objects.

test_vqarad.py — Runs VQARADDataset end-to-end against a synthetic fixture, validating CSV output, patient/event parsing, and processed sample shapes.
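To make the MedicalVQATask contract described above concrete, here is a minimal sketch. The helper function and the event field names are illustrative stand-ins based on this PR description, not PyHealth's actual implementation.

```python
def vqarad_event_to_sample(event: dict) -> dict:
    """Hypothetical converter mirroring the described {image, question,
    answer} sample emission; the event keys are assumptions."""
    return {
        "image": event["image_path"],   # later resized/normalized by ImageProcessor
        "question": event["question"],  # free-text model input
        "answer": event["answer"],      # closed-set label (multiclass)
    }

# Schemas as declared by the task, per the PR description.
input_schema = {"image": "image", "question": "text"}
output_schema = {"answer": "multiclass"}
```

Framing the answer as a multiclass label is what lets the standard Trainer loop handle VQA without any generation-specific plumbing.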
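The zero-initialized gating described under Design Decisions can be sketched in plain Python: a scalar tanh gate on the residual branch, as in Flamingo. The class name and list-based "tensors" are illustrative only.

```python
import math

class GatedResidual:
    """Tanh-gated residual connection: with alpha initialized to 0 the
    branch contributes nothing, so the layer starts as an identity over
    the frozen LLM's hidden states."""

    def __init__(self):
        self.alpha = 0.0  # learnable scalar in the real model

    def __call__(self, hidden, branch_out):
        g = math.tanh(self.alpha)  # gate in (-1, 1); exactly 0 at init
        return [h + g * b for h, b in zip(hidden, branch_out)]
```

At initialization the cross-attention (or FFN) branch is fully suppressed, so early training cannot destabilize the pretrained LLM; as alpha is learned, visual information is blended in gradually.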
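The dual-interface decision, together with the generate() fix noted in the commit log, can be sketched as follows. The class is a toy stand-in: the fusion step is a placeholder for the real gated cross-attention, and the method signatures are simplified.

```python
class MedFlamingoSketch:
    """Toy skeleton: forward() serves the Trainer loop, while generate()
    must hand the LLM visually conditioned embeddings via inputs_embeds.
    Passing raw input_ids would silently drop the visual conditioning --
    the bug this PR fixes."""

    def __init__(self, lm):
        self.lm = lm  # stand-in for the frozen LLM

    def _fuse(self, text_embeds, visual_tokens):
        # Placeholder: the real model conditions through MedFlamingoLayer
        # cross-attention, not by adding features.
        bias = sum(visual_tokens)
        return [t + bias for t in text_embeds]

    def forward(self, text_embeds, visual_tokens):
        # Trainer-facing path: produce scores for the closed-set head.
        return {"logits": self._fuse(text_embeds, visual_tokens)}

    def generate(self, text_embeds, visual_tokens):
        # Open-ended path: pass inputs_embeds, not input_ids.
        fused = self._fuse(text_embeds, visual_tokens)
        return self.lm.generate(inputs_embeds=fused)
```

The key point is that both paths share the same conditioning step; only the output contract differs (classification dict vs. free-text generation).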
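The stub-override seam the tests rely on might look like this. The class bodies are hypothetical; only the method names _init_vision_encoder / _init_lang_model and the TestableMedFlamingo name come from the PR.

```python
class MedFlamingoBase:
    """Skeleton showing the override seam used for testing."""

    def __init__(self):
        # Model construction is routed through isolated factory methods
        # so subclasses can replace them without touching __init__.
        self.vision_encoder = self._init_vision_encoder()
        self.lang_model = self._init_lang_model()

    def _init_vision_encoder(self):
        raise NotImplementedError("real model downloads CLIP ViT-L/14")

    def _init_lang_model(self):
        raise NotImplementedError("real model downloads OPT-6.7B")


class TestableMedFlamingo(MedFlamingoBase):
    """CPU-only stand-in: no checkpoints are ever downloaded."""

    def _init_vision_encoder(self):
        return lambda image: [0.0] * 4  # fake patch features

    def _init_lang_model(self):
        return lambda embeds: "stub-answer"
```

Because the seam is at the factory-method level, everything else in the model (gating, resampling, the forward/generate paths) runs unmodified in tests.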