Merlin is a 3D VLM for computed tomography that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining. (Nature 2026)
To install Merlin, you can simply run:
pip install merlin-vlm
For an editable installation, use the following commands to clone and install this repository:
conda create --name merlin python==3.10
conda activate merlin
git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
# Alternatively, to install exact package versions as tested:
# uv sync
To create a Merlin model with both image and text embeddings enabled, use the following:
from merlin import Merlin
model = Merlin()
To initialize the model with only image embeddings active, use:
from merlin import Merlin
model = Merlin(ImageEmbedding=True)
To initialize the model for phenotype classification, use:
from merlin import Merlin
model = Merlin(PhenotypeCls=True)
To initialize the model for five-year disease prediction, use:
from merlin import Merlin
model = Merlin(FiveYearPred=True)
To initialize the model for radiology report generation, use:
from merlin import Merlin
model = Merlin(RadiologyReport=True)
For inference on a demo CT scan, please check out the general demo and report generation demo.
For additional information, please read the inference documentation and report generation documentation.
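The initialization modes listed above differ only in the keyword flag passed to the constructor, so they can be gathered into one small helper. The following is a minimal sketch, not part of the Merlin package: the task labels (`image_text`, `image_only`, `phenotype`, `five_year`, `report`) and the function names `merlin_kwargs` and `build_merlin` are hypothetical names chosen for illustration. Only the flags `ImageEmbedding`, `PhenotypeCls`, `FiveYearPred`, and `RadiologyReport` come from the examples above.

```python
from importlib import import_module

# Constructor flags as shown in the examples above. The dictionary keys are
# hypothetical task labels for this sketch, not part of the Merlin API.
TASK_FLAGS = {
    "image_text": {},                        # Merlin()
    "image_only": {"ImageEmbedding": True},  # Merlin(ImageEmbedding=True)
    "phenotype": {"PhenotypeCls": True},     # Merlin(PhenotypeCls=True)
    "five_year": {"FiveYearPred": True},     # Merlin(FiveYearPred=True)
    "report": {"RadiologyReport": True},     # Merlin(RadiologyReport=True)
}


def merlin_kwargs(task):
    """Return a copy of the constructor keyword arguments for a task label."""
    if task not in TASK_FLAGS:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_FLAGS)}"
        )
    return dict(TASK_FLAGS[task])


def build_merlin(task="image_text"):
    """Instantiate Merlin for a task label.

    The merlin package is imported lazily, so merlin_kwargs can be used
    (and tested) on a machine where merlin-vlm is not installed.
    """
    Merlin = import_module("merlin").Merlin
    return Merlin(**merlin_kwargs(task))
```

With `merlin-vlm` installed, `build_merlin("phenotype")` would be equivalent to `Merlin(PhenotypeCls=True)` under these assumptions. An unrecognized label raises `ValueError` before any model is constructed.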
For segmentation, we integrated Merlin with the nnU-Net framework. Please refer to the Merlin segmentation repository and its README for detailed setup and inference instructions.
We are excited to release the Merlin Abdominal CT Dataset to the community!
For details on accessing and using the dataset, please see the download documentation!
If you find this repository useful for your work, please cite the Nature paper:
@article{blankemeier_kumar2026merlin,
author = {Blankemeier, Louis and Kumar, Ashwin and Cohen, Joseph Paul and Liu, Jiaming and Liu, Longchao and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Yu, Hongkun and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Holland, Robbie and Truyts, Cesar and Bluethgen, Christian and Wu, Yufu and Lian, Long and Jensen, Malte Engmann Kjeldskov and Ostmeier, Sophie and Varma, Maya and Valanarasu, Jeya Maria Jose and Fang, Zhongnan and Huo, Zepeng and Nabulsi, Zaid and Ardila, Diego and Weng, Wei-Hung and Amaro Junior, Edson and Ahuja, Neera and Fries, Jason and Shah, Nigam H. and Zaharchuk, Greg and Willis, Marc and Yala, Adam and Johnston, Andrew and Boutin, Robert D. and Wentland, Andrew and Langlotz, Curtis P. and Hom, Jason and Gatidis, Sergios and Chaudhari, Akshay S.},
title = {Merlin: a computed tomography vision-language foundation model and dataset},
journal = {Nature},
year = {2026},
doi = {10.1038/s41586-026-10181-8},
url = {https://doi.org/10.1038/s41586-026-10181-8}
}