Predict death after age 60 using patient features and disease history before age 60, evaluated across four methods.
| # | Method | Description |
|---|---|---|
| 1 | Delphi | Generative transformer for health trajectories; predicts "Death" token probability. Evaluated with DeLong AUC. |
| 2 | Benchmarking (CoxPH) | CoxPH survival model on binary disease features + baseline biomarkers. Evaluated with C-index and time-dependent AUC. |
| 3 | Text Embedding + CoxPH | Convert disease history to natural language, embed with Qwen3-Embedding, combine with baselines, fit CoxPH. |
| 4 | Trajectory Embedding + CoxPH | Delphi-style token + age embeddings (sin/cos), pool across events, combine with baselines, fit CoxPH. |
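The token+age encoding in method 4 can be sketched roughly as follows. This is a minimal illustration only; the embedding dimension, the frequency base of the sinusoids, and mean pooling are assumed choices, and the actual implementation lives in `embedding/trajectory_embedding.py`:

```python
import numpy as np

def age_encoding(age_years, dim=64):
    """Transformer-style sinusoidal encoding of age: sin/cos at geometric
    frequencies (base 10000 is an assumed choice, not Delphi's documented one)."""
    half = dim // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = age_years * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def pool_trajectory(token_vecs, ages):
    """Add an age encoding to each event's token embedding, then mean-pool
    across events to get one fixed-size vector per patient."""
    dim = token_vecs.shape[1]
    age_vecs = np.stack([age_encoding(a, dim) for a in ages])
    return (token_vecs + age_vecs).mean(axis=0)

# Toy patient: 3 disease events with random 64-dim token embeddings
rng = np.random.default_rng(0)
patient_vec = pool_trajectory(rng.normal(size=(3, 64)), np.array([20.3, 41.0, 55.7]))
print(patient_vec.shape)  # (64,)
```

The pooled vector is what gets concatenated with baseline features and fed to CoxPH in method 4.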
```
├── benchmarking/                          # Survival data preprocessing & CoxPH training
│   ├── preprocess_diagnosis.py            # Extract disease features from UKB
│   ├── preprocess_survival.py             # Build survival dataset (event_flag, duration_days)
│   ├── autoprognosis_survival_dataset.csv # Output: survival dataset
│   └── disease_before60_features.csv      # Output: binary disease flags
├── Delphi/                                # Delphi model, training & evaluation code
│   ├── model.py, train.py, utils.py       # Core Delphi code
│   └── evaluate_auc.py                    # AUC evaluation via DeLong
├── embedding/                             # Embedding extraction (methods 3 & 4)
│   ├── qwen_embedding.py                  # Qwen text-only embedding (method 3 texts / method 4 tokens)
│   └── trajectory_embedding.py            # Token+age embedding pipeline (method 4)
├── preprocessing/                         # Preprocessing for embedding inputs
│   ├── generate_disease_trajectory.py     # Build age-at-diagnosis matrix (disease_trajectory.csv)
│   ├── generate_trajectory_text.py        # Convert matrix → Delphi-style trajectory text per patient
│   └── natural_text_conversion.py         # Convert tabular data → natural-language text per patient
├── evaluation/                            # Unified evaluation & comparison
│   ├── cohort_split.py                    # Define shared train/val/test split
│   ├── evaluate_delphi.py                 # Evaluate Delphi on shared cohort
│   ├── evaluate_benchmarking.py           # Train & evaluate CoxPH on shared cohort
│   ├── evaluate_embedding_survival.py     # Train & evaluate CoxPH on embeddings
│   └── unified_evaluation.py              # Compare all methods in one table
├── data/                                  # Raw & processed data (gitignored)
├── UKB_extraction/                        # UK Biobank data extraction tools
├── docs/                                  # Proposals, references
└── run_pipeline.sh                        # One-command pipeline runner (steps 1–7)
```
Run the entire pipeline end-to-end with `run_pipeline.sh`:

```bash
# Local / CPU: 10k sample, Qwen3-Embedding-0.6B (auto-selected)
bash run_pipeline.sh

# Full dataset, auto-selects model based on device
bash run_pipeline.sh --full

# GPU server: full dataset, 8B model
bash run_pipeline.sh --full --embedding-model Qwen/Qwen3-Embedding-8B

# Mid-range GPU: 4B model
bash run_pipeline.sh --full --embedding-model Qwen/Qwen3-Embedding-4B

# Skip preprocessing if data already exists
bash run_pipeline.sh --skip-preprocess --steps 5,6,7

# Skip Delphi (if no checkpoint available)
bash run_pipeline.sh --skip-delphi
```

Options:
| Flag | Description |
|---|---|
| `--full` | Use all participants instead of a 10k sample |
| `--sample-size N` | Custom sample size (default: 10000) |
| `--embedding-model MODEL` | Qwen3-Embedding-0.6B/4B/8B (auto-selected by device) |
| `--token-mode random\|qwen` | Trajectory token embedding mode (default: random) |
| `--skip-preprocess` | Skip steps 1–2 if CSV files already exist |
| `--skip-delphi` | Skip Delphi evaluation |
| `--steps 1,2,3,...` | Run only specific steps |
| `--device cuda\|cpu` | Force device (auto-detected by default) |
| `--random-state N` | Random seed (default: 42) |
The script logs everything to `pipeline_YYYYMMDD_HHMMSS.log` and prints the comparison table at the end.
**Embedding model load error:** If you see `Can't load the configuration of 'Qwen/Qwen3-Embedding-0.6B'`, the code will retry with a fresh download and then fall back to sentence-transformers if needed. Install the fallback with `pip install sentence-transformers`. Ensure no local folder named `Qwen` exists in the current working directory, and that you have network access to Hugging Face.
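The retry-then-fallback behavior described above follows a simple pattern, sketched below with a hypothetical helper (`load_with_fallback` and the loader labels are illustrative, not the pipeline's actual code):

```python
def load_with_fallback(model_name, loaders):
    """Try each (label, loader) pair in order and return the first success.

    `loaders` might be, e.g. (assumed order, mirroring the behavior above):
      ("transformers (cached)",         lambda n: AutoModel.from_pretrained(n)),
      ("transformers (fresh download)", lambda n: AutoModel.from_pretrained(n, force_download=True)),
      ("sentence-transformers",         lambda n: SentenceTransformer(n)),
    """
    errors = []
    for label, loader in loaders:
        try:
            return label, loader(model_name)
        except Exception as exc:  # remember why this loader failed, try the next
            errors.append(f"{label}: {exc}")
    raise RuntimeError(
        f"All loaders failed for {model_name!r}:\n" + "\n".join(errors)
    )

# Demo with stub loaders (no downloads): the first fails, the second succeeds.
def broken(name):
    raise OSError("Can't load the configuration of " + repr(name))

label, model = load_with_fallback(
    "Qwen/Qwen3-Embedding-0.6B",
    [("broken", broken), ("stub", lambda n: f"embedder({n})")],
)
print(label)  # stub
```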
Extract raw UK Biobank data into data/. See UKB_extraction/ for tooling.
These scripts produce the two CSV files that all downstream steps depend on.
```bash
python benchmarking/preprocess_diagnosis.py  # → disease_before60_features.csv
python benchmarking/preprocess_survival.py   # → autoprognosis_survival_dataset.csv (10k sample)

# Or use the full dataset:
python benchmarking/preprocess_survival.py --all
```

Generate per-patient age-at-diagnosis for all diseases (needed by method 4).
```bash
python preprocessing/generate_disease_trajectory.py  # → data/preprocessed/disease_trajectory.csv
```

Create a single train/val/test split (70/15/15, stratified) used by all methods.
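Such a stratified split can be sketched in a few lines of NumPy. This is an illustration of the idea only; the real script and its `cohort_split.json` output format are not reproduced here:

```python
import numpy as np

def stratified_split(ids, labels, frac=(0.70, 0.15, 0.15), seed=42):
    """Shuffle within each label group (e.g. event_flag) so that train/val/test
    keep the same event ratio, then cut each group 70/15/15."""
    rng = np.random.default_rng(seed)
    ids, labels = np.asarray(ids), np.asarray(labels)
    train, val, test = [], [], []
    for lab in np.unique(labels):
        group = ids[labels == lab].copy()
        rng.shuffle(group)
        n_train = int(round(frac[0] * len(group)))
        n_val = int(round(frac[1] * len(group)))
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])
    return train, val, test

# 200 toy patients, half with the event: split is 140/30/30 and stays balanced
parts = stratified_split(range(200), [i % 2 for i in range(200)])
print([len(p) for p in parts])  # [140, 30, 30]
```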
```bash
python evaluation/cohort_split.py  # → evaluation/cohort_split.json
```

Method 3 (text embedding): convert trajectory data to disease-history text with age at diagnosis (e.g. "At age 20.3, patient was diagnosed with G43 migraine."). Only disease events are included; demographics and biomarkers are added as numeric features during survival model training.
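The conversion can be illustrated with a small hypothetical helper. The row format, column names, and label map below are assumptions for illustration; `preprocessing/natural_text_conversion.py` is the actual implementation:

```python
import math

def row_to_text(row, labels):
    """One sentence per diagnosed disease, ordered by age at diagnosis.

    `row` maps ICD-10 code -> age at diagnosis (NaN/None = never diagnosed);
    `labels` maps code -> human-readable name. Both formats are assumed.
    """
    events = sorted(
        (age, code) for code, age in row.items()
        if age is not None and not math.isnan(age)
    )
    sentences = []
    for age, code in events:
        name = f"{code} {labels[code]}" if code in labels else code
        sentences.append(f"At age {age:.1f}, patient was diagnosed with {name}.")
    return " ".join(sentences)

print(row_to_text({"G43": 20.3, "I10": float("nan")}, {"G43": "migraine"}))
# At age 20.3, patient was diagnosed with G43 migraine.
```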
```bash
python preprocessing/natural_text_conversion.py \
    --trajectory-csv data/preprocessed/disease_trajectory.csv \
    --output-csv data/preprocessed/text_before60.csv \
    --output-dir data/preprocessed/text_before60
```

Method 4 (trajectory embedding): convert the trajectory matrix to Delphi-style text.
```bash
python preprocessing/generate_trajectory_text.py \
    --output-csv data/preprocessed/trajectory_before60.csv \
    --output-dir data/preprocessed/trajectory_before60
```

Method 1 (Delphi binary data): convert trajectory + demographics to Delphi binary format, aligned with the shared cohort split.
```bash
python Delphi/preprocess_delphi_binary.py \
    --output-dir Delphi/data/ukb_respiratory_data
```

This generates `train.bin`, `val.bin`, and `test.bin` using the same patient splits as all other methods.
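To give a rough idea of what such a split file contains, here is a sketch assuming a flat nanoGPT-style token array written raw to disk; the real Delphi layout, which must also carry age information, may well differ:

```python
import os
import tempfile
import numpy as np

def write_split(path, sequences):
    """Concatenate per-patient token sequences into one flat uint16 array and
    write it as a raw .bin (nanoGPT-style; tokens only in this sketch)."""
    flat = np.concatenate([np.asarray(s, dtype=np.uint16) for s in sequences])
    flat.tofile(path)

# Round-trip check with two toy patient trajectories
path = os.path.join(tempfile.mkdtemp(), "train.bin")
write_split(path, [[1, 7, 2], [1, 3, 9, 2]])
back = np.fromfile(path, dtype=np.uint16)
print(back.tolist())  # [1, 7, 2, 1, 3, 9, 2]
```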
Method 3: embed natural-language texts with Qwen3-Embedding.
```bash
# GPU server (8B, 4096-dim):
python embedding/qwen_embedding.py \
    --input-csv data/preprocessed/text_before60.csv \
    --output-dir data/preprocessed/embeddings_text \
    --model-name Qwen/Qwen3-Embedding-8B

# Local / CPU (0.6B, 1024-dim):
python embedding/qwen_embedding.py \
    --input-csv data/preprocessed/text_before60.csv \
    --output-dir data/preprocessed/embeddings_text \
    --model-name Qwen/Qwen3-Embedding-0.6B \
    --no-flash-attn
```

Method 4: embed trajectory token+age vectors.
```bash
# Random token embeddings (CPU, for testing):
python embedding/trajectory_embedding.py \
    --input-csv data/preprocessed/trajectory_before60.csv \
    --output-dir data/preprocessed/embeddings_traj

# Or with Qwen token embeddings (GPU):
python embedding/trajectory_embedding.py \
    --input-csv data/preprocessed/trajectory_before60.csv \
    --output-dir data/preprocessed/embeddings_traj \
    --token-mode qwen
```

Each evaluation script trains on the shared train split and evaluates on val/test.
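Methods 2–4 are scored with the concordance index (plus time-dependent AUC). For intuition, here is a minimal pure-Python sketch of Harrell's C-index; the real evaluators presumably rely on lifelines / scikit-survival rather than code like this:

```python
def concordance_index(durations, events, risks):
    """Harrell's C-index: the fraction of comparable pairs the model orders
    correctly. A pair (i, j) is comparable if the patient with the shorter
    duration had the event; higher risk should mean shorter survival.
    Ties in risk count 0.5; ties in duration are ignored in this sketch."""
    concordant, comparable = 0.0, 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            if events[i] and durations[i] < durations[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Perfectly ranked toy data: highest risk dies first
print(concordance_index([2, 4, 6], [1, 1, 1], [0.9, 0.5, 0.1]))  # 1.0
```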
```bash
# Method 1: Delphi (requires step 4 Delphi binary data)
python evaluation/evaluate_delphi.py \
    --split test \
    --save-preds \
    --horizons-days 365 1825

# Method 2: Benchmarking (CoxPH on binary disease features)
python evaluation/evaluate_benchmarking.py \
    --baseline-mode all

# Method 3: Text Embedding + CoxPH
python evaluation/evaluate_embedding_survival.py \
    --embedding-dir data/preprocessed/embeddings_text \
    --tag patient \
    --method-name text_embedding \
    --baseline-mode all

# Method 4: Trajectory Embedding + CoxPH
python evaluation/evaluate_embedding_survival.py \
    --embedding-dir data/preprocessed/embeddings_traj \
    --tag trajectory \
    --method-name trajectory_embedding \
    --baseline-mode none
```

Each evaluator now shares the same CLI options:
- `--baseline-mode {all,none,custom}` (and `--baseline-cols ...`) to control which survival covariates are concatenated with embeddings.
- `--survival-csv` / `--cohort-json` to point at alternative datasets or splits.
- `--save-preds` to dump per-split risk scores in `evaluation/*/predictions/`.
- Delphi adds `--horizons-days` to override the default quantile-based horizons and aligns its risk outputs with the shared survival metrics.
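The covariate-selection logic behind `--baseline-mode` can be sketched as follows. The covariate column names here are hypothetical, and this is an illustration of the selection idea rather than the evaluators' actual code:

```python
import numpy as np

BASELINE_COLS = ["age_at_baseline", "sex", "bmi"]  # hypothetical covariate names

def build_features(embeddings, survival_rows, mode="all", custom_cols=None):
    """Pick baseline covariates per --baseline-mode, then append them to the
    embedding matrix that CoxPH is trained on."""
    if mode == "all":
        cols = BASELINE_COLS
    elif mode == "none":
        cols = []
    elif mode == "custom":
        cols = list(custom_cols or [])
    else:
        raise ValueError(f"unknown baseline mode: {mode!r}")
    if not cols:
        return embeddings
    baseline = np.array([[row[c] for c in cols] for row in survival_rows])
    return np.hstack([embeddings, baseline])

rows = [{"age_at_baseline": 55.0, "sex": 1, "bmi": 27.3}]
emb = np.zeros((1, 4))
print(build_features(emb, rows, mode="all").shape)  # (1, 7)
```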
```bash
python evaluation/unified_evaluation.py  # → evaluation/unified_comparison.csv
```

Python 3.9+. Install all dependencies at once:
```bash
pip install -r requirements.txt
```

For GPU (CUDA), install PyTorch with CUDA first:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```

Key dependencies and what uses them:
| Package | Version | Used by |
|---|---|---|
| `torch` | >=2.4.0 | Delphi, Qwen3-Embedding |
| `transformers` | >=4.51.0 | Qwen3-Embedding (methods 3 & 4) |
| `lifelines` | >=0.27.0 | CoxPH models (methods 2, 3, 4) |
| `scikit-survival` | >=0.22.0 | Time-dependent AUC evaluation |
| `numpy` | >=1.24.0 | All components |
| `pandas` | >=2.0.0 | All components |
See also: `Delphi/requirements.txt` (original Delphi deps), `embedding/requirements_qwen.txt` (Qwen-specific).
- `traj_before60`: add hints so the LLM can interpret the trajectory text
- `text_before60`: add time points for alcohol and smoking