XSPATIO is an explainable computational pathology pipeline that directly links H&E morphology with spatially resolved molecular expression using ROI-aware segmentation, foundation-model feature extraction, and spatially constrained multiple-instance learning (MIL).
The framework is designed for datasets with region-level molecular ground truth, such as NanoString GeoMx DSP, enabling biologically grounded prediction and interpretation of gene and protein expression from routine histology.
XSPATIO consists of five modular stages:
- ROI-aware segmentation (XSPATIO-SEG)
- Patch-level feature extraction (XSPATIO-FEAT)
- Spatially constrained MIL modeling (XSPATIO-MIL)
- Model evaluation and cross-validation
- Attention-based spatial visualization (XSPATIO-Heatmaps)
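In the MIL stage, each ROI is treated as a bag of patch embeddings, and attention decides how much each patch contributes to the ROI-level prediction. The sketch below is a simplified, non-gated illustration of attention-based MIL pooling, the family that CLAM's clam_sb model belongs to. Each patch is scored, the scores are softmax-normalized, and the attention-weighted sum becomes the ROI embedding. The scoring vector `w` is a hypothetical stand-in for the learned attention network, and the functions are illustrative rather than XSPATIO's implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    patches: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool a bag of patch embeddings into one ROI embedding.

    Each patch is scored as dot(w, patch), the scores are softmax-normalized
    into attention weights, and the ROI embedding is the attention-weighted
    sum of the patch embeddings. `w` is a hypothetical stand-in for a
    learned (in CLAM, gated) attention network.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, p)) for p in patches]
    attn = softmax(scores)
    dim = len(patches[0])
    pooled = [sum(a * p[d] for a, p in zip(attn, patches)) for d in range(dim)]
    return pooled, attn


if __name__ == "__main__":
    # Three toy 2-D "patch embeddings" (UNI embeddings are 1024-D).
    bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    pooled, attn = attention_pool(bag, w=[1.0, 0.0])
    print("attention:", [round(a, 3) for a in attn])
    print("ROI embedding:", [round(v, 3) for v in pooled])
```

The attention weights are the values that the heatmap stage later maps back onto the tissue.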
Place all ROI images in a single folder:
ORIGINAL_ROI_FOLDER/
├── ROI_1.jpg
├── ROI_2.jpg
└── ...

Once the ROIs are in place, segment them and tile them into patches using the DSP settings in dsp.csv, available in the presets folder under XSPATIO-SEG:
python3 create_patches_fp.py \
--source /home/ubuntu/CLAM/ORIGINAL_ROI_FOLDER \
--save_dir SEG_patches \
--patch_size 256 \
--preset dsp.csv \
--seg --patch --stitch \
>tma_segmentation.log 2>&1 &

Once we have the segmented patches, we extract patch-level features with the UNI model:
python3 extract_features_fp.py \
--data_h5_dir SEG_patches \
--data_slide_dir ORIGINAL_ROI_FOLDER \
--csv_path SEG_patches/process_list_autogen.csv \
--model_name uni_v1 \
--feat_dir tma_extracted_features \
--batch_size 512 \
--slide_ext .jpg \
>tma_feature_extraction.log 2>&1 &

The extracted features are written to:
tma_extracted_features/
├── h5_files/
├── pt_files/
└── feature.txt
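The two commands above turn each ROI into a bag of patch embeddings. The sketch below illustrates that bookkeeping under simplifying assumptions. It tiles an ROI into non-overlapping 256 × 256 patches, with the step size equal to the patch size as in CLAM's defaults. It ignores the tissue mask and edge padding that create_patches_fp.py may apply. Each patch then yields one UNI embedding, so a bag is an N × 1024 matrix, which matches `--embed_dim 1024` in the training and evaluation commands. The functions `patch_grid` and `bag_shape` are hypothetical helpers.

```python
from typing import List, Tuple


def patch_grid(width: int, height: int, patch_size: int = 256,
               step: int = 256) -> List[Tuple[int, int]]:
    """Top-left (x, y) coordinates of patches that fit entirely inside the ROI.

    Simplification: no tissue mask and no edge padding, so partial patches
    at the right and bottom borders are dropped.
    """
    xs = range(0, width - patch_size + 1, step)
    ys = range(0, height - patch_size + 1, step)
    return [(x, y) for y in ys for x in xs]


def bag_shape(n_patches: int, embed_dim: int = 1024) -> Tuple[int, int]:
    """Shape of one ROI's feature matrix: one embedding per patch."""
    return (n_patches, embed_dim)


if __name__ == "__main__":
    coords = patch_grid(1000, 600)  # a hypothetical 1000 x 600 px ROI
    print(len(coords), coords[:3])
    print(bag_shape(len(coords)))
```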
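For each biomarker, gene or protein expression values are binarized across ROIs into low- and high-expression groups using that biomarker's threshold. The gene_file_creater_clam.py script then sorts each ROI's feature files (.h5 and .pt) into the matching group folder.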
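As a minimal sketch of the binarization rule, the code below assumes that the threshold for a biomarker defaults to the median of its values across ROIs. This assumption may differ from the rule gene_file_creater_clam.py applies. The input layout, a mapping from ROI ID to expression value, and the function name are also hypothetical. The group names match the folder names used below.

```python
import statistics
from typing import Dict, Optional


def binarize_expression(values: Dict[str, float],
                        threshold: Optional[float] = None) -> Dict[str, str]:
    """Assign each ROI to the low- or high-expression group for one biomarker.

    Assumption: without an explicit threshold, the median across ROIs is used.
    ROIs strictly above the threshold are labeled high.
    """
    if threshold is None:
        threshold = statistics.median(values.values())
    return {
        roi: "high_expression_genes" if v > threshold else "low_expression_genes"
        for roi, v in values.items()
    }


if __name__ == "__main__":
    # Hypothetical normalized expression of BIOMARKER_NAME per ROI.
    expr = {"ROI_1": 3.2, "ROI_2": 7.9, "ROI_3": 5.1, "ROI_4": 9.4}
    for roi, group in binarize_expression(expr).items():
        print(roi, group)
```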
python3 gene_file_creater_clam.py \
--csv_file combined_expression_tma1+2_gene_selected.csv \
--column_name BIOMARKER_NAME \
--h5_source_folder tma_extracted_features/h5_files \
--pt_source_folder tma_extracted_features/pt_files \
--low_h5_folder GENE_EXP/low_expression_genes/h5_files \
--low_pt_folder GENE_EXP/low_expression_genes/pt_files \
--high_h5_folder GENE_EXP/high_expression_genes/h5_files \
--high_pt_folder GENE_EXP/high_expression_genes/pt_files

Before training, we create a CSV file that lists patient_id, slide_id (the ROI ID), and label for each ROI, using the script below:
python3 clam_csv_generator.py

Before running the script, edit the following variables to match your folder layout:
low_expression_genes_folder = "GENE_EXP/low_expression_genes"
high_expression_genes_folder = "GENE_EXP/high_expression_genes"
output_csv_path = "GENE_EXP_FILE.csv"

GENE_EXP_FILE.csv should look like this:
patient_id,slide_id,label
patient_0,slide_1,low_expression_genes
patient_1,slide_2,high_expression_genes

Next, we define the task (gene_exp) in create_splits_seq.py, main.py, and eval.py.
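The CSV generator and the task definition must agree on the label strings. In CLAM-style code, a task definition points the dataset at the CSV and maps each label string to a class index through a label dictionary. The sketch below builds one CSV row per ROI feature file, labeled by the group folder it sits in, and defines such a mapping. Several details are assumptions: the 0/1 order in `LABEL_DICT`, the helper names, and the use of `patient_<ROI stem>` as a placeholder patient_id. Replace that placeholder with a real ROI-to-patient mapping so that ROIs from the same patient share an ID.

```python
import csv
import io
from pathlib import Path
from typing import Dict, Iterable, List

# Label strings must match the group folder names; the 0/1 order is an assumption.
LABEL_DICT: Dict[str, int] = {"low_expression_genes": 0, "high_expression_genes": 1}
FIELDS = ["patient_id", "slide_id", "label"]


def make_rows(pt_files: Iterable[Path], label: str) -> List[Dict[str, str]]:
    """One row per ROI feature file; slide_id is the file name without .pt."""
    return [
        # Hypothetical patient_id derived from the ROI stem (placeholder only).
        {"patient_id": f"patient_{pt.stem}", "slide_id": pt.stem, "label": label}
        for pt in sorted(pt_files)
    ]


def collect_rows(root: Path) -> List[Dict[str, str]]:
    """Scan root/<label>/pt_files/*.pt for every label in LABEL_DICT."""
    rows: List[Dict[str, str]] = []
    for label in LABEL_DICT:
        rows += make_rows((root / label / "pt_files").glob("*.pt"), label)
    return rows


def to_csv_text(rows: List[Dict[str, str]]) -> str:
    """Serialize rows with the patient_id,slide_id,label header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = collect_rows(Path("GENE_EXP"))
    if rows:  # only write when feature files were found
        Path("GENE_EXP_FILE.csv").write_text(to_csv_text(rows))
    print(f"{len(rows)} ROIs; class indices: {LABEL_DICT}")
```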
We create splits using the following command:
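python3 create_splits_seq.py --task gene_exp --seed 1 --k 12

- --task is the task we just defined, gene_exp.
- --k is the number of folds (splits) to generate. A 10-fold setup typically uses an 80/10/10 train/validation/test distribution; we used k = 12 with a 76/12/12 distribution. In upstream CLAM, these proportions are set by --val_frac and --test_frac (0.1 each by default) rather than by --k, so check those values when you need a different ratio.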
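The percentages follow from the validation and test fractions: with 12% each, 100 − 12 − 12 = 76% remains for training. The sketch below computes per-class split sizes, assuming that the fractions are applied within each class and rounded, with the remainder assigned to training. This is similar in spirit to how upstream CLAM uses --val_frac and --test_frac. The function name and the 50/50 example cohort are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def split_sizes(labels: Sequence[str], val_frac: float,
                test_frac: float) -> Dict[str, Tuple[int, int, int]]:
    """Per-class (train, val, test) counts for one split.

    Assumption: fractions are applied within each class and rounded;
    the remainder of each class goes to training.
    """
    sizes = {}
    for cls, n in sorted(Counter(labels).items()):
        n_val = round(n * val_frac)
        n_test = round(n * test_frac)
        sizes[cls] = (n - n_val - n_test, n_val, n_test)
    return sizes


if __name__ == "__main__":
    # Hypothetical cohort of 50 low- and 50 high-expression ROIs.
    labels = ["low_expression_genes"] * 50 + ["high_expression_genes"] * 50
    print(split_sizes(labels, val_frac=0.12, test_frac=0.12))
    print(split_sizes(labels, val_frac=0.10, test_frac=0.10))
```

Stratifying per class keeps the low/high balance similar across the train, validation, and test sets of every fold.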
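To keep training running if your terminal or SSH connection drops, start a tmux session first:
tmux new -s session
Detach with Ctrl-b d and reattach later with tmux attach -t session.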
For training, run the following command:
CUDA_VISIBLE_DEVICES=0 python3 main.py \
--drop_out 0.25 \
--early_stopping \
--lr 2e-4 \
--k 12 \
--B 3 \
--split_dir gene_exp \
--exp_code gene_exp_100 \
--weighted_sample \
--bag_loss ce \
--task gene_exp \
--model_type clam_sb \
--log_data \
--data_root_dir GENE_EXP \
--embed_dim 1024 \
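>train.txt 2>&1 &

To evaluate the trained models, run the command below. Replace SAVED_MODEL_RESULTS with the name of the training results folder under results/. In upstream CLAM, training writes to results/<exp_code>_s<seed>, e.g. results/gene_exp_100_s1.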
CUDA_VISIBLE_DEVICES=0 python3 eval.py \
--k 12 \
--models_exp_code SAVED_MODEL_RESULTS \
--save_exp_code EVAL_MODEL_RESULTS \
--task gene_exp \
--model_type clam_sb \
--results_dir results \
--data_root_dir GENE_EXP \
--embed_dim 1024 \
>eval.txt 2>&1 &
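Evaluation yields test metrics for each of the 12 folds, and a common way to report them is the mean and standard deviation across folds. The sketch below computes ROC AUC from per-ROI scores using the rank-based (Mann-Whitney) formulation, then summarizes per-fold AUCs. The scores and fold AUCs are made up, and this is an illustration rather than the metric code in eval.py.

```python
import statistics
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. O(P * N), fine for ROI-level test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative ROI")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-fold AUCs."""
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


if __name__ == "__main__":
    # Made-up high-expression probabilities for one fold's test ROIs.
    y = [0, 0, 1, 1, 1]
    p = [0.2, 0.6, 0.55, 0.8, 0.9]
    print(f"fold AUC: {roc_auc(y, p):.3f}")
    mean, sd = summarize([0.81, 0.77, 0.85])  # made-up fold AUCs
    print(f"AUC {mean:.3f} +/- {sd:.3f}")
```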
To generate attention heatmaps, open Heatmap/heatmaps/configs/config_template_dsp.yaml and set the biomarker name, the corresponding trained model checkpoint, and the output directories for the heatmaps. Then run:
CUDA_VISIBLE_DEVICES=0 python3 create_heatmaps_dsp.py \
--config config_template_dsp.yaml
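Conceptually, a heatmap places each patch's attention score back at that patch's coordinates. The sketch below converts raw attention scores to percentile ranks within the ROI, so the color scale reflects relative rank, with ties sharing the average rank. It then paints those ranks onto a coarse grid with one cell per 256-pixel patch. It is a simplified illustration with hypothetical function names. The actual rendering is controlled by create_heatmaps_dsp.py and its YAML options.

```python
from typing import List, Optional, Sequence, Tuple


def to_percentiles(scores: Sequence[float]) -> List[float]:
    """Percentile rank (0, 100] of each score; tied scores share the average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [100.0 * r / n for r in ranks]


def attention_grid(coords: Sequence[Tuple[int, int]], scores: Sequence[float],
                   width: int, height: int,
                   patch_size: int = 256) -> List[List[Optional[float]]]:
    """One cell per patch position; cells without a patch stay None."""
    cols, rows = width // patch_size, height // patch_size
    grid: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    for (x, y), pct in zip(coords, to_percentiles(scores)):
        grid[y // patch_size][x // patch_size] = pct
    return grid


if __name__ == "__main__":
    coords = [(0, 0), (256, 0), (0, 256)]  # hypothetical patch positions
    attn = [0.1, 0.7, 0.4]                 # hypothetical attention scores
    for row in attention_grid(coords, attn, width=512, height=512):
        print(row)
```

Ranking rather than raw scores keeps heatmaps comparable across ROIs whose attention values differ in scale.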