XMR (Cross-Population Mendelian Randomization) is a probabilistic method for estimating causal effects between an exposure and an outcome using genome-wide summary statistics from multiple populations.
XMR improves the power and robustness of causal inference in underrepresented (small-sample) populations by leveraging information from a large-sample auxiliary population. Specifically, XMR decomposes the observed SNP–trait effects into true causal effects and confounding factors (e.g., pleiotropy, population structure) hidden in summary statistics. By explicitly modelling the genetic correlation between two populations, XMR effectively borrows strength from the large-sample group. XMR further corrects bias introduced by IV selection and LD clumping to reduce false positive rates.
Overview of the XMR method. XMR estimates the causal effect
# install.packages("devtools")
devtools::install_github("YangLabHKUST/XMR")

We illustrate how to perform cross-population MR analysis using XMR with a real-data example: LDL cholesterol (LDLC, exposure) and myocardial infarction (MI, outcome), with Europeans (EUR) as the auxiliary population and East Asians (EAS) as the target population.
The XMR analysis comprises two main steps:
- Step 1: Prepare data and estimate background parameters (the C matrix and Omega matrix via cross-population LD score regression).
- Step 2: Fit XMR for causal inference.
For a step-by-step walkthrough, see the XMR tutorial: causal effect of LDLC on MI (download link).
For a quick start, you can skip Step 1 and proceed directly to Step 2 using the example data we have prepared.
library(XMR)
exposure <- "LDLC"
outcome <- "MI"
# Sample sizes
N1 <- 343621 # EUR (auxiliary population)
N2 <- 72866 # EAS (target population)
# Modified IV selection threshold for correction of selection bias
threshold <- 5e-05 # IV selection threshold
t0 <- abs(qnorm(threshold / 2))
dt <- 0.13 / (sqrt(N2 / N1))
modified_threshold <- 2 * (1 - pnorm(abs(t0 + dt)))
# Load example data
data(C)
data(Omega)
data(clumped_data) # after IV selection and LD clumping
# Fit XMR
XMR_res <- fit_XMR(
data = clumped_data,
C = C,
Omega0 = Omega,
Threshold = modified_threshold,
tol1 = 1e-07,
tol2 = 1e-07,
min_thres = 1e-2
)

The input data should be a data.frame containing the following columns:
| Column | Description |
|---|---|
| `b.exp.pop1` | SNP–exposure effect in the auxiliary population (pop1) |
| `b.exp.pop2` | SNP–exposure effect in the target population (pop2) |
| `b.out.pop2` | SNP–outcome effect in the target population (pop2) |
| `se.exp.pop1` | Standard error of `b.exp.pop1` |
| `se.exp.pop2` | Standard error of `b.exp.pop2` |
| `se.out.pop2` | Standard error of `b.out.pop2` |
| `L2.pop1` | LD score in the auxiliary population; defaults to all ones if not provided (i.e., no LD correction) |
| `L2.pop2` | LD score in the target population; defaults to all ones if not provided (i.e., no LD correction) |
| `L12` | Cross-population LD score between pop1 and pop2; defaults to all ones if not provided (i.e., no LD correction) |
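As a rough illustration of the column layout above, the sketch below checks a data.frame before it is passed on. The helper `prepare_xmr_data()` is hypothetical and not part of the XMR package; it only mirrors the table: six required columns, plus three LD score columns that are filled with ones when absent. The toy numbers are made up for demonstration.

```r
# Hypothetical helper (not part of the XMR package): validate the input
# data.frame and fill in missing LD score columns with ones.
prepare_xmr_data <- function(dat) {
  stopifnot(is.data.frame(dat))

  required <- c("b.exp.pop1", "b.exp.pop2", "b.out.pop2",
                "se.exp.pop1", "se.exp.pop2", "se.out.pop2")
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }

  # Standard errors must be finite and strictly positive
  for (col in c("se.exp.pop1", "se.exp.pop2", "se.out.pop2")) {
    if (any(!is.finite(dat[[col]]) | dat[[col]] <= 0)) {
      stop("Column ", col, " contains non-positive or non-finite values")
    }
  }

  # Optional LD score columns: default to ones (i.e., no LD correction)
  for (col in c("L2.pop1", "L2.pop2", "L12")) {
    if (!col %in% names(dat)) dat[[col]] <- rep(1, nrow(dat))
  }
  dat
}

# Toy usage with made-up numbers (not real GWAS estimates)
toy <- data.frame(
  b.exp.pop1  = c(0.05, -0.03),
  b.exp.pop2  = c(0.04, -0.02),
  b.out.pop2  = c(0.01, -0.01),
  se.exp.pop1 = c(0.005, 0.006),
  se.exp.pop2 = c(0.010, 0.012),
  se.out.pop2 = c(0.015, 0.014)
)
toy_ready <- prepare_xmr_data(toy)
names(toy_ready)
```

Filling the LD score columns explicitly makes it visible in the data itself that no LD correction is being applied.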
- C matrix: A 3×3 matrix capturing the effects of sample structure (population stratification, cryptic relatedness, sample overlap, etc.).
- Omega matrix: A 3×3 variance–covariance matrix of polygenic effects.
Both can be estimated using bivariate LD score regression.
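The remaining inputs (`C`, `Omega0`, and `Threshold`) can be sanity-checked in a similar way. The sketch below uses two hypothetical helpers that are not part of the XMR package: `check_background_matrix()` only verifies the 3×3 shape (and symmetry for Omega, which a variance–covariance matrix must satisfy), and `modified_iv_threshold()` wraps the threshold arithmetic from the quick-start code. The constant 0.13 is copied verbatim from that code; treating it as a default argument named `shift` is an assumption made here for illustration.

```r
# Hypothetical helpers (not part of the XMR package).

# Shape check for a background-parameter matrix.
check_background_matrix <- function(M, name, symmetric = FALSE) {
  if (!is.matrix(M) || !is.numeric(M) || !all(dim(M) == c(3, 3))) {
    stop(name, " must be a numeric 3 x 3 matrix")
  }
  if (symmetric && !isSymmetric(unname(M), tol = 1e-8)) {
    stop(name, " must be symmetric")
  }
  invisible(TRUE)
}

# Same arithmetic as the quick-start code, wrapped in a function.
modified_iv_threshold <- function(threshold, n_aux, n_target, shift = 0.13) {
  t0 <- abs(qnorm(threshold / 2))       # z-score cutoff implied by the p-value threshold
  dt <- shift / sqrt(n_target / n_aux)  # shift scaled by the sample-size ratio
  2 * (1 - pnorm(t0 + dt))              # back to a two-sided p-value
}

# Toy matrices for illustration only (not estimated from data)
check_background_matrix(diag(3), "C")
check_background_matrix(diag(3) * 1e-4, "Omega", symmetric = TRUE)

# Sample sizes from the LDLC -> MI example (EUR auxiliary, EAS target)
thr <- modified_iv_threshold(5e-05, n_aux = 343621, n_target = 72866)
thr < 5e-05  # the modified threshold is stricter than the original one
```

Because `dt` is always positive, the modified threshold is always smaller than the nominal IV selection threshold.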
We applied XMR and 15 existing summary-level MR methods across three key domains:
- Simulations: evaluating method performance under various scenarios.
- Negative-control studies: testing the causal effects of 35 traits on 2 negative-control outcomes (skin tanning ability, natural hair color) in Africans (AFR) and Central/South Asians (CSA).
- Real-data analysis: inferring causal relationships in 3 underrepresented populations — East Asians (EAS), Central/South Asians (CSA), and Africans (AFR).
Source code and data for reproducing all results are available at YangLabHKUST/XMR_reproduce. The XMR execution scripts provided below feature a parallelized framework designed to efficiently analyze multiple trait pairs simultaneously.
Simulations:
Negative-control studies:
Format data | XMR in AFR | XMR in CSA | Other methods in AFR | Other methods in CSA | Visualization
Real-data analysis for EAS:
Format data | XMR in BBJ | XMR in TPMI | Other methods in BBJ | Other methods in TPMI | Visualization
Real-data analysis for CSA: Coming soon
Real-data analysis for AFR: Coming soon
All data and results needed to reproduce the above experiments are publicly available. See Step 2 for download links. Follow the steps below to reproduce the results:
git clone https://github.com/YangLabHKUST/XMR_reproduce.git
cd XMR_reproduce

XMR_reproduce/
├── nc/ # Negative-control analysis in AFR & CSA
├── real_data_CSA_AFR/ # Real data analysis in CSA & AFR (coming soon)
├── real_data_EAS/ # Real data analysis in EAS
└── sim/ # Simulations
We provide archived files containing formatted data, LD score files, analysis results, and other files needed for reproduction.
Raw GWAS summary statistics are not included due to their large size (~8–10 GB each).
Data sources are listed in the tables below. To start from raw data, download the raw files, place them in the raw_data/ folder of the corresponding experiment directory shown above, and run format_data.ipynb in that directory to format them:
| Experiment | Data source table |
|---|---|
| Negative-control studies | nc_data_source.csv |
| Real-data analysis (EAS) | real_data_EAS_data_source.csv |
| Real-data analysis (AFR) | real_data_AFR_data_source.csv |
| Real-data analysis (CSA) | real_data_CSA_data_source.csv |
Alternatively, you can skip the raw data step and start directly from our pre-formatted data by downloading the archives below. Then place them in the repository root and extract:
| File | Size | Link |
|---|---|---|
| sim_data.tar.gz | ~28 MB | |
| nc_data.tar.gz | ~6.8 GB | |
| real_data_EAS.tar.gz | ~5.8 GB | |
| real_data_CSA_AFR.tar.gz | ~X GB | Coming soon |
tar xzvf sim_data.tar.gz
tar xzvf nc_data.tar.gz
tar xzvf real_data_EAS.tar.gz
tar xzvf real_data_CSA_AFR.tar.gz

Each archive preserves the directory structure and will merge into existing directories automatically.
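Since not every archive may be available yet, one option is to extract only those that are actually present. The sketch below does this from R with `utils::untar()`; the helper name `extract_available_archives()` is hypothetical, and the archive names are taken from the table above.

```r
# Hypothetical helper: extract whichever archives are present in `root`
# and report the ones that were not found.
extract_available_archives <- function(root = ".") {
  archives <- c("sim_data.tar.gz", "nc_data.tar.gz",
                "real_data_EAS.tar.gz", "real_data_CSA_AFR.tar.gz")
  paths <- file.path(root, archives)
  found <- file.exists(paths)

  for (p in paths[found]) {
    message("Extracting ", basename(p))
    utils::untar(p, exdir = root)
  }
  if (any(!found)) {
    message("Skipped (not found): ", paste(archives[!found], collapse = ", "))
  }
  invisible(archives[found])
}

# Usage, assuming the working directory is the repository root:
# extract_available_archives(".")
```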
The following large reference files may not be included in the archives due to size. Please download them manually:
- 1000 Genomes PLINK files: download and place in nc/1kg_pops/; refer to prepare_1kg_reference.sh
- PLINK software: download from https://www.cog-genomics.org/plink2
All scripts assume the working directory is the repository root (XMR_reproduce/).
# In R
setwd("/path/to/XMR_reproduce") # set to your local path
source("nc/code/run_XMR_AFR.R")

To run XMR and the 15 compared methods, install the required R packages first:
# In R
# install.packages("devtools")
# install.packages("remotes")
devtools::install_github("YangLabHKUST/XMAP")
devtools::install_github("hhoulei/TEMR")
devtools::install_github("YangLabHKUST/MR-APSS")
remotes::install_github("MRCIEU/TwoSampleMR")
devtools::install_github("jean997/cause@v1.2.0")
devtools::install_github("tye27/mr.divw")
devtools::install_github("gqi/MRMix")
devtools::install_github("xue-hr/MRcML")
devtools::install_github("rondolab/MR-PRESSO")
install.packages("MendelianRandomization")
install.packages("robustbase")

Xinrui Huang, Zitong Chao, Zhiwei Wang, Xianghong Hu, and Can Yang. XMR: A cross-population Mendelian randomization method for causal inference using genome-wide summary statistics. 2026.
Please feel free to contact Xinrui Huang (xhuangcn@connect.ust.hk), Prof. Xianghong Hu (huxh@szu.edu.cn), or Prof. Can Yang (macyang@ust.hk) if you have any questions.