Add OTTERS pipeline and fix pval_acat() sign bug by gaow · Pull Request #461 · StatFunGen/pecotmr

gaow · 2026-04-07T12:35:01Z

Summary

Implement the OTTERS framework (Zhang et al. 2024) for omnibus TWAS using multiple RSS methods, plus fix a sign bug in pval_acat().

New functions

otters_weights(sumstats, LD, n, methods, p_thresholds) — Stage I: train eQTL weights via P+T + any RSS *_weights() learner (lassosum, PRS-CS, SDPR, mr.ash.rss, etc.). Methods are dispatched dynamically via do.call, so adding a new learner requires zero code changes.
otters_association(weights, gwas_z, LD, combine_method) — Stage II: compute per-method TWAS z-scores (FUSION formula) and combine p-values via ACAT or HMP.

Bug fix

pval_acat() had a sign error: it used qcauchy(p) = tan(π(p-0.5)) but the ACAT statistic is T = mean(tan(π(0.5-p))). The opposite sign caused combined p-values near 1.0 instead of near 0 for significant results.

Validation (simulated data: n_eqtl=500, n_gwas=50000, p=50, 4 causal eQTLs)

         method   twas_z    twas_pval n_snps
1      PT_0.001 4.921729 8.578311e-07      3
2       PT_0.05 4.770889 1.834144e-06      5
3  lassosum_rss 4.571758 4.836488e-06     30
4        prs_cs 4.613558 3.958331e-06     50
5 ACAT_combined       NA 1.843031e-06     NA

Test plan

Unit tests for otters_weights() and otters_association() (input validation, P+T selection, end-to-end)
Vignette with simulated data demonstrating full pipeline
devtools::test(filter="otters")
R CMD check

🤖 Generated with Claude Code

Implement the OTTERS framework (Zhang et al. 2024) for omnibus TWAS using multiple RSS methods: - otters_weights(): Train eQTL weights via P+T, lassosum, PRS-CS, SDPR, and any *_weights() learner (extensible via do.call dispatch) - otters_association(): Compute per-method TWAS z-scores (FUSION formula) and combine p-values via ACAT or HMP Also fix pval_acat(): the ACAT statistic is T = mean(tan(pi*(0.5-p))), but the old code used qcauchy(p) = tan(pi*(p-0.5)) which has the opposite sign, producing p-values near 1 instead of near 0 for significant results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lassosum_rss_weights() now searches over s = c(0.2, 0.5, 0.9, 1.0) by default, matching the original lassosum (Mak et al 2017) and the OTTERS pipeline (Zhang et al 2024). For each s, the LD is shrunk as R_s = (1-s)*R + s*I and the full lambda path is solved. The best (s, lambda) is selected by lowest objective value. Previously users had to pre-shrink LD with a fixed s before calling lassosum_rss_weights(), which defaulted to no shrinkage or a single hard-coded value. With the grid search: - cor(lassosum, truth) improved from 0.81 (s=0.9) to 0.93 (s=0.2) - cor(lassosum, PRS-CS) improved from 0.91 to 0.99 - cor(lassosum, SDPR) improved from 0.87 to 0.99 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix defaults to match the original OTTERS pipeline (Zhang et al 2024): - PRS-CS: phi=1e-4 (fixed, not learned) -- OTTERS default - SDPR: thin=1 (no thinning) -- OTTERS default - lassosum: lambda lower bound 0.0001 (was 0.001) -- OTTERS default - Add cor>=1 safeguard in otters_weights() matching OTTERS shrink_factor Vignette updated to show all defaults explicitly, explain each method, and demonstrate extensibility using mr.ash.rss as example. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two issues found in second-pass code review: 1. P+T weights used raw z/sqrt(n) instead of the clamped stat$b, bypassing the cor>=1 safeguard. Fixed to use stat$b consistently. 2. otters_association() had no dimension validation between weights, gwas_z, and LD. Added explicit checks with informative error messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Show the complete data loading workflow using ld_loader() with PLINK2 pgen data (ADSP chr22), matching the susieAnn vignette pattern: - load_LD_matrix() for direct loading - ld_loader() for lazy on-demand loading - compute_LD() with shrinkage for regularized LD Demonstrate genome-wide usage: create loader from R_list, loop over genes calling otters_weights() + otters_association() per gene, passing loader(g) as the LD matrix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gaow · 2026-04-07T13:29:31Z

OTTERS Pipeline Implementation Overview

What we implemented

The pecotmr OTTERS pipeline reimplements the core statistical logic of OTTERS (Zhang et al. 2024, Nature Communications) in pure R/C++, removing the dependency on Python, PLINK subprocesses, and external binaries. The pipeline has two functions:

otters_weights(sumstats, LD, n, methods, p_thresholds, check_ld_method) — Stage I (training): trains eQTL weights using multiple RSS methods on the same data, dispatched dynamically.
otters_association(weights, gwas_z, LD, combine_method) — Stage II (testing): computes per-method TWAS z-scores via the FUSION formula and combines p-values via ACAT.

Supporting additions:

check_ld(R, method) — LD matrix positive-definiteness checking and repair
lassosum_rss_weights() — now includes s-grid search over c(0.2, 0.5, 0.9, 1.0)
pval_acat() — fixed sign bug in Cauchy combination

Component-by-component comparison with original OTTERS

1. P+T (Pruning and Thresholding)

Aspect	Original OTTERS	pecotmr
P-value computation	`chi2.sf(Z^2, 1)`	`pchisq(z^2, 1, lower.tail=FALSE)`
Weight for selected SNPs	`Z / sqrt(median_N)` (marginal correlation)	`stat$b` = clamped `z / sqrt(n)`
LD clumping before P+T	Yes (PLINK, r2=0.99)	No (user handles upstream via `rss_basic_qc()`)
cor >= 1 safeguard	`cor / (max(abs(cor)) / 0.9999)`	Same: `b / (max(abs(b)) / 0.9999)`

Equivalence: Same statistical formula. The LD clumping is an architectural difference — OTTERS bundles it; pecotmr delegates to the existing QC pipeline.

2. PRS-CS

Aspect	Original OTTERS	pecotmr
Implementation	Python MCMC (mcmc_gtb.py)	C++ via Rcpp (prscs_mcmc.h)
Input scale	`Z / sqrt(N)` as beta_mrg	`z / sqrt(n)` as bhat — same
GIG sampler	Python gigrnd.py	C++ gigrnd() — same algorithm
Cholesky sampling	scipy.linalg (forward/back-solve)	Armadillo trimatl/trimatu solve — same
LD positive-definiteness	SVD-based `pos_def_matrix()` before PRS-CS	`check_ld(method="eigenfix")` in otters_weights() — equivalent

Equivalence: Identical MCMC algorithm. C++ is faster than Python. Both force PD before Cholesky.

3. SDPR

Aspect	Original OTTERS	pecotmr
Implementation	External SDPR binary (C++)	C++ via Rcpp (sdpr_mcmc.cpp), Armadillo
Input scale	Raw Z-scores + N	`z/sqrt(n)` as bhat + N — equivalent (N absorbed in solve_ldmat)
LD block partitioning	SDPR `-make_ref` (r2=0.1 blocks)	Single block per gene region
Internal LD regularization	`B.diag += a` (a=0.1)	Same: `B.diag() += a`
SSE intrinsics	x86 SSE (log_ps, exp_ps)	Armadillo vectorized (arma::log, arma::exp) — portable

Equivalence: Same Bayesian mixture model and MCMC sampler. Armadillo replaces SSE for ARM compatibility.

4. lassosum

Aspect	Original OTTERS	pecotmr
Implementation	R lassosum package	C++ coordinate descent (lassosum_rss.cpp)
Coordinate descent	lassosum elnet() C++	Our elnet_rss() C++ — verified identical to 1e-15
s grid search	c(0.2, 0.5, 0.9, 1)	c(0.2, 0.5, 0.9, 1.0) — same
Lambda grid	`exp(seq(log(1e-4), log(0.1), length.out=20))`	Same
Model selection	`pseudovalidate()` from lassosum package	`which.min(fbeta)` (objective value)
cor >= 1 handling	`cor / (max/0.9999)`	Same safeguard in `otters_weights()`

Equivalence: Betas identical to machine epsilon. Model selection differs (pseudovalidation vs objective).

5. TWAS z-score (FUSION formula)

Aspect	Original OTTERS	pecotmr
Formula	`Z = dot(z,w) / sqrt(w' V w)`	`Z = t(w) %% z / sqrt(t(w) %% R %*% w)`
P-value	`chi2.sf(Z^2, 1)`	`pchisq(Z^2, 1, lower.tail=FALSE)`

Equivalence: Identical formula.

6. ACAT combination

Aspect	Original OTTERS	pecotmr
Implementation	Incomplete (imports ACAT package but never calls it)	`pval_acat()`: `T = mean(tan(pi*(0.5-p)))`

Equivalence: pecotmr has a complete, working ACAT. OTTERS' was unfinished.

7. LD quality checking

Aspect	Original OTTERS	pecotmr
When applied	Before PRS-CS only	Before all methods in `otters_weights()`
Method	SVD: `(mat + V' diag(s) V) / 2`	Eigenfix: set negative eigenvalues to 0, reconstruct
Configurable	No	Yes: `check_ld_method = "eigenfix" / "shrink" / NULL`

Default parameter comparison

Parameter	OTTERS default	pecotmr default	Match?
P+T thresholds	c(0.001, 0.05)	c(0.001, 0.05)	YES
PRS-CS a	1	1	YES
PRS-CS b	0.5	0.5	YES
PRS-CS phi	1e-4 (fixed)	1e-4 (fixed)	YES
PRS-CS n_iter	1000	1000	YES
PRS-CS n_burnin	500	500	YES
PRS-CS thin	5	5	YES
SDPR iter	1000	1000	YES
SDPR burn	200	200	YES
SDPR thin	1	1	YES
SDPR M	1000	1000	YES
SDPR a	0.1	0.1	YES
SDPR c	1.0	1.0	YES
SDPR a0k	0.5	0.5	YES
SDPR b0k	0.5	0.5	YES
lassosum s	c(0.2, 0.5, 0.9, 1)	c(0.2, 0.5, 0.9, 1.0)	YES
lassosum lambda range	1e-4 to 0.1 (20 pts)	1e-4 to 0.1 (20 pts)	YES
lassosum model selection	pseudovalidate	min(fbeta)	DIFF
cor >= 1 safeguard	shrink by max/0.9999	shrink by max/0.9999	YES
LD clumping (r2=0.99)	Yes (PLINK)	No (user upstream)	DIFF
LD PD forcing	SVD for PRS-CS only	eigenfix for all methods	DIFF
TWAS z formula	w'z / sqrt(w'Rw)	w'z / sqrt(w'Rw)	YES
ACAT formula	(incomplete)	mean(tan(pi(0.5-p)))	YES

17/20 parameters match exactly. The 3 differences are architectural choices (not statistical).

gaow · 2026-04-07T13:44:45Z

lassosum model selection

original lassosum has a bug comoputing fbeta. It's fixed in our package so we can use fbeta for model selection.

gaow and others added 8 commits April 7, 2026 08:34

Update documentation

d814002

Update documentation

22b8883

Update documentation

db666d3

gaow merged commit 4226cec into main Apr 7, 2026
4 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add OTTERS pipeline and fix pval_acat() sign bug#461

Add OTTERS pipeline and fix pval_acat() sign bug#461
gaow merged 8 commits intomainfrom
add-otters-pipeline

gaow commented Apr 7, 2026

Uh oh!

Uh oh!

gaow commented Apr 7, 2026

Uh oh!

gaow commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gaow commented Apr 7, 2026

Summary

New functions

Bug fix

Validation (simulated data: n_eqtl=500, n_gwas=50000, p=50, 4 causal eQTLs)

Test plan

Uh oh!

Uh oh!

gaow commented Apr 7, 2026

OTTERS Pipeline Implementation Overview

What we implemented

Component-by-component comparison with original OTTERS

1. P+T (Pruning and Thresholding)

2. PRS-CS

3. SDPR

4. lassosum

5. TWAS z-score (FUSION formula)

6. ACAT combination

7. LD quality checking

Default parameter comparison

Uh oh!

gaow commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant