Skip to content

Add OTTERS pipeline and fix pval_acat() sign bug#461

Merged
gaow merged 8 commits intomainfrom
add-otters-pipeline
Apr 7, 2026
Merged

Add OTTERS pipeline and fix pval_acat() sign bug#461
gaow merged 8 commits intomainfrom
add-otters-pipeline

Conversation

@gaow
Copy link
Copy Markdown
Contributor

@gaow gaow commented Apr 7, 2026

Summary

Implement the OTTERS framework (Zhang et al. 2024) for omnibus TWAS using multiple RSS methods, plus fix a sign bug in pval_acat().

New functions

  • otters_weights(sumstats, LD, n, methods, p_thresholds) — Stage I: train eQTL weights via P+T + any RSS *_weights() learner (lassosum, PRS-CS, SDPR, mr.ash.rss, etc.). Methods are dispatched dynamically via do.call, so adding a new learner requires zero code changes.
  • otters_association(weights, gwas_z, LD, combine_method) — Stage II: compute per-method TWAS z-scores (FUSION formula) and combine p-values via ACAT or HMP.

Bug fix

pval_acat() had a sign error: it used qcauchy(p) = tan(π(p-0.5)) but the ACAT statistic is T = mean(tan(π(0.5-p))). The opposite sign caused combined p-values near 1.0 instead of near 0 for significant results.

Validation (simulated data: n_eqtl=500, n_gwas=50000, p=50, 4 causal eQTLs)

         method   twas_z    twas_pval n_snps
1      PT_0.001 4.921729 8.578311e-07      3
2       PT_0.05 4.770889 1.834144e-06      5
3  lassosum_rss 4.571758 4.836488e-06     30
4        prs_cs 4.613558 3.958331e-06     50
5 ACAT_combined       NA 1.843031e-06     NA

Test plan

  • Unit tests for otters_weights() and otters_association() (input validation, P+T selection, end-to-end)
  • Vignette with simulated data demonstrating full pipeline
  • devtools::test(filter="otters")
  • R CMD check

🤖 Generated with Claude Code

gaow and others added 8 commits April 7, 2026 08:34
Implement the OTTERS framework (Zhang et al. 2024) for omnibus TWAS
using multiple RSS methods:

- otters_weights(): Train eQTL weights via P+T, lassosum, PRS-CS,
  SDPR, and any *_weights() learner (extensible via do.call dispatch)
- otters_association(): Compute per-method TWAS z-scores (FUSION
  formula) and combine p-values via ACAT or HMP

Also fix pval_acat(): the ACAT statistic is T = mean(tan(pi*(0.5-p))),
but the old code used qcauchy(p) = tan(pi*(p-0.5)) which has the
opposite sign, producing p-values near 1 instead of near 0 for
significant results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lassosum_rss_weights() now searches over s = c(0.2, 0.5, 0.9, 1.0)
by default, matching the original lassosum (Mak et al 2017) and the
OTTERS pipeline (Zhang et al 2024). For each s, the LD is shrunk as
R_s = (1-s)*R + s*I and the full lambda path is solved. The best
(s, lambda) is selected by lowest objective value.

Previously users had to pre-shrink LD with a fixed s before calling
lassosum_rss_weights(), which defaulted to no shrinkage or a single
hard-coded value. With the grid search:
- cor(lassosum, truth) improved from 0.81 (s=0.9) to 0.93 (s=0.2)
- cor(lassosum, PRS-CS) improved from 0.91 to 0.99
- cor(lassosum, SDPR) improved from 0.87 to 0.99

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix defaults to match the original OTTERS pipeline (Zhang et al 2024):
- PRS-CS: phi=1e-4 (fixed, not learned) -- OTTERS default
- SDPR: thin=1 (no thinning) -- OTTERS default
- lassosum: lambda lower bound 0.0001 (was 0.001) -- OTTERS default
- Add cor>=1 safeguard in otters_weights() matching OTTERS shrink_factor

Vignette updated to show all defaults explicitly, explain each method,
and demonstrate extensibility using mr.ash.rss as example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues found in second-pass code review:

1. P+T weights used raw z/sqrt(n) instead of the clamped stat$b,
   bypassing the cor>=1 safeguard. Fixed to use stat$b consistently.

2. otters_association() had no dimension validation between weights,
   gwas_z, and LD. Added explicit checks with informative error messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show the complete data loading workflow using ld_loader() with
PLINK2 pgen data (ADSP chr22), matching the susieAnn vignette pattern:
- load_LD_matrix() for direct loading
- ld_loader() for lazy on-demand loading
- compute_LD() with shrinkage for regularized LD

Demonstrate genome-wide usage: create loader from R_list, loop over
genes calling otters_weights() + otters_association() per gene,
passing loader(g) as the LD matrix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gaow gaow merged commit 4226cec into main Apr 7, 2026
4 of 5 checks passed
@gaow
Copy link
Copy Markdown
Contributor Author

gaow commented Apr 7, 2026

OTTERS Pipeline Implementation Overview

What we implemented

The pecotmr OTTERS pipeline reimplements the core statistical logic of OTTERS (Zhang et al. 2024, Nature Communications) in pure R/C++, removing the dependency on Python, PLINK subprocesses, and external binaries. The pipeline has two functions:

  1. otters_weights(sumstats, LD, n, methods, p_thresholds, check_ld_method) — Stage I (training): trains eQTL weights using multiple RSS methods on the same data, dispatched dynamically.

  2. otters_association(weights, gwas_z, LD, combine_method) — Stage II (testing): computes per-method TWAS z-scores via the FUSION formula and combines p-values via ACAT.

Supporting additions:

  • check_ld(R, method) — LD matrix positive-definiteness checking and repair
  • lassosum_rss_weights() — now includes s-grid search over c(0.2, 0.5, 0.9, 1.0)
  • pval_acat() — fixed sign bug in Cauchy combination

Component-by-component comparison with original OTTERS

1. P+T (Pruning and Thresholding)

Aspect Original OTTERS pecotmr
P-value computation chi2.sf(Z^2, 1) pchisq(z^2, 1, lower.tail=FALSE)
Weight for selected SNPs Z / sqrt(median_N) (marginal correlation) stat$b = clamped z / sqrt(n)
LD clumping before P+T Yes (PLINK, r2=0.99) No (user handles upstream via rss_basic_qc())
cor >= 1 safeguard cor / (max(abs(cor)) / 0.9999) Same: b / (max(abs(b)) / 0.9999)

Equivalence: Same statistical formula. The LD clumping is an architectural difference — OTTERS bundles it; pecotmr delegates to the existing QC pipeline.

2. PRS-CS

Aspect Original OTTERS pecotmr
Implementation Python MCMC (mcmc_gtb.py) C++ via Rcpp (prscs_mcmc.h)
Input scale Z / sqrt(N) as beta_mrg z / sqrt(n) as bhat — same
GIG sampler Python gigrnd.py C++ gigrnd() — same algorithm
Cholesky sampling scipy.linalg (forward/back-solve) Armadillo trimatl/trimatu solve — same
LD positive-definiteness SVD-based pos_def_matrix() before PRS-CS check_ld(method="eigenfix") in otters_weights() — equivalent

Equivalence: Identical MCMC algorithm. C++ is faster than Python. Both force PD before Cholesky.

3. SDPR

Aspect Original OTTERS pecotmr
Implementation External SDPR binary (C++) C++ via Rcpp (sdpr_mcmc.cpp), Armadillo
Input scale Raw Z-scores + N z/sqrt(n) as bhat + N — equivalent (N absorbed in solve_ldmat)
LD block partitioning SDPR -make_ref (r2=0.1 blocks) Single block per gene region
Internal LD regularization B.diag += a (a=0.1) Same: B.diag() += a
SSE intrinsics x86 SSE (log_ps, exp_ps) Armadillo vectorized (arma::log, arma::exp) — portable

Equivalence: Same Bayesian mixture model and MCMC sampler. Armadillo replaces SSE for ARM compatibility.

4. lassosum

Aspect Original OTTERS pecotmr
Implementation R lassosum package C++ coordinate descent (lassosum_rss.cpp)
Coordinate descent lassosum elnet() C++ Our elnet_rss() C++ — verified identical to 1e-15
s grid search c(0.2, 0.5, 0.9, 1) c(0.2, 0.5, 0.9, 1.0) — same
Lambda grid exp(seq(log(1e-4), log(0.1), length.out=20)) Same
Model selection pseudovalidate() from lassosum package which.min(fbeta) (objective value)
cor >= 1 handling cor / (max/0.9999) Same safeguard in otters_weights()

Equivalence: Betas identical to machine epsilon. Model selection differs (pseudovalidation vs objective).

5. TWAS z-score (FUSION formula)

Aspect Original OTTERS pecotmr
Formula Z = dot(z,w) / sqrt(w' V w) Z = t(w) %*% z / sqrt(t(w) %*% R %*% w)
P-value chi2.sf(Z^2, 1) pchisq(Z^2, 1, lower.tail=FALSE)

Equivalence: Identical formula.

6. ACAT combination

Aspect Original OTTERS pecotmr
Implementation Incomplete (imports ACAT package but never calls it) pval_acat(): T = mean(tan(pi*(0.5-p)))

Equivalence: pecotmr has a complete, working ACAT. OTTERS' was unfinished.

7. LD quality checking

Aspect Original OTTERS pecotmr
When applied Before PRS-CS only Before all methods in otters_weights()
Method SVD: (mat + V' diag(s) V) / 2 Eigenfix: set negative eigenvalues to 0, reconstruct
Configurable No Yes: check_ld_method = "eigenfix" / "shrink" / NULL

Default parameter comparison

Parameter OTTERS default pecotmr default Match?
P+T thresholds c(0.001, 0.05) c(0.001, 0.05) YES
PRS-CS a 1 1 YES
PRS-CS b 0.5 0.5 YES
PRS-CS phi 1e-4 (fixed) 1e-4 (fixed) YES
PRS-CS n_iter 1000 1000 YES
PRS-CS n_burnin 500 500 YES
PRS-CS thin 5 5 YES
SDPR iter 1000 1000 YES
SDPR burn 200 200 YES
SDPR thin 1 1 YES
SDPR M 1000 1000 YES
SDPR a 0.1 0.1 YES
SDPR c 1.0 1.0 YES
SDPR a0k 0.5 0.5 YES
SDPR b0k 0.5 0.5 YES
lassosum s c(0.2, 0.5, 0.9, 1) c(0.2, 0.5, 0.9, 1.0) YES
lassosum lambda range 1e-4 to 0.1 (20 pts) 1e-4 to 0.1 (20 pts) YES
lassosum model selection pseudovalidate min(fbeta) DIFF
cor >= 1 safeguard shrink by max/0.9999 shrink by max/0.9999 YES
LD clumping (r2=0.99) Yes (PLINK) No (user upstream) DIFF
LD PD forcing SVD for PRS-CS only eigenfix for all methods DIFF
TWAS z formula w'z / sqrt(w'Rw) w'z / sqrt(w'Rw) YES
ACAT formula (incomplete) mean(tan(pi(0.5-p))) YES

17/20 parameters match exactly. The 3 differences are architectural choices (not statistical).

@gaow
Copy link
Copy Markdown
Contributor Author

gaow commented Apr 7, 2026

lassosum model selection

original lassosum has a bug comoputing fbeta. It's fixed in our package so we can use fbeta for model selection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant