Add OTTERS pipeline and fix pval_acat() sign bug #461
Implement the OTTERS framework (Zhang et al. 2024) for omnibus TWAS using multiple RSS methods:
- otters_weights(): Train eQTL weights via P+T, lassosum, PRS-CS, SDPR, and any *_weights() learner (extensible via do.call dispatch)
- otters_association(): Compute per-method TWAS z-scores (FUSION formula) and combine p-values via ACAT or HMP

Also fix pval_acat(): the ACAT statistic is T = mean(tan(pi*(0.5-p))), but the old code used qcauchy(p) = tan(pi*(p-0.5)), which has the opposite sign, producing p-values near 1 instead of near 0 for significant results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lassosum_rss_weights() now searches over s = c(0.2, 0.5, 0.9, 1.0) by default, matching the original lassosum (Mak et al. 2017) and the OTTERS pipeline (Zhang et al. 2024). For each s, the LD is shrunk as R_s = (1-s)*R + s*I and the full lambda path is solved. The best (s, lambda) pair is selected by lowest objective value.

Previously users had to pre-shrink LD with a fixed s before calling lassosum_rss_weights(), which defaulted to no shrinkage or a single hard-coded value.

With the grid search:
- cor(lassosum, truth) improved from 0.81 (s=0.9) to 0.93 (s=0.2)
- cor(lassosum, PRS-CS) improved from 0.91 to 0.99
- cor(lassosum, SDPR) improved from 0.87 to 0.99

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix defaults to match the original OTTERS pipeline (Zhang et al. 2024):
- PRS-CS: phi=1e-4 (fixed, not learned) -- OTTERS default
- SDPR: thin=1 (no thinning) -- OTTERS default
- lassosum: lambda lower bound 0.0001 (was 0.001) -- OTTERS default
- Add cor>=1 safeguard in otters_weights() matching OTTERS shrink_factor

Vignette updated to show all defaults explicitly, explain each method, and demonstrate extensibility using mr.ash.rss as an example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues found in second-pass code review:
1. P+T weights used raw z/sqrt(n) instead of the clamped stat$b, bypassing the cor>=1 safeguard. Fixed to use stat$b consistently.
2. otters_association() had no dimension validation between weights, gwas_z, and LD. Added explicit checks with informative error messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show the complete data loading workflow using ld_loader() with PLINK2 pgen data (ADSP chr22), matching the susieAnn vignette pattern:
- load_LD_matrix() for direct loading
- ld_loader() for lazy on-demand loading
- compute_LD() with shrinkage for regularized LD

Demonstrate genome-wide usage: create a loader from R_list, then loop over genes calling otters_weights() + otters_association() per gene, passing loader(g) as the LD matrix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OTTERS Pipeline Implementation Overview
What we implemented
The pecotmr OTTERS pipeline reimplements the core statistical logic of OTTERS (Zhang et al. 2024, Nature Communications) in pure R/C++, removing the dependency on Python, PLINK subprocesses, and external binaries. The pipeline has two functions:
Supporting additions:
Component-by-component comparison with original OTTERS
1. P+T (Pruning and Thresholding)
Equivalence: Same statistical formula. The LD clumping is an architectural difference: OTTERS bundles it, while pecotmr delegates it to the existing QC pipeline.
2. PRS-CS
Equivalence: Identical MCMC algorithm. C++ is faster than Python. Both force positive definiteness (PD) before Cholesky.
3. SDPR
Equivalence: Same Bayesian mixture model and MCMC sampler. Armadillo replaces SSE for ARM compatibility.
4. lassosum
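For reference, the LD shrinkage behind the s grid described in the lassosum commit, R_s = (1-s)*R + s*I, can be sketched as follows. This is a minimal Python illustration of the formula only, not the package's R/C++ code: shrink_ld and S_GRID are hypothetical names, the 2x2 matrix is made up, and the lambda path solver and objective-based selection are omitted.

```python
# Sketch of R_s = (1 - s) * R + s * I over the default s grid.
# All names here are illustrative assumptions, not pecotmr identifiers.

S_GRID = (0.2, 0.5, 0.9, 1.0)  # grid quoted in the commit message


def shrink_ld(ld, s):
    """Return (1 - s) * ld + s * I for a square LD matrix given as nested lists."""
    n = len(ld)
    if any(len(row) != n for row in ld):
        raise ValueError("LD matrix must be square")
    return [
        [(1.0 - s) * ld[i][j] + (s if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Toy 2-SNP LD matrix (made-up values, for illustration only).
ld_example = [[1.0, 0.6], [0.6, 1.0]]

for s in S_GRID:
    shrunk = shrink_ld(ld_example, s)
    # The off-diagonal correlation shrinks toward 0 as s grows; s = 1 gives the identity.
    print(s, shrunk[0][1])
```

With a unit-diagonal R the diagonal stays at 1 for every s, so only the off-diagonal correlations are damped.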
Equivalence: Betas are identical to within machine epsilon. Model selection differs (pseudovalidation vs. objective value).
5. TWAS z-score (FUSION formula)
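The FUSION-style statistic referred to here is z_TWAS = w'z / sqrt(w'Rw), where w are the eQTL weights, z the GWAS z-scores, and R the LD matrix. A minimal Python sketch of that formula, with the kind of dimension checks mentioned in the code-review commit, might look like this. The function name twas_z and the toy inputs are assumptions for illustration, and this is not the body of otters_association().

```python
import math


def twas_z(weights, gwas_z, ld):
    """FUSION-style TWAS z-score: w'z / sqrt(w' R w). Illustrative sketch only."""
    n = len(weights)
    # Dimension checks between weights, GWAS z-scores, and LD.
    if len(gwas_z) != n:
        raise ValueError("weights and gwas_z must have the same length")
    if len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("LD must be an n x n matrix matching the weights")

    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0.0:
        raise ValueError("w' R w must be positive (all-zero weights?)")
    return numerator / math.sqrt(variance)


# Toy example with made-up numbers: three SNPs, one with zero weight.
w = [0.5, 0.0, -0.2]
z = [3.0, 1.0, -2.0]
R = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]]
print(twas_z(w, z, R))
```

Because the statistic is a ratio, rescaling all weights by a positive constant leaves it unchanged.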
Equivalence: Identical formula.
6. ACAT combination
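The sign issue fixed in pval_acat() can be illustrated numerically. The sketch below is in Python rather than the package's R code and assumes equal weights and the standard Cauchy upper-tail conversion p = 0.5 - atan(T)/pi. The function names and the p-values are made up for illustration.

```python
import math


def acat_pvalue(pvals):
    """Equal-weight ACAT: T = mean(tan(pi * (0.5 - p))), then the Cauchy upper tail."""
    t = sum(math.tan(math.pi * (0.5 - p)) for p in pvals) / len(pvals)
    return 0.5 - math.atan(t) / math.pi


def acat_pvalue_flipped(pvals):
    """Same steps with tan(pi * (p - 0.5)), i.e. the Cauchy quantile of p.

    This mimics the sign error described in this PR.
    """
    t = sum(math.tan(math.pi * (p - 0.5)) for p in pvals) / len(pvals)
    return 0.5 - math.atan(t) / math.pi


# Made-up per-method p-values, one of them strongly significant.
pvals = [1e-6, 0.02, 0.4]
print(acat_pvalue(pvals))          # small: dominated by the 1e-6 signal
print(acat_pvalue_flipped(pvals))  # close to 1: the signal is inverted
```

Since tan and atan are odd functions, the flipped version returns exactly one minus the correct combined p-value, which is why significant results showed up near 1.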
Equivalence: pecotmr has a complete, working ACAT. The OTTERS implementation was unfinished.
7. LD quality checking
Default parameter comparison
17 of 20 parameters match exactly. The 3 differences are architectural choices (not statistical).
The original lassosum has a bug in computing fbeta. It's fixed in our package, so we can use fbeta for model selection.
Summary
Implement the OTTERS framework (Zhang et al. 2024) for omnibus TWAS using multiple RSS methods, plus fix a sign bug in pval_acat().
New functions
- otters_weights(sumstats, LD, n, methods, p_thresholds): Stage I. Train eQTL weights via P+T plus any RSS *_weights() learner (lassosum, PRS-CS, SDPR, mr.ash.rss, etc.). Methods are dispatched dynamically via do.call, so adding a new learner requires zero code changes.
- otters_association(weights, gwas_z, LD, combine_method): Stage II. Compute per-method TWAS z-scores (FUSION formula) and combine p-values via ACAT or HMP.
Bug fix
pval_acat() had a sign error: it used qcauchy(p) = tan(π(p-0.5)), but the ACAT statistic is T = mean(tan(π(0.5-p))). The opposite sign caused combined p-values near 1.0 instead of near 0 for significant results.
Validation (simulated data: n_eqtl=500, n_gwas=50000, p=50, 4 causal eQTLs)
Test plan
- otters_weights() and otters_association() (input validation, P+T selection, end-to-end)
- devtools::test(filter="otters")
- R CMD check

🤖 Generated with Claude Code