Add lassosum RSS method with C++ coordinate descent by gaow · Pull Request #457 · StatFunGen/pecotmr

gaow · 2026-04-07T09:51:42Z

Summary

Port lassosum (Mak et al 2017) to work with pre-computed LD matrices and summary statistics (RSS formulation)
C++ coordinate descent solver in src/lassosum_rss.cpp, tracking Rbeta as a running sum to match the original elnet() algorithm
lassosum_rss(bhat, LD, n, ...) and lassosum_rss_weights(stat, LD, ...) with identical interface to prs_cs/sdpr
Add shrinkage parameter to compute_LD() for LD regularization: R_s = (1-s)*R + s*I
Unit tests mirroring existing prs_cs test patterns

Test plan

devtools::test(filter="regularized_regression") passes
R CMD check passes
Smoke test with chr22 pgen dosage data

🤖 Generated with Claude Code

Port lassosum (Mak et al 2017) to work with pre-computed LD matrices and summary statistics. Uses the same interface as prs_cs/sdpr: lassosum_rss(bhat, LD, n, ...) with LD as a list of block matrices. - Add C++ coordinate descent solver (src/lassosum_rss.cpp) tracking Rbeta as a running sum, matching the original elnet() algorithm - Add lassosum_rss() and lassosum_rss_weights() R wrappers - Add shrinkage parameter to compute_LD() for LD regularization - Add unit tests mirroring existing prs_cs test patterns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gaow · 2026-04-07T11:05:35Z

Validation Experiments

Setup

All experiments use simulated data: genotype matrix X (binomial or normal), sparse true effects, and marginal summary statistics. The pecotmr lassosum_rss_rcpp C++ function is compared against the original lassosum R package (v0.4.5, Mak et al 2017).

The pecotmr version takes a pre-shrunk LD matrix R_s = (1-s)*cor(X) + s*I and correlation vector z = bhat/sqrt(n). The original lassosum:::lassosumR takes the raw genotype matrix and applies shrinkage internally.

Experiment 1: Numerical identity with original lassosum

Compared beta coefficient matrices across 9 configurations (3 seeds × 3 shrinkage values), n=500, p=40, 20 lambda values each:

seed	s=0.1	s=0.5	s=0.9
42	1.2e-15	6.0e-16	1.5e-16
123	3.6e-15	2.0e-15	5.6e-16
999	3.1e-15	1.6e-15	5.0e-16

Values are max|beta_diff| across all p×20 = 800 coefficients per configuration.

Result: Betas are identical to machine epsilon (~1e-15) across all seeds, shrinkage values, and lambda values. KKT optimality conditions verified with zero violations.

Experiment 2: Comparison with PRS-CS

Simulated n=1000, p=20, 3 causal SNPs with effects (0.3, -0.2, 0.15):

Method	cor(est, truth)	cor(est, lassosum)	nnz
lassosum_rss (s=0.5)	0.9864	1.0000	9
prs_cs	0.9548	0.9835	20

Result: Both methods recover the true signal direction well (cor > 0.95 with truth). Lassosum is sparser (L1 penalty), PRS-CS is denser (continuous shrinkage prior). They agree with each other at cor=0.98.

SDPR could not be compiled on ARM/Apple Silicon due to SSE intrinsics in sse_mathfun.h (pre-existing issue, not related to this PR).

Note on `fbeta` (objective value) computation

During validation, we discovered a bug in the original lassosum:::lassosumR R function (the internal pure-R implementation, not the main exported C++ path). In elnetR.R, the yhat vector (= X_scaled × beta) is initialized once and modified in-place by repelnet() across the lambda path. The C++ repelnet() accumulates yhat += yhattouse (line 479 of functions.cpp), but yhat is never reset between lambda values. This causes loss and fbeta to be inflated for lambdas solved after the first one (warm-start goes from largest to smallest lambda).

Impact: The betas at every lambda are always correct (the coordinate descent uses yhat properly within each solve). Only loss/fbeta/pred stored by lassosumR() are wrong. The main lassosum() function uses the C++ runElnet() which does not have this bug. lassosumR() is an internal unexported helper.

Our pecotmr implementation computes fbeta = β'R_sβ - 2z'β + 2λ||β||₁ directly from the LD matrix, which is mathematically correct and verified against manual computation to ~1e-16 precision.

🤖 Generated with Claude Code

The original SDPR (Zhou et al.) used x86 SSE intrinsics (log_ps, exp_ps, _mm_max_ps, _mm_hadd_ps from sse_mathfun.h) for computing log-sum-exp over M cluster probabilities in sample_assignment(). This prevented compilation on ARM/Apple Silicon. Replace with Armadillo vectorized operations (arma::log, arma::exp, arma::max, arma::accu) which delegate to platform-optimal SIMD (NEON on ARM, SSE/AVX on x86) through the compiler auto-vectorization, giving portable performance without architecture-specific intrinsics. - Rewrite sample_assignment() using arma::vec for cluster probabilities - Remove src/sse_mathfun.h (719 lines of x86 SSE math functions) - Remove src/simde/ directory (~789K lines of x86→ARM translation headers) - Remove SIMDE_ENABLE_NATIVE_ALIASES flag from Makevars.in Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This reverts commit 767abc6.

gaow and others added 2 commits April 7, 2026 05:50

Update documentation

f286db9

gaow and others added 2 commits April 7, 2026 07:21

Revert "Replace SSE intrinsics with Armadillo vectorized ops in SDPR"

6ea5320

This reverts commit 767abc6.

gaow merged commit 7213539 into main Apr 7, 2026
4 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add lassosum RSS method with C++ coordinate descent#457

Add lassosum RSS method with C++ coordinate descent#457
gaow merged 4 commits intomainfrom
add-lassosum-rss

gaow commented Apr 7, 2026

Uh oh!

gaow commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gaow commented Apr 7, 2026

Summary

Test plan

Uh oh!

gaow commented Apr 7, 2026

Validation Experiments

Setup

Experiment 1: Numerical identity with original lassosum

Experiment 2: Comparison with PRS-CS

Note on fbeta (objective value) computation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Note on `fbeta` (objective value) computation