Skip to content

Fix 6 pre-existing bugs across LD/RSS/TWAS pipeline#465

Merged
gaow merged 1 commit intomainfrom
fix-preexisting-issues
Apr 7, 2026
Merged

Fix 6 pre-existing bugs across LD/RSS/TWAS pipeline#465
gaow merged 1 commit intomainfrom
fix-preexisting-issues

Conversation

@gaow
Copy link
Copy Markdown
Contributor

@gaow gaow commented Apr 7, 2026

Summary

Fix 6 pre-existing bugs found during comprehensive code review. These bugs existed before the OTTERS/lassosum work but affect the pipeline.

Fixes

1. pval_hmp() — discarded duplicate p-values (R/misc.R)

  • unique(pvals) silently dropped duplicates, changing the test statistic
  • Fix: use all p-values as-is

2. rss_basic_qc() — lexicographic position comparison (R/sumstats_qc.R)

  • separate() produces character strings; pos > "10" uses string comparison where "9" > "10" is TRUE
  • Fix: mutate(start = as.integer(start), end = as.integer(end))

3. allele_qc() — removed ALL duplicate copies (R/allele_qc.R)

  • vec_duplicate_detect() marks ALL occurrences (including first) as TRUE, so duplicated variants were entirely lost
  • Fix: use duplicated() which keeps the first occurrence; warn with count of removed duplicates

4. compute_LD() population method — silent NA bias (R/misc.R)

  • Cross-product divides by total N but per-column stats use non-missing counts; biased when missingness varies across SNPs
  • Fix: warn when max-min NA rate difference exceeds 10%, suggesting method='sample' as alternative

5. gigrnd() — clamped GIG distribution to [0,1] (src/prscs_mcmc.h)

  • The GIG distribution is unbounded above; clamping inside the sampler biases the distribution
  • Fix: remove internal clamp, add arma::clamp(psi, 0, 1) after the GIG call in the MCMC loop, matching original Python PRS-CS exactly

6. sdpr() — buffer overflow with M < 4 (R/regularized_regression.R)

  • sample_V() accesses a[M-2] with vector size M-1; with M=1 this is out-of-bounds
  • Fix: add if (M < 4) stop("M must be at least 4") in R wrapper

Test plan

  • PRS-CS compiles and produces valid results after gigrnd fix
  • All psi values remain <= 1 (clamping works in new location)
  • devtools::test() passes
  • R CMD check passes

🤖 Generated with Claude Code

1. pval_hmp: remove unique() that silently discarded duplicate
   p-values, changing both the count L and the harmonic mean.

2. rss_basic_qc: convert skip_region start/end to integer after
   separate(). Was doing lexicographic comparison ("9" > "10" = TRUE).

3. allele_qc: use duplicated() instead of vec_duplicate_detect() to
   keep the first occurrence of duplicates. The old code removed ALL
   copies (including first), silently losing data. Now warns with
   count of removed duplicates.

4. compute_LD population method: add warning when missingness differs
   by >10% across columns, since the N-denominator approximation
   becomes biased with heterogeneous missingness.

5. gigrnd: remove the [0,1] clamp from inside the GIG sampler
   (which is a general-purpose distribution, not bounded at 1).
   Move psi clamping to after the GIG call in the MCMC loop via
   arma::clamp(psi, 0, 1), matching original Python PRS-CS exactly.

6. sdpr: add M >= 4 guard in R wrapper. With M < 4, sample_V()
   does out-of-bounds array access (a[M-2] with vector size M-1).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gaow gaow merged commit 7bd00ad into main Apr 7, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant