Fix 6 pre-existing bugs across LD/RSS/TWAS pipeline #465
Merged
1. pval_hmp: remove unique() that silently discarded duplicate
p-values, changing both the count L and the harmonic mean.
2. rss_basic_qc: convert skip_region start/end to integer after
separate(). Was doing lexicographic comparison ("9" > "10" = TRUE).
3. allele_qc: use duplicated() instead of vec_duplicate_detect() to
keep the first occurrence of duplicates. The old code removed ALL
copies (including first), silently losing data. Now warns with
count of removed duplicates.
4. compute_LD population method: add warning when missingness differs
by >10% across columns, since the N-denominator approximation
becomes biased with heterogeneous missingness.
5. gigrnd: remove the [0,1] clamp from inside the GIG sampler
(which is a general-purpose distribution, not bounded at 1).
Move psi clamping to after the GIG call in the MCMC loop via
arma::clamp(psi, 0, 1), matching original Python PRS-CS exactly.
6. sdpr: add M >= 4 guard in R wrapper. With M < 4, sample_V()
does out-of-bounds array access (a[M-2] with vector size M-1).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
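A minimal C++ sketch of why item 1 matters. It assumes the unweighted harmonic mean L / sum(1/p_i) and omits the Landau-based calibration that turns this statistic into a final p-value. The helper names are hypothetical, and `unique_values()` only reproduces the old deduplication step.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <stdexcept>
#include <vector>

// Unweighted harmonic mean of L p-values: L / sum_i(1 / p_i).
double harmonic_mean_p(const std::vector<double>& pvals) {
    if (pvals.empty()) throw std::invalid_argument("need at least one p-value");
    double inv_sum = 0.0;
    for (double p : pvals) inv_sum += 1.0 / p;
    return static_cast<double>(pvals.size()) / inv_sum;
}

// Reproduces the old bug: drop repeated values before computing the statistic.
// (std::set also sorts, which does not affect the harmonic mean.)
std::vector<double> unique_values(const std::vector<double>& pvals) {
    std::set<double> distinct(pvals.begin(), pvals.end());
    return std::vector<double>(distinct.begin(), distinct.end());
}
```

For p = {0.01, 0.01, 0.5}, the full set gives 3/202 (about 0.0149), while the deduplicated set gives 2/102 (about 0.0196): both L and the denominator change.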
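Item 2 can be reproduced in C++, where `std::string` comparison is also lexicographic. This sketch assumes a hypothetical `chr:start-end` region string; the format that `rss_basic_qc()` actually splits with `separate()` may differ, and all names here are illustrative.

```cpp
#include <cassert>
#include <string>

// Hypothetical region string "chr:start-end".
struct Region {
    std::string chr;
    long start;
    long end;
};

// Fixed behaviour: convert start/end to integers right after splitting.
Region parse_region(const std::string& s) {
    const auto colon = s.find(':');
    const auto dash = s.find('-', colon + 1);
    return Region{s.substr(0, colon),
                  std::stol(s.substr(colon + 1, dash - colon - 1)),
                  std::stol(s.substr(dash + 1))};
}

bool in_region(const Region& r, const std::string& chr, long pos) {
    return chr == r.chr && pos >= r.start && pos <= r.end;
}

// Old behaviour: positions left as strings, so the comparison is lexicographic.
bool in_range_as_strings(const std::string& pos, const std::string& start,
                         const std::string& end) {
    return pos >= start && pos <= end;
}
```

Position 9 lands inside 10-99 under string comparison ("9" >= "10" and "9" <= "99" both hold) and correctly falls outside once the bounds are integers.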
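The difference behind item 3 is which copies get flagged. This C++ sketch mimics the two R behaviours on variant IDs stored as strings; the function names are hypothetical, and `count_flagged()` stands in for the count reported in the new warning.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Like R's duplicated(): flags every occurrence after the first.
std::vector<bool> flag_after_first(const std::vector<std::string>& ids) {
    std::unordered_set<std::string> seen;
    std::vector<bool> flag;
    flag.reserve(ids.size());
    for (const auto& id : ids) flag.push_back(!seen.insert(id).second);
    return flag;
}

// Like vctrs::vec_duplicate_detect(): flags every copy of a repeated value,
// including the first.
std::vector<bool> flag_all_copies(const std::vector<std::string>& ids) {
    std::unordered_map<std::string, std::size_t> counts;
    for (const auto& id : ids) ++counts[id];
    std::vector<bool> flag;
    flag.reserve(ids.size());
    for (const auto& id : ids) flag.push_back(counts[id] > 1);
    return flag;
}

// Keeps only the unflagged entries, preserving order.
std::vector<std::string> drop_flagged(const std::vector<std::string>& ids,
                                      const std::vector<bool>& flag) {
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (!flag[i]) kept.push_back(ids[i]);
    }
    return kept;
}

// Number of removed entries, as reported in a warning.
std::size_t count_flagged(const std::vector<bool>& flag) {
    return static_cast<std::size_t>(std::count(flag.begin(), flag.end(), true));
}
```

With IDs {"a", "b", "a"}, the first-occurrence rule keeps {"a", "b"} and removes one entry, while the all-copies rule keeps only {"b"}.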
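A sketch of the kind of check item 4 adds. It assumes genotypes stored column by column with NaN marking missing values, and it assumes "differs by >10%" means the gap between the highest and lowest per-column missing fraction exceeds 0.10. The exact rule in `compute_LD()` may differ, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Column = std::vector<double>;  // one variant's genotypes; NaN = missing

// Fraction of missing (NaN) entries in each column.
std::vector<double> missing_fraction(const std::vector<Column>& cols) {
    std::vector<double> frac;
    frac.reserve(cols.size());
    for (const auto& col : cols) {
        const auto n_missing = std::count_if(col.begin(), col.end(),
                                             [](double x) { return std::isnan(x); });
        frac.push_back(col.empty() ? 0.0
                                   : static_cast<double>(n_missing) / col.size());
    }
    return frac;
}

// Assumed rule: flag when max - min of per-column missingness exceeds threshold.
bool missingness_heterogeneous(const std::vector<Column>& cols,
                               double threshold = 0.10) {
    const auto frac = missing_fraction(cols);
    if (frac.size() < 2) return false;
    const auto mm = std::minmax_element(frac.begin(), frac.end());
    return (*mm.second - *mm.first) > threshold;
}
```

A caller would emit the warning (and point to `method='sample'`) when `missingness_heterogeneous()` returns true.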
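Item 5 separates two concerns: the sampler stays general-purpose, and the caller applies the [0, 1] bound. This sketch shows that split. `draw_unbounded()` is a hypothetical stand-in that uses a gamma draw only so the code compiles; it is not a GIG sampler, and none of these names come from `src/prscs_mcmc.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for gigrnd(): any sampler whose support is (0, inf).
// NOT a GIG sampler; a gamma draw is used only for illustration.
double draw_unbounded(std::mt19937& rng, double shape) {
    std::gamma_distribution<double> dist(shape, 1.0);
    return dist(rng);
}

// Raw draws straight from the sampler, with no clamp applied inside it.
std::vector<double> raw_draws(unsigned seed, std::size_t p, double shape) {
    std::mt19937 rng(seed);
    std::vector<double> out(p);
    for (auto& v : out) v = draw_unbounded(rng, shape);
    return out;
}

// Clamp applied by the caller after sampling, analogous to arma::clamp(psi, 0, 1).
std::vector<double> clamp_unit(std::vector<double> psi) {
    for (auto& v : psi) v = std::min(std::max(v, 0.0), 1.0);
    return psi;
}

bool all_in_unit_interval(const std::vector<double>& x) {
    return std::all_of(x.begin(), x.end(),
                       [](double v) { return v >= 0.0 && v <= 1.0; });
}
```

The raw draws routinely exceed 1, which is expected for a distribution on (0, inf); only the clamped vector is bounded.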
Summary
Fix 6 pre-existing bugs found during a comprehensive code review. These bugs predate the OTTERS/lassosum work but still affect the pipeline.
Fixes
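1. `pval_hmp()`: discarded duplicate p-values (`R/misc.R`)
   - Problem: `unique(pvals)` silently dropped duplicates, changing the test statistic.
   - Fix: remove `unique()` so every p-value counts toward both L and the harmonic mean.
2. `rss_basic_qc()`: lexicographic position comparison (`R/sumstats_qc.R`)
   - Problem: `separate()` produces character strings, so `pos > "10"` uses string comparison, where `"9" > "10"` is TRUE.
   - Fix: `mutate(start = as.integer(start), end = as.integer(end))` after `separate()`.
3. `allele_qc()`: removed ALL copies of duplicated variants (`R/allele_qc.R`)
   - Problem: `vec_duplicate_detect()` marks ALL occurrences (including the first) as TRUE, so duplicated variants were lost entirely.
   - Fix: use `duplicated()`, which keeps the first occurrence, and warn with the count of removed duplicates.
4. `compute_LD()` population method: silent NA bias (`R/misc.R`)
   - Problem: the N-denominator approximation becomes biased when missingness is heterogeneous across columns.
   - Fix: warn when missingness differs by more than 10% across columns, suggesting `method='sample'` as an alternative.
5. `gigrnd()`: clamped the GIG distribution to [0,1] (`src/prscs_mcmc.h`)
   - Problem: the GIG sampler is a general-purpose distribution, not bounded at 1, but its output was clamped inside the sampler.
   - Fix: remove the clamp from `gigrnd()` and apply `arma::clamp(psi, 0, 1)` after the GIG call in the MCMC loop, matching the original Python PRS-CS exactly.
6. `sdpr()`: buffer overflow with M < 4 (`R/regularized_regression.R`)
   - Problem: `sample_V()` accesses `a[M-2]` on a vector of size `M-1`; with `M=1` this is out of bounds.
   - Fix: `if (M < 4) stop("M must be at least 4")` in the R wrapper.

Test plan
- `devtools::test()` passes
- `R CMD check` passes

🤖 Generated with Claude Code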