Bug: stats::lm() produces NA-containing ty. #5
t-pollington wants to merge 14 commits into bnaras:master
… Better than just adding `na.rm = TRUE` to `s = mean(tt)`, as it forces the user to recheck their data.
…. is already invalid as it's trying to align a vector in a dimensional space smaller than the size of the dataset n. Instead B needs to be increased by doing more bootstrap replicates (if possible).
…s run with a size of Y that means B*pct (approx) < n
`Imat[Ij, ]` seemed to be a mistake, as the columns of Imat represent the m groups.
`u. <- 2 * t. - s.` gave a warning about differing lengths of `t.` and `s.` when applied to my data. Following "Estimation and Accuracy After Model Selection" by Efron, p995.
bnaras left a comment
I went over these changed files and am finding it difficult to distinguish the substantive changes from the non-substantive ones (such as indentation, new lines, etc.). Could you please keep the original indentation? Thank you.
Please bear with me. I am making a few corrections to my code, so it could be a few weeks before re-submission.
Issue #2: Grouped jackknife Imat construction was using sapply(seq_len_m, sample.int, ...) which passed iteration values as the first positional arg of sample.int, overriding the keyword n= and enabling replacement. Fix: matrix(sample.int(n, n-r), nrow=m) restores the correct without-replacement partition. Regenerated bcajack fixture since old fixture captured buggy acceleration values.

Issue #7: Matrix subsetting x[-i, ] on single-column matrices drops the dimension, returning a vector instead of a matrix. This causes type inconsistency between jackknife (vector) and bootstrap (matrix) calls to func. Fix: drop=FALSE on all 5 matrix subsetting sites that pass data to func (jackknife_accel, bootstrap_resample, bca_nonpar).

Issue #4/#1 (+ PR #5 idea): regression_accel silently produced NAs when ncol(Y) > length(nearby_idx), making lm() underdetermined. Fix: explicit check with actionable error message suggesting to increase B, increase kl_fraction, or use accel="jackknife".

PR #8 (Bettina Gruen): Missing return() in K=0 path was already fixed in the tidy rewrite: bca_nonpar/bca_par use explicit return(new_bcaboot(...)).
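The partition fix and the `drop=FALSE` fix described above can be illustrated with a small base-R sketch. The sizes `n` and `m` and the toy matrix `x` are made up for illustration and are not taken from the package or its tests; only `matrix(sample.int(n, n-r), nrow=m)` and `drop=FALSE` come from the commit message.

```r
set.seed(1)

# Hypothetical toy sizes (assumptions, not package defaults).
n <- 10        # number of observations
m <- 3         # number of groups
r <- n %% m    # leftover observations excluded from the grouping

# sample.int() samples without replacement by default, so every index
# appears at most once across the whole matrix.
Imat <- matrix(sample.int(n, n - r), nrow = m)
dim(Imat)                       # m rows, (n - r) / m columns
anyDuplicated(as.vector(Imat))  # 0 means no index is repeated

# Single-column matrix: default subsetting drops the dimension.
x <- matrix(rnorm(5), ncol = 1)
dropped <- x[-1, ]               # plain numeric vector
kept <- x[-1, , drop = FALSE]    # still a 4 x 1 matrix

is.matrix(dropped)
is.matrix(kept)
dim(kept)
```

Keeping the matrix shape means a user-supplied function sees the same type of input whether it is called on a leave-one-out subset or on a bootstrap resample.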
Following Issue #4 this is a minimal bug correction to prevent a user running `bcajack2()` if their `Y` choice leads to `B*pct` approximately being less than or equal to `n` aka `ncol(Y)`.

Otherwise `lm()` will produce some `NA` entries in `ty.` as there are insufficient observations (`length(ip)`) to solve the matrix equation in `lm()`, considering the number of dimensions of `x` in `lm()` is `p` aka `ncol(Y)`. Happy to explain further, as I'm aware my explanation is spread across this PR and Issue #4.
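A minimal sketch of the failure mode, using made-up data rather than anything from the package: when `lm()` is given fewer observations than coefficients to estimate, it does not stop with an error but returns `NA` for the coefficients it cannot identify. The names `n_obs`, `p`, and `check_enough_points()` are hypothetical; the guard assumes the regression includes an intercept, so it needs strictly more observations than columns of `Y`.

```r
set.seed(42)

n_obs <- 4   # stands in for length(ip), the number of retained points
p <- 6       # stands in for ncol(Y)

Y <- matrix(rnorm(n_obs * p), nrow = n_obs)
tt <- rnorm(n_obs)

# Intercept + 6 slopes = 7 coefficients, but only 4 observations.
fit <- lm(tt ~ Y)
coefs <- coef(fit)
anyNA(coefs)        # TRUE: some coefficients are not estimable
sum(is.na(coefs))   # number of coefficients lm() could not estimate

# Hypothetical guard (not the package's actual check): fail early with
# an actionable message instead of propagating NA values.
check_enough_points <- function(Y, idx) {
  if (length(idx) <= ncol(Y)) {
    stop(
      "Only ", length(idx), " points available for a regression on ",
      ncol(Y), " columns of Y; increase B or reduce ncol(Y).",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

guard_result <- try(check_enough_points(Y, seq_len(n_obs)), silent = TRUE)
inherits(guard_result, "try-error")
```

Failing early, rather than passing `na.rm = TRUE` downstream, keeps the user aware that the chosen `Y` and `B` do not support the regression.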