Skip to content

Bug: stats::lm() produces NA-containing ty.#5

Open
t-pollington wants to merge 14 commits intobnaras:masterfrom
t-pollington:master
Open

Bug: stats::lm() produces NA-containing ty.#5
t-pollington wants to merge 14 commits intobnaras:masterfrom
t-pollington:master

Conversation

@t-pollington
Copy link
Copy Markdown

@t-pollington t-pollington commented Aug 29, 2021

Following Issue #4 this is a minimal bug correction to prevent a user running bcajack2() if their Y choice leads to B * pct approximately being less than or equal to n aka ncol(Y).

Otherwise lm() will produce some NA entries in ty. as there are insufficient observations (length(ip)) to solve the matrix equation in lm(), considering the number of dimensions of x in lm() is p aka ncol(Y). Happy to explain further, as I'm aware my explanation is spread across this PR and Issue #4.

… Better than just adding `na.rm = TRUE` to s = mean(tt) as it forces user to recheck their data.
…. is already invalid as it's trying to align a vector in a dimensional space smaller than the size of the dataset n. Instead B needs to be increased by doing more bootstrap replics (if possible).
…s run with a size of Y that means B*pct (approx)< n
Imat[Ij, ] seemed to be a mistake as the cols of Imat represent the m groups.
u. <- 2 * t. - s.
gave a warning of different t. & s. lengths when applied to my data. Following "Estimation and Accuracy After Model Selection" by Efron, p995.
@t-pollington
Copy link
Copy Markdown
Author

Additional commits to bcajack() (note not bcajack2() this time) following 2 possible bugs:

  • Code correction and some refactoring of the loop to get aa and ssj. Imat[Ij, ] seemed to be a mistake as the cols of Imat represent the m groups rather than its rows.

  • Code refactoring. Since the line:
    u. <- 2 * t. - s.
    gave a warning of different t. & s. lengths when applied to my data. Code refactoring followed "Estimation and Accuracy After Model Selection" by Efron, p995 as noted in "The Automatic Construction..." paper.

Copy link
Copy Markdown
Owner

@bnaras bnaras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went over these changed files and am finding it difficult to distinguish the substantive changes from the non-substantive ones (such as indentation, new lines etc.) Could you please keep the original indentation? Thank you.

@t-pollington
Copy link
Copy Markdown
Author

Please bear with me. Making a few corrections to my code so could be a few weeks before re-submission.

bnaras added a commit that referenced this pull request Apr 8, 2026
Issue #2: Grouped jackknife Imat construction was using
sapply(seq_len_m, sample.int, ...) which passed iteration values
as the first positional arg of sample.int, overriding the keyword
n= and enabling replacement. Fix: matrix(sample.int(n, n-r), nrow=m)
restores the correct without-replacement partition. Regenerated
bcajack fixture since old fixture captured buggy acceleration values.

Issue #7: Matrix subsetting x[-i, ] on single-column matrices drops
the dimension, returning a vector instead of a matrix. This causes
type inconsistency between jackknife (vector) and bootstrap (matrix)
calls to func. Fix: drop=FALSE on all 5 matrix subsetting sites
that pass data to func (jackknife_accel, bootstrap_resample, bca_nonpar).

Issue #4/#1 (+ PR #5 idea): regression_accel silently produced NAs
when ncol(Y) > length(nearby_idx), making lm() underdetermined.
Fix: explicit check with actionable error message suggesting to
increase B, increase kl_fraction, or use accel="jackknife".

PR #8 (Bettina Gruen): Missing return() in K=0 path was already
fixed in the tidy rewrite — bca_nonpar/bca_par use explicit
return(new_bcaboot(...)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants