
[ENH] BayesianConjugateLinearRegressor online update in precision form#771

Open
arnavk23 wants to merge 3 commits into sktime:main from arnavk23:feat/bayesian-conjugate-update

Conversation


@arnavk23 arnavk23 commented Feb 26, 2026

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR implements true incremental (online) learning for BayesianConjugateLinearRegressor by introducing estimator-native Bayesian posterior chaining in _update.

Key changes:

  • Refactors update computations to the precision (information) form for stable accumulation:
    • Lambda_post = Lambda_prior + beta * X^T X
    • eta_post = eta_prior + beta * X^T y
    • mu_post = Lambda_post^{-1} eta_post
  • Uses numerically stable linear solves for posterior mean computation.
  • Adds robust numeric coercion/validation for X and y to handle skpro internal table metadata/object dtypes.
  • Adds tests validating:
    • Sequential online update equivalence to one-shot batch posterior fit
    • Monotonic uncertainty reduction via posterior covariance shrinkage
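
The precision-form chaining described above can be sketched as follows. This is a minimal NumPy illustration, not the skpro implementation; the function and variable names are hypothetical:

```python
import numpy as np

def precision_form_update(Lambda_prior, eta_prior, X, y, beta):
    """One online Bayesian linear-regression update in precision form.

    Lambda_prior : (d, d) prior precision matrix
    eta_prior    : (d,) prior information vector (Lambda_prior @ mu_prior)
    X            : (n, d) new design block
    y            : (n,) new targets
    beta         : known noise precision (1 / sigma^2)
    """
    Lambda_post = Lambda_prior + beta * X.T @ X
    eta_post = eta_prior + beta * X.T @ y
    # Solve for the posterior mean instead of forming Lambda_post^{-1}
    # explicitly; a linear solve is more numerically stable.
    mu_post = np.linalg.solve(Lambda_post, eta_post)
    return Lambda_post, eta_post, mu_post
```

Because the update is additive in (Lambda, eta), chaining it over data batches gives exactly the same posterior as one batch update on the concatenated data, which is what the equivalence test checks.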

Does your contribution introduce a new dependency? If yes, which one?

No new dependency is introduced.

What should a reviewer concentrate their feedback on?

  • Correctness of posterior chaining in _update (posterior used as next prior)
  • Numerical stability/consistency of the precision-form update implementation
  • Robustness of numeric coercion with skpro table-like inputs
  • Test adequacy for online equivalence + uncertainty reduction guarantees

Did you add any tests for the change?

Yes.

  • Added skpro/regression/tests/test_bayesian_conjugate_linear.py
  • Tests included:
    • Sequential updates match one-shot batch posterior
    • Posterior covariance decreases after online updates
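
The covariance-shrinkage property being tested can be illustrated at the NumPy level (a hedged sketch, independent of the skpro test code; names are hypothetical):

```python
import numpy as np

def posterior_cov_trace(Lambda):
    """Trace of the posterior covariance, i.e. trace(Lambda^{-1})."""
    return np.trace(np.linalg.inv(Lambda))

rng = np.random.default_rng(42)
d, beta = 3, 1.0
Lambda = np.eye(d)  # prior precision
traces = [posterior_cov_trace(Lambda)]
for _ in range(5):  # stream five synthetic batches
    X = rng.normal(size=(20, d))
    Lambda = Lambda + beta * X.T @ X  # precision-form update
    traces.append(posterior_cov_trace(Lambda))

# Each batch adds a positive (semi-)definite term to the precision,
# so posterior uncertainty shrinks monotonically as data accumulates.
assert all(t1 > t2 for t1, t2 in zip(traces, traces[1:]))
```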

Any other comments?

Visual validation artifacts (optional for PR body):

Synthetic streaming comparison (figures: bayesian_conjugate_online_uncertainty_curve, bayesian_conjugate_online_ribbons)

Observed:

  • Final trace ratio (after/before): 0.167
  • Final mean sigma ratio (after/before): 0.973

Real-data comparison, Diabetes dataset (figures: bayesian_conjugate_online_uncertainty_curve_diabetes, bayesian_conjugate_online_ribbons_diabetes)

Observed:

  • Final trace ratio (after/before): 0.813
  • Final mean sigma ratio (after/before): 0.998

PR checklist

For all contributions
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]: [BUG] for bugfixes, [MNT] for CI and test framework changes, [ENH] for adding or improving code, [DOC] for writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

@fkiraly fkiraly changed the title [ENH] Implement Bayesian conjugate online update in precision form [ENH] BayesianConjugateLinearRegressor online update in precision form Feb 27, 2026
@fkiraly fkiraly added enhancement module:regression probabilistic regression module labels Feb 27, 2026

@fkiraly fkiraly left a comment


Nice! Though, may I ask why _coerce_numeric_inputs is necessary? At that point in the code, we do not expect X or y to be various types, just the type in X_inner_mtype and y_inner_mtype.

arnavk23 commented Feb 27, 2026

> Nice! Though, may I ask why _coerce_numeric_inputs is necessary? At that point in the code, we do not expect X or y to be various types, just the type in X_inner_mtype and y_inner_mtype.

Thank you for pointing that out. Yes, _coerce_numeric_inputs is redundant. Since X_inner_mtype and y_inner_mtype are both set to "pd_DataFrame_Table", skpro's base class already guarantees that X and y are well-formed pd.DataFrames with numeric data by the time _fit and _update are called. I'll remove _coerce_numeric_inputs and replace it with a simple .to_numpy(dtype=float) call directly in both methods.

The framework already ensures X and y are in the specified inner mtypes
(pd_DataFrame_Table), so the pd.to_numeric coercion is unnecessary.
Simplified to direct .to_numpy(dtype=float) conversions.
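
The simplification can be illustrated with a minimal sketch (hypothetical data; not the skpro source):

```python
import numpy as np
import pandas as pd

# With X_inner_mtype / y_inner_mtype set to "pd_DataFrame_Table", the base
# class has already coerced inputs to well-formed numeric DataFrames, so a
# direct conversion replaces the pd.to_numeric-based coercion helper:
X_df = pd.DataFrame({"x0": [1.0, 2.0], "x1": [3.0, 4.0]})
X = X_df.to_numpy(dtype=float)  # (2, 2) float64 ndarray, ready for X.T @ X
assert X.dtype == np.float64
```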
