Parameter validation for kernel factories#776

Open
AdrianSosic wants to merge 22 commits into `dev/gp` from `feature/parameter_support`

Conversation

@AdrianSosic (Collaborator)

DevPR, parent is #745

Adds a validation mechanism to ensure kernel factories only produce kernels for search spaces they are intended for.
This is achieved via a new ParameterKind flag enum that factories can use to signal which parameter types they support.
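To make the described mechanism concrete, here is a minimal sketch of a flag-enum-based validation, assuming hypothetical names and signatures; the actual BayBE API may differ:

```python
from enum import Flag, auto


# Hypothetical sketch of the described mechanism; class and method names
# are illustrative, not the actual BayBE API.
class ParameterKind(Flag):
    REGULAR = auto()
    TASK = auto()


class KernelFactory:
    # Each factory declares the parameter kinds it can handle.
    supported_kinds: ParameterKind = ParameterKind.REGULAR

    def validate(self, kinds: list[ParameterKind]) -> None:
        # Reject the search space if it contains unsupported parameter kinds.
        unsupported = [k for k in kinds if not k & self.supported_kinds]
        if unsupported:
            raise ValueError(f"Factory does not support: {unsupported}")


factory = KernelFactory()
factory.validate([ParameterKind.REGULAR])  # passes silently
```

Because `Flag` members support bitwise composition, a factory supporting several kinds could declare `supported_kinds = ParameterKind.REGULAR | ParameterKind.TASK`.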

@AdrianSosic AdrianSosic self-assigned this Apr 2, 2026
@AdrianSosic AdrianSosic requested a review from Scienfitz as a code owner April 2, 2026 11:55
@AdrianSosic AdrianSosic added the new feature New functionality label Apr 2, 2026
@AdrianSosic AdrianSosic requested a review from AVHopp as a code owner April 2, 2026 11:55
@AdrianSosic AdrianSosic added the dev label Apr 2, 2026
Copilot AI review requested due to automatic review settings April 2, 2026 11:55
@AdrianSosic AdrianSosic changed the title Feature/parameter support Parameter validation for kernel factories Apr 2, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a parameter-kind validation mechanism for Gaussian process kernel factories and updates kernel translation to infer dimensions from a SearchSpace, enabling factories to explicitly declare and enforce which parameter roles (e.g., task vs. regular) they support.

Changes:

  • Add ParameterKind (flag enum) + Parameter.kind and enforce supported parameter kinds in KernelFactory.
  • Introduce parameter sub-selection via parameter_selector / parameter_names, and refactor Kernel.to_gpytorch to take a SearchSpace for automatic active_dims/ard_num_dims.
  • Add a deprecation guard that raises a DeprecationError when using a custom kernel in multi-task GP contexts unless suppressed via env var.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `baybe/kernels/base.py` | Refactors `to_gpytorch` to use `SearchSpace`-derived dimensions; adds `parameter_names` to basic kernels. |
| `baybe/parameters/enum.py` | Introduces the `ParameterKind` flag enum. |
| `baybe/parameters/base.py` | Adds a `Parameter.kind` property derived from `ParameterKind`. |
| `baybe/parameters/__init__.py` | Exposes `ParameterKind` in the public parameters API. |
| `baybe/parameters/selector.py` | Adds a parameter selector protocol plus concrete selectors (e.g., `TypeSelector`). |
| `baybe/surrogates/gaussian_process/components/generic.py` | Renames the factory protocol type to `GPComponentFactoryProtocol` and updates conversion helper typing. |
| `baybe/surrogates/gaussian_process/components/kernel.py` | Adds a kernel factory base class with parameter-kind validation and introduces `ICMKernelFactory`. |
| `baybe/surrogates/gaussian_process/components/mean.py` | Switches to protocol-based mean factory typing. |
| `baybe/surrogates/gaussian_process/components/likelihood.py` | Switches to protocol-based likelihood factory typing. |
| `baybe/surrogates/gaussian_process/components/__init__.py` | Exposes the new `*Protocol` factory types. |
| `baybe/surrogates/gaussian_process/presets/baybe.py` | Replaces an alias with explicit default kernel/task-kernel factories (incl. multitask handling). |
| `baybe/surrogates/gaussian_process/presets/edbo.py` | Updates the EDBO kernel factory to use `parameter_names` selection; adjusts likelihood factory typing. |
| `baybe/surrogates/gaussian_process/presets/edbo_smoothed.py` | Same as above for smoothed EDBO. |
| `baybe/surrogates/gaussian_process/core.py` | Updates the GP surrogate to use protocol factories and `SearchSpace`-based kernel conversion; adds the multitask custom-kernel deprecation guard. |
| `baybe/settings.py` | Whitelists and validates `BAYBE_DISABLE_CUSTOM_KERNEL_WARNING`. |
| `tests/test_kernels.py` | Updates the kernel assembly test to build a `SearchSpace` and validate inferred dims/mapping. |
| `tests/hypothesis_strategies/kernels.py` | Extends kernel strategies to optionally generate `parameter_names`. |
| `tests/test_deprecations.py` | Adds a deprecation test for the multitask custom-kernel behavior and env-var suppression. |
| `CHANGELOG.md` | Documents new features, breaking changes, and the new deprecation. |
Comments suppressed due to low confidence (2)

baybe/surrogates/gaussian_process/presets/edbo.py:76

  • effective_dims is now train_x.shape[-1], which will include task columns in multi-task settings even though this factory can be used with parameter_selector to exclude TaskParameters. Since effective_dims controls prior/initialization regime selection, it should reflect the dimensionality of the kernel’s active (selected) inputs (e.g., based on self.get_parameter_names(searchspace) / BasicKernel._get_dimensions(searchspace)), otherwise priors will shift when adding a task parameter.
```python
@override
def _make(
    self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
) -> Kernel:
    effective_dims = train_x.shape[-1]

    switching_condition = _contains_encoding(
        searchspace.discrete, _EDBO_ENCODINGS
    ) and (effective_dims >= 50)

    # low D priors
    if effective_dims < 5:
        lengthscale_prior = GammaPrior(1.2, 1.1)
```
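The fix suggested in the comment above can be illustrated with plain Python (made-up column names; `get_parameter_names` is the helper mentioned in the comment): derive the dimension count from the kernel's selected parameters rather than the raw training-data width, so that adding a task column does not shift the prior regime.

```python
# Illustration with hypothetical column names.
selected_names = ["temperature", "pressure", "solvent"]  # e.g. from self.get_parameter_names(searchspace)
train_x_columns = selected_names + ["task"]              # train_x additionally contains the task column

effective_dims_current = len(train_x_columns)    # current behavior: counts the task dim -> 4
effective_dims_suggested = len(selected_names)   # suggested: selected dims only -> 3
```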

baybe/surrogates/gaussian_process/presets/edbo_smoothed.py:60

  • Same as in EDBOKernelFactory: effective_dims=train_x.shape[-1] will count task dimensions even when a parameter_selector excludes them. Since the interpolated priors depend on effective_dims, compute it from the selected/active dimensions instead of the raw train_x width to keep behavior stable in multi-task setups.
```python
@override
def _make(
    self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
) -> Kernel:
    effective_dims = train_x.shape[-1]

    # Interpolate prior moments linearly between low D and high D regime.
    # The high D regime itself is the average of the EDBO OHE and Mordred regime.
    # Values outside the dimension limits will get the border value assigned.
    lengthscale_prior = GammaPrior(
        np.interp(effective_dims, _DIM_LIMITS, [1.2, 2.5]),
        np.interp(effective_dims, _DIM_LIMITS, [1.1, 0.55]),
    )
    lengthscale_initial_value = np.interp(effective_dims, _DIM_LIMITS, [0.2, 6.0])
    outputscale_prior = GammaPrior(
```


```diff
 # Convert specified inner kernels to gpytorch, if provided
 kernel_dict = {
-    key: value.to_gpytorch(**kw)
+    key: value.to_gpytorch(searchspace, **kw)
```

Copilot AI Apr 2, 2026


Kernel.to_gpytorch recurses into inner kernels via value.to_gpytorch(searchspace, **kw), but the new to_gpytorch signature only accepts (searchspace) (no **kw). This will raise a TypeError for composite kernels at runtime. Call inner kernels with just the searchspace (or update the signature consistently) and let each kernel compute its own dimensions.

Suggested change:

```diff
-key: value.to_gpytorch(searchspace, **kw)
+key: value.to_gpytorch(searchspace)
```

```diff
 # Exception: initial values are not used during construction but are set
 # on the created object (see code at the end of the method).
-missing = set(unmatched) - set(kernel_attrs)
+missing = set(unmatched) - set(kernel_attrs) - self._whitelisted_attributes
```

Copilot AI Apr 2, 2026


The sanity-check for unmatched attributes computes missing from unmatched (the last loop iteration) instead of the accumulated unmatched_attrs. This can silently miss unmatched BayBE attributes depending on base-class iteration order. Use the collected unmatched_attrs when computing missing.

Suggested change:

```diff
-missing = set(unmatched) - set(kernel_attrs) - self._whitelisted_attributes
+missing = set(unmatched_attrs) - set(kernel_attrs) - self._whitelisted_attributes
```

Comment on lines +111 to 112
```python
gpytorch_kernel = kernel_cls(**kernel_attrs, ard_num_dims=ard_num_dims, **kw)
```

Copilot AI Apr 2, 2026


to_gpytorch always passes ard_num_dims=ard_num_dims into the GPyTorch constructor, even when ard_num_dims is None. The comment above explicitly notes that passing None can break kernels that use kwargs.get('ard_num_dims', ...) semantics, and it also defeats the earlier filtering of default kwargs. Only pass ard_num_dims when it is not None (consistent with how active_dims is handled).

Suggested change:

```diff
-gpytorch_kernel = kernel_cls(**kernel_attrs, ard_num_dims=ard_num_dims, **kw)
+# Only pass `ard_num_dims` if it is not None, to avoid overriding
+# kernels that rely on `kwargs.get("ard_num_dims", ...)` semantics.
+kernel_kwargs: dict[str, Any] = {**kernel_attrs, **kw}
+if ard_num_dims is not None:
+    kernel_kwargs["ard_num_dims"] = ard_num_dims
+gpytorch_kernel = kernel_cls(**kernel_kwargs)
```

Comment on lines +121 to +149
```python
base_kernel_factory: KernelFactoryProtocol = field(alias="base_kernel_or_factory")
"""The factory for the base kernel operating on numerical input features."""

task_kernel_factory: KernelFactoryProtocol = field(alias="task_kernel_or_factory")
"""The factory for the task kernel operating on the task indices."""

@base_kernel_factory.default
def _default_base_kernel_factory(self) -> KernelFactoryProtocol:
    from baybe.surrogates.gaussian_process.presets.baybe import (
        BayBENumericalKernelFactory,
    )

    return BayBENumericalKernelFactory(TypeSelector((TaskParameter,), exclude=True))

@task_kernel_factory.default
def _default_task_kernel_factory(self) -> KernelFactoryProtocol:
    from baybe.surrogates.gaussian_process.presets.baybe import (
        BayBETaskKernelFactory,
    )

    return BayBETaskKernelFactory(TypeSelector((TaskParameter,)))

@override
def _make(
    self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
) -> Kernel:
    base_kernel = self.base_kernel_factory(searchspace, train_x, train_y)
    task_kernel = self.task_kernel_factory(searchspace, train_x, train_y)
    return ProductKernel([base_kernel, task_kernel])
```

Copilot AI Apr 2, 2026


ICMKernelFactory fields are aliased as base_kernel_or_factory / task_kernel_or_factory, but they have no converter/validator like GaussianProcessSurrogate.kernel_factory. Passing a Kernel instance (or any non-callable component) will fail at runtime when invoked. Either (a) rename aliases to *_factory and validate is_callable(), or (b) add a to_component_factory converter here as well.

Related: _make builds a BayBE ProductKernel, so these sub-factories should be constrained to return BayBE Kernel objects (not raw GPyTorch kernels), otherwise ProductKernel([base_kernel, task_kernel]) will break.
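Option (b) could be sketched as follows. This is illustrative only; whether BayBE's actual `to_component_factory` converter behaves exactly like this is an assumption:

```python
from typing import Any, Callable


def to_component_factory(obj: Any) -> Callable:
    # Sketch of option (b): pass factories (callables) through unchanged
    # and wrap plain component instances into constant factories.
    if callable(obj):
        return obj
    return lambda searchspace, train_x, train_y: obj
```

With such a converter, passing a plain kernel instance via `base_kernel_or_factory` would be normalized to a factory up front instead of failing later at invocation time.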

```python
from baybe.parameters.base import Parameter


class ParameterKind(Flag):
```
AdrianSosic (Collaborator, Author) commented:

or ParameterRole?

@AdrianSosic AdrianSosic mentioned this pull request Apr 2, 2026
