Refactor Surrogates (#338)
Completes the surrogate refactoring, which spanned #278, #309, #315,
#325, and #337.

### Most important changes
* The transition point from experimental to computational representation
has been moved from the recommender to the surrogate. From an
architecture/responsibility perspective, this is reasonable since the
recommender should not have to deal with algorithmic/computational
details.
* The desired consequence is that public `Surrogate` methods like
`posterior` and `fit` can now operate on dataframes in experimental
representation, meaning they can also be exposed directly to the user.
* The new posterior methods now all return a general `Posterior` object
instead of implicitly assuming Gaussian distributions. This paves the
way for arbitrary surrogate extensions, such as Bernoulli/Categorical
surrogates, etc. At the moment, this introduces an explicit coupling to
botorch, which is fine because botorch remains a core dependency and the
only backend used for complex surrogate modeling. In the future, this
can be further abstracted by introducing our own `Posterior` class.
* The `Surrogate` layout has been refined such that the extracted
`SurrogateProtocol`, which now defines the formal interface for all
surrogates, imposes minimal requirements on the user.
* Scaling has been completely redesigned, offering the possibility to
configure input/output scaling down to the level of individual
parameters and targets. The configuration is currently class-specific,
but can be extended to allow surrogate-instance-specific rules in the
future.
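
As a rough illustration of the first points above (a sketch, not BayBE's actual implementation), the following toy surrogate accepts data in experimental representation and performs the encoding to computational representation internally. All names here (`ToySurrogateProtocol`, `NearestNeighborSurrogate`, `posterior_mean`, the example columns) are hypothetical, and plain dictionaries stand in for dataframes to keep the example dependency-free.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable

Row = dict[str, Any]  # one experiment in experimental representation


@runtime_checkable
class ToySurrogateProtocol(Protocol):
    """Hypothetical minimal interface operating on experimental data."""

    def fit(self, measurements: list[Row], target: str) -> None: ...

    def posterior_mean(self, candidates: list[Row]) -> list[float]: ...


@dataclass
class NearestNeighborSurrogate:
    """Toy model that hides its computational encoding from the caller."""

    numerical: tuple[str, ...]
    categorical: dict[str, tuple[str, ...]]
    _train_x: list[list[float]] = field(default_factory=list, init=False)
    _train_y: list[float] = field(default_factory=list, init=False)

    def _to_comp(self, row: Row) -> list[float]:
        # Experimental -> computational: numbers pass through,
        # labels are one-hot encoded.
        encoded = [float(row[name]) for name in self.numerical]
        for name, labels in self.categorical.items():
            encoded += [float(row[name] == label) for label in labels]
        return encoded

    def fit(self, measurements: list[Row], target: str) -> None:
        self._train_x = [self._to_comp(row) for row in measurements]
        self._train_y = [float(row[target]) for row in measurements]

    def posterior_mean(self, candidates: list[Row]) -> list[float]:
        predictions = []
        for row in candidates:
            x = self._to_comp(row)
            # Squared Euclidean distance to every training point
            dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in self._train_x]
            predictions.append(self._train_y[dists.index(min(dists))])
        return predictions


surrogate = NearestNeighborSurrogate(
    numerical=("Temperature",),
    categorical={"Solvent": ("water", "ethanol")},
)
measurements = [
    {"Temperature": 20.0, "Solvent": "water", "Yield": 0.3},
    {"Temperature": 80.0, "Solvent": "ethanol", "Yield": 0.9},
]
surrogate.fit(measurements, target="Yield")
prediction = surrogate.posterior_mean([{"Temperature": 75.0, "Solvent": "ethanol"}])
```

The caller never sees the encoded vectors, which mirrors the responsibility split described above: the calling side only handles experimental data, and because the interface is a structural protocol, any class providing the two methods qualifies without inheriting from a base class.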
AdrianSosic authored Aug 29, 2024
2 parents e49dc3d + 3768774 commit e0adbf8
Showing 45 changed files with 906 additions and 1,129 deletions.
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -5,14 +5,40 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Breaking Changes
- The public methods of `Surrogate` models now operate on dataframes in experimental
representation instead of tensors in computational representation
- `Surrogate.posterior` now returns a `Posterior` object
- `param_bounds_comp` of `SearchSpace`, `SubspaceDiscrete` and `SubspaceContinuous` has
been replaced with `comp_rep_bounds`, which returns a dataframe

### Added
- `py.typed` file to enable the use of type checkers on the user side
- `GaussianSurrogate` base class for surrogate models with Gaussian posteriors
- `comp_rep_columns` property for `Parameter`, `SearchSpace`, `SubspaceDiscrete`
and `SubspaceContinuous` classes
- New mechanisms for surrogate input/output scaling configurable per class
- `SurrogateProtocol` as an interface for user-defined surrogate architectures

### Changed
- The transition from experimental to computational representation no longer happens
in the recommender but in the surrogate
- Fallback models created by `catch_constant_targets` are stored outside the surrogate
- `to_tensor` now also handles `numpy` arrays

### Fixed
- `CategoricalParameter` and `TaskParameter` no longer incorrectly coerce a single
string input to categories/tasks
- `farthest_point_sampling` no longer depends on the provided point order

### Removed
- `register_custom_architecture` decorator
- `Scaler` and `DefaultScaler` classes

### Deprecations
- The role of `register_custom_architecture` has been taken over by
`baybe.surrogates.base.SurrogateProtocol`

## [0.10.0] - 2024-08-02
### Breaking Changes
- Providing an explicit `batch_size` is now mandatory when asking for recommendations
2 changes: 2 additions & 0 deletions baybe/acquisition/acqfs.py
@@ -91,6 +91,8 @@ def get_integration_points(self, searchspace: SearchSpace) -> pd.DataFrame:
ValueError: If the search space is purely continuous and
'sampling_n_points' was not provided.
"""
+ # TODO: Move the core logic to `SearchSpace` and ``Subspace*`` classes

sampled_parts: list[pd.DataFrame] = []
n_candidates: int | None = None

21 changes: 15 additions & 6 deletions baybe/acquisition/base.py
@@ -10,14 +10,15 @@
import pandas as pd
from attrs import define

- from baybe.searchspace import SearchSpace
+ from baybe.objectives.base import Objective
+ from baybe.searchspace.core import SearchSpace
from baybe.serialization.core import (
converter,
get_base_structure_hook,
unstructure_base,
)
from baybe.serialization.mixin import SerialMixin
- from baybe.surrogates.base import Surrogate
+ from baybe.surrogates.base import SurrogateProtocol
from baybe.utils.basic import classproperty, match_attributes
from baybe.utils.boolean import is_abstract
from baybe.utils.dataframe import to_tensor
@@ -42,14 +43,22 @@ def _non_botorch_attrs(cls) -> tuple[str, ...]:

def to_botorch(
self,
- surrogate: Surrogate,
+ surrogate: SurrogateProtocol,
searchspace: SearchSpace,
- train_x: pd.DataFrame,
- train_y: pd.DataFrame,
+ objective: Objective,
+ measurements: pd.DataFrame,
):
- """Create the botorch-ready representation of the function."""
+ """Create the botorch-ready representation of the function.
+ The required structure of `measurements` is specified in
+ :meth:`baybe.recommenders.base.RecommenderProtocol.recommend`.
+ """
import botorch.acquisition as botorch_acqf_module

+ # Get computational data representations
+ train_x = searchspace.transform(measurements, allow_extra=True)
+ train_y = objective.transform(measurements)

# Retrieve corresponding botorch class
acqf_cls = getattr(botorch_acqf_module, self.__class__.__name__)

4 changes: 4 additions & 0 deletions baybe/exceptions.py
@@ -61,5 +61,9 @@ class UnidentifiedSubclassError(Exception):
"""A specified subclass cannot be found in the given class hierarchy."""


+ class ModelNotTrainedError(Exception):
+ """A prediction/transformation is attempted before the model has been trained."""


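The intended use of `ModelNotTrainedError` can be sketched as follows. This is a minimal, self-contained illustration under assumptions: the exception is redefined locally so the snippet runs on its own, and `MeanModel` with its `fit`/`predict` methods is a hypothetical stand-in, not a class from this commit.

```python
from typing import Optional


class ModelNotTrainedError(Exception):
    """A prediction/transformation is attempted before the model has been trained."""


class MeanModel:
    """Hypothetical toy model that predicts the mean of its training targets."""

    def __init__(self) -> None:
        self._mean: Optional[float] = None  # stays None until `fit` is called

    def fit(self, targets: list[float]) -> None:
        self._mean = sum(targets) / len(targets)

    def predict(self) -> float:
        # Guard against use before training instead of failing obscurely later
        if self._mean is None:
            raise ModelNotTrainedError("Call `fit` before `predict`.")
        return self._mean


model = MeanModel()
try:
    model.predict()
except ModelNotTrainedError as exc:
    error_message = str(exc)

model.fit([1.0, 2.0, 3.0])
trained_prediction = model.predict()
```

A dedicated exception type lets calling code distinguish "not trained yet" from genuine modeling failures.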
class UnmatchedAttributeError(Exception):
"""An attribute cannot be matched against a certain callable signature."""
19 changes: 15 additions & 4 deletions baybe/parameters/base.py
@@ -55,10 +55,6 @@ def is_in_range(self, item: Any) -> bool:
``True`` if the item is within the parameter range, ``False`` otherwise.
"""

- @abstractmethod
- def summary(self) -> dict:
- """Return a custom summarization of the parameter."""

def __str__(self) -> str:
return str(self.summary())

@@ -72,12 +68,21 @@ def is_discrete(self) -> bool:
"""Boolean indicating if this is a discrete parameter."""
return isinstance(self, DiscreteParameter)

+ @property
+ @abstractmethod
+ def comp_rep_columns(self) -> tuple[str, ...]:
+ """The columns spanning the computational representation."""

def to_searchspace(self) -> SearchSpace:
"""Create a one-dimensional search space from the parameter."""
from baybe.searchspace.core import SearchSpace

return SearchSpace.from_parameter(self)

+ @abstractmethod
+ def summary(self) -> dict:
+ """Return a custom summarization of the parameter."""


@define(frozen=True, slots=False)
class DiscreteParameter(Parameter, ABC):
@@ -97,8 +102,14 @@ def values(self) -> tuple:
@cached_property
@abstractmethod
def comp_df(self) -> pd.DataFrame:
+ # TODO: Should be renamed to `comp_rep`
"""Return the computational representation of the parameter."""

+ @property
+ def comp_rep_columns(self) -> tuple[str, ...]: # noqa: D102
+ # See base class.
+ return tuple(self.comp_df.columns)

def to_subspace(self) -> SubspaceDiscrete:
"""Create a one-dimensional search space from the parameter."""
from baybe.searchspace.discrete import SubspaceDiscrete
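
The idea behind the `comp_rep_columns` property added in this file can be sketched with a simplified, standalone hierarchy. The classes below (`ToyParameter`, `ToyNumerical`, `ToyCategorical`) are hypothetical stand-ins, and the one-hot encoding with `<name>_<value>` column labels is an assumption made for illustration only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ToyParameter(ABC):
    """Hypothetical base class exposing its computational columns."""

    @property
    @abstractmethod
    def comp_rep_columns(self) -> tuple[str, ...]:
        """The columns spanning the computational representation."""


@dataclass(frozen=True)
class ToyNumerical(ToyParameter):
    name: str

    @property
    def comp_rep_columns(self) -> tuple[str, ...]:
        # A numerical parameter maps to a single column carrying its own name
        return (self.name,)


@dataclass(frozen=True)
class ToyCategorical(ToyParameter):
    name: str
    values: tuple[str, ...]

    @property
    def comp_rep_columns(self) -> tuple[str, ...]:
        # Assumed one-hot encoding: one column per category label
        return tuple(f"{self.name}_{value}" for value in self.values)


parameters = [
    ToyNumerical("Temperature"),
    ToyCategorical("Solvent", ("water", "ethanol")),
]
# Collect the columns of a combined computational representation
all_columns = tuple(col for p in parameters for col in p.comp_rep_columns)
```

Knowing the column names up front lets downstream code address parts of the computational representation by name, for example when configuring scaling per parameter.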
5 changes: 5 additions & 0 deletions baybe/parameters/numerical.py
@@ -133,6 +133,11 @@ def is_in_range(self, item: float) -> bool: # noqa: D102

return self.bounds.contains(item)

+ @property
+ def comp_rep_columns(self) -> tuple[str, ...]: # noqa: D102
+ # See base class.
+ return (self.name,)

def summary(self) -> dict: # noqa: D102
# See base class.
param_dict = dict(
4 changes: 2 additions & 2 deletions baybe/recommenders/naive.py
@@ -122,7 +122,7 @@ def recommend( # noqa: D102
# Get discrete candidates. The metadata flags are ignored since the search space
# is hybrid
# TODO Slight BOILERPLATE CODE, see recommender.py, ll. 47+
- _, candidates_comp = searchspace.discrete.get_candidates(
+ candidates_exp, _ = searchspace.discrete.get_candidates(
allow_repeated_recommendations=True,
allow_recommending_already_measured=True,
)
@@ -147,7 +147,7 @@ def recommend( # noqa: D102
# Call the private function of the discrete recommender and get the indices
disc_rec_idx = self.disc_recommender._recommend_discrete(
subspace_discrete=searchspace.discrete,
- candidates_comp=candidates_comp,
+ candidates_exp=candidates_exp,
batch_size=batch_size,
)

22 changes: 11 additions & 11 deletions baybe/recommenders/pure/base.py
@@ -50,15 +50,15 @@ def recommend( # noqa: D102
def _recommend_discrete(
self,
subspace_discrete: SubspaceDiscrete,
- candidates_comp: pd.DataFrame,
+ candidates_exp: pd.DataFrame,
batch_size: int,
) -> pd.Index:
"""Generate recommendations from a discrete search space.
Args:
subspace_discrete: The discrete subspace from which to generate
recommendations.
- candidates_comp: The computational representation of all discrete candidate
+ candidates_exp: The experimental representation of all discrete candidate
points to be considered.
batch_size: The size of the recommendation batch.
@@ -67,14 +67,14 @@ def _recommend_discrete(
Returns:
The dataframe indices of the recommended points in the provided
- computational representation.
+ experimental representation.
"""
# If this method is not implemented by a child class, try to resort to hybrid
# recommendation (with an empty subspace) instead.
try:
return self._recommend_hybrid(
searchspace=SearchSpace(discrete=subspace_discrete),
- candidates_comp=candidates_comp,
+ candidates_exp=candidates_exp,
batch_size=batch_size,
).index
except NotImplementedError as exc:
@@ -110,7 +110,7 @@ def _recommend_continuous(
try:
return self._recommend_hybrid(
searchspace=SearchSpace(continuous=subspace_continuous),
- candidates_comp=pd.DataFrame(),
+ candidates_exp=pd.DataFrame(),
batch_size=batch_size,
)
except NotImplementedError as exc:
@@ -126,7 +126,7 @@ def _recommend_continuous(
def _recommend_hybrid(
self,
searchspace: SearchSpace,
- candidates_comp: pd.DataFrame,
+ candidates_exp: pd.DataFrame,
batch_size: int,
) -> pd.DataFrame:
"""Generate recommendations from a hybrid search space.
@@ -138,7 +138,7 @@ def _recommend_hybrid(
Args:
searchspace: The hybrid search space from which to generate
recommendations.
- candidates_comp: The computational representation of all discrete candidate
+ candidates_exp: The experimental representation of all discrete candidate
points to be considered.
batch_size: The size of the recommendation batch.
@@ -175,7 +175,7 @@ def _recommend_with_discrete_parts(

# Get discrete candidates
# Repeated recommendations are always allowed for hybrid spaces
- _, candidates_comp = searchspace.discrete.get_candidates(
+ candidates_exp, _ = searchspace.discrete.get_candidates(
allow_repeated_recommendations=is_hybrid_space
or self.allow_repeated_recommendations,
allow_recommending_already_measured=is_hybrid_space
@@ -184,7 +184,7 @@ def _recommend_with_discrete_parts(

# Check if enough candidates are left
# TODO [15917]: This check is not perfectly correct.
- if (not is_hybrid_space) and (len(candidates_comp) < batch_size):
+ if (not is_hybrid_space) and (len(candidates_exp) < batch_size):
raise NotEnoughPointsLeftError(
f"Using the current settings, there are fewer than {batch_size} "
"possible data points left to recommend. This can be "
@@ -196,11 +196,11 @@ def _recommend_with_discrete_parts(

# Get recommendations
if is_hybrid_space:
- rec = self._recommend_hybrid(searchspace, candidates_comp, batch_size)
+ rec = self._recommend_hybrid(searchspace, candidates_exp, batch_size)
idxs = rec.index
else:
idxs = self._recommend_discrete(
- searchspace.discrete, candidates_comp, batch_size
+ searchspace.discrete, candidates_exp, batch_size
)
rec = searchspace.discrete.exp_rep.loc[idxs, :]

14 changes: 4 additions & 10 deletions baybe/recommenders/pure/bayesian/base.py
@@ -13,15 +13,14 @@
from baybe.recommenders.pure.base import PureRecommender
from baybe.searchspace import SearchSpace
from baybe.surrogates import CustomONNXSurrogate, GaussianProcessSurrogate
- from baybe.surrogates.base import Surrogate
- from baybe.utils.dataframe import to_tensor
+ from baybe.surrogates.base import SurrogateProtocol


@define
class BayesianRecommender(PureRecommender, ABC):
"""An abstract class for Bayesian Recommenders."""

- surrogate_model: Surrogate = field(factory=GaussianProcessSurrogate)
+ surrogate_model: SurrogateProtocol = field(factory=GaussianProcessSurrogate)
"""The used surrogate model."""

acquisition_function: AcquisitionFunction = field(
@@ -51,14 +50,9 @@ def _setup_botorch_acqf(
measurements: pd.DataFrame,
) -> None:
"""Create the acquisition function for the current training data.""" # noqa: E501
- # TODO: Transition point from dataframe to tensor needs to be refactored.
- # Currently, surrogate models operate with tensors, while acquisition
- # functions with dataframes.
- train_x = searchspace.transform(measurements, allow_extra=True)
- train_y = objective.transform(measurements)
- self.surrogate_model._fit(searchspace, *to_tensor(train_x, train_y))
+ self.surrogate_model.fit(searchspace, objective, measurements)
self._botorch_acqf = self.acquisition_function.to_botorch(
- self.surrogate_model, searchspace, train_x, train_y
+ self.surrogate_model, searchspace, objective, measurements
)

def recommend( # noqa: D102