Expose surrogate #355
@Scienfitz, @AVHopp: Here is my proposal for giving convenient high-level access to the model internals through the campaign object. This makes it very easy for the user to apply model diagnostics. Below is an example for SHAP. Notice how the only important piece of code is the
import numpy as np
import shap
from baybe.campaign import Campaign
from baybe.parameters.numerical import NumericalContinuousParameter
from baybe.recommenders.pure.bayesian.botorch import BotorchRecommender
from baybe.searchspace.core import SearchSpace
from baybe.targets.numerical import NumericalTarget
def blackbox(x: np.ndarray) -> np.ndarray:
    """Quadratic function embedded into higher-dimensional space."""
    assert x.shape[1] >= 2
    return np.power(x[:, [0, 1]].sum(axis=1), 2)
N_PARAMETERS = 10
N_DATA = 100
# Campaign settings
parameters = [
    NumericalContinuousParameter(f"p{i}", (-1, 1)) for i in range(N_PARAMETERS)
]
searchspace = SearchSpace.from_product(parameters)
objective = NumericalTarget("t", "MIN").to_objective()
campaign = Campaign(searchspace, objective, recommender=BotorchRecommender())
# Create measurements at random candidates
measurements = searchspace.continuous.sample_uniform(N_DATA)
measurements["t"] = blackbox(measurements.values)
campaign.add_measurements(measurements)
# Evaluate Shap values
df = campaign.measurements[[p.name for p in campaign.parameters]]
explainer = shap.Explainer(lambda x: campaign.get_surrogate().posterior(x).mean, df)
shap_values = explainer(df)
shap.plots.bar(shap_values)
If we decide this is the way to go, then the next step could be to design a
Also tagging @brandon-holt and @Alex6022 here, who expressed interest in the feature.
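As a rough sanity check for what the bar plot in the example should reveal, the following sketch computes exact Shapley values by brute force for the same quadratic blackbox. It is only an illustration under stated assumptions: it uses a single all-zero baseline instead of a background dataset, it does not call BayBE or the shap package, and the helper name `exact_shapley` is hypothetical. The attributions of all inputs beyond the first two should vanish.

```python
from itertools import combinations
from math import factorial


def blackbox(x):
    """Same quadratic as in the example: only the first two inputs matter."""
    return (x[0] + x[1]) ** 2


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x (hypothetical helper, single baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, all others from the baseline
        point = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(point)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition with `size` members that excludes i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


x = [0.5, -0.25, 0.75, -1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
phi = exact_shapley(blackbox, x, baseline)
print(phi)
```

The attributions sum to the difference between the function value at `x` and at the baseline, which is the efficiency property that SHAP estimates approximate for larger models.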
Extremely short and unfinished first round of quick comments.
New approach looks good to me
@AdrianSosic @Scienfitz @AVHopp Hi, I am trying this approach for a campaign with SubstanceParameters and CustomDiscreteParameters, and I can't get it to work with the SHAP analysis portion of the code:
# Evaluate Shap values
df = campaign.measurements[[p.name for p in campaign.parameters]]
explainer = shap.Explainer(lambda x: campaign.get_surrogate().posterior(x).mean, df)
shap_values = explainer(df)
shap.plots.bar(shap_values)
If you run this as is with a campaign that has categorical parameters or any custom encoding, it fails with the error:
I've tried two approaches.
Approach 1 looks like this:
import pandas as pd
from baybe.parameters.substance import SubstanceParameter
from baybe.parameters.custom import CustomDiscreteParameter
df = campaign.measurements[[p.name for p in campaign.parameters]].copy()
original_df = df.copy()
def replace_with(df, lookup_df, replace_col):
    # Dictionary to hold new columns
    new_columns = {col: [None] * len(df) for col in lookup_df.columns}
    # Replace values using lookup
    for i, row in df.iterrows():
        lookup_value = row[replace_col]
        if lookup_value in lookup_df.index:
            lookup_row = lookup_df.loc[lookup_value]
            for col in lookup_df.columns:
                new_columns[col][i] = lookup_row[col]
    # Create a new DataFrame with the new columns
    new_df = pd.DataFrame(new_columns, index=df.index)
    # Concatenate the new columns to the original DataFrame
    df = pd.concat([df, new_df], axis=1)
    # Delete the column parameter.name
    df.drop(replace_col, axis=1, inplace=True)
    return df
for parameter in campaign.parameters:
    if isinstance(parameter, SubstanceParameter):
        df = replace_with(df, parameter.comp_df, parameter.name)
    elif isinstance(parameter, CustomDiscreteParameter):
        df = replace_with(df, parameter.data, parameter.name)
print(df)
explainer = shap.Explainer(lambda x: campaign.get_surrogate().posterior(x).mean, df, max_evals=5209)
shap_values = explainer(df)
shap.plots.bar(shap_values)
And it fails because you'll be missing the columns you deleted. I tried setting allow_missing and allow_extra to true, but that doesn't work either.
Approach 2 looks like this:
from baybe.parameters.substance import SubstanceParameter
from baybe.parameters.custom import CustomDiscreteParameter
df = campaign.measurements[[p.name for p in campaign.parameters]].copy()
for parameter in campaign.parameters:
    if isinstance(parameter, SubstanceParameter) or isinstance(parameter, CustomDiscreteParameter):
        df[parameter.name] = df[parameter.name].astype('category')
        df[parameter.name] = df[parameter.name].cat.codes
        # convert df[parameter.name] to float64
        df[parameter.name] = df[parameter.name].astype('float64')
        if 'Labels' in parameter.comp_df.columns:
            parameter.comp_df['Labels'] = parameter.comp_df['Labels'].astype('float64')
print(df)
explainer = shap.Explainer(lambda x: campaign.get_surrogate().posterior(x).mean, df)
shap_values = explainer(df)
shap.plots.bar(shap_values)
And it fails with error:
Can you please help me with a solution to generate a SHAP analysis for campaigns with custom encodings or SubstanceParameters? Ideally, this would give importance scores for each feature in the custom encodings, not just the categorical values themselves.
UPDATE: I modified the code to work, but now I run into this error
from baybe.parameters.substance import SubstanceParameter
from baybe.parameters.custom import CustomDiscreteParameter
import pandas as pd
global original_df
df = campaign.measurements[[p.name for p in campaign.parameters]].copy()
original_df = df.copy()
def replace_with(df, lookup_df, replace_col):
    # Dictionary to hold new columns
    new_columns = {col: [None] * len(df) for col in lookup_df.columns}
    # Replace values using lookup
    for i, row in df.iterrows():
        lookup_value = row[replace_col]
        if lookup_value in lookup_df.index:
            lookup_row = lookup_df.loc[lookup_value]
            for col in lookup_df.columns:
                new_columns[col][i] = lookup_row[col]
    # Create a new DataFrame with the new columns
    new_df = pd.DataFrame(new_columns, index=df.index)
    # Concatenate the new columns to the original DataFrame
    df = pd.concat([df, new_df], axis=1)
    # Delete the column parameter.name
    df.drop(replace_col, axis=1, inplace=True)
    return df
for parameter in campaign.parameters:
    if isinstance(parameter, SubstanceParameter):
        df = replace_with(df, parameter.comp_df, parameter.name)
    elif isinstance(parameter, CustomDiscreteParameter):
        df = replace_with(df, parameter.data, parameter.name)
# add a column to df to save the original index of the row
df['original_index'] = df.index
print(df)
def model(x):
    global original_df
    original_indices = x['original_index'].values
    # build a new_df by going through each original_index and adding the corresponding row from original_df to new_df
    new_df = pd.DataFrame(columns=original_df.columns)
    rows = [original_df.loc[index] for index in original_indices]
    new_df = pd.concat(rows, axis=1).T.reset_index(drop=True)
    print(new_df)
    return campaign.get_surrogate().posterior(new_df).mean
explainer = shap.Explainer(model, df, max_evals=5211)
shap_values = explainer(df)
shap.plots.bar(shap_values)
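The label-to-descriptor expansion that `replace_with` performs row by row can also be expressed as a single pandas join. The sketch below is only an illustration of that lookup step and does not address the SHAP error itself. The helper name `expand_with_descriptors`, the column names, and all numbers are made up for the example, and no BayBE or shap calls are involved.

```python
import pandas as pd


def expand_with_descriptors(df, lookup_df, column):
    """Replace a label column by the descriptor columns of ``lookup_df``."""
    # Joining on the label value avoids positional list indexing, so it also
    # works when ``df`` does not have a default 0..n-1 index
    expanded = df.join(lookup_df, on=column)
    return expanded.drop(columns=column)


# Hypothetical measurements with one label-valued parameter
measurements = pd.DataFrame(
    {"solvent": ["water", "ethanol", "water"], "temperature": [20.0, 40.0, 60.0]},
    index=[10, 11, 12],
)
# Hypothetical descriptor table, indexed by label (values are made up)
descriptors = pd.DataFrame(
    {"polarity": [1.0, 0.65], "boiling_point": [100.0, 78.4]},
    index=["water", "ethanol"],
)

expanded = expand_with_descriptors(measurements, descriptors, "solvent")
print(expanded)

# The row index is preserved, so the original label rows can be recovered
recovered = measurements.loc[expanded.index]
```

Because the expanded frame keeps the original row index, mapping rows back to their labels reduces to one `.loc` lookup, without a separate `original_index` column.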
@brandon-holt Can you please post this as an issue and mention the corresponding PR there? It is easier for us if we have everything that requires our input in a single place, and discussing the issue is much easier there. Thanks :)
Hi @AdrianSosic @Scienfitz @AVHopp, thank you for the great work! I typically work with multi-objective optimization use cases. Do you have an estimated timeline for when it will support multi-target mode?
@zhensongds Thanks for your interest in BayBE :) Could you please ask this question in our "Issues" tab here on GitHub? We'd prefer to have all of our interaction in a single place, since there is otherwise a risk of us not spotting questions. Your question is then also easier to find for others who might be interested in the answer.
Hi @zhensongds. Yes, we'd appreciate it if you could ask any further questions in the form of issues to streamline communication. However, now that the question is already here: multi-target optimization support is planned for 2024Q4, and exposing the corresponding surrogate models will happen along the way 👌 I'm currently on vacation until early October but will start working on it once I'm back.
Thank you @AdrianSosic and @AVHopp. I'm excited for the upcoming updates! I'll post in the issue if I have more questions from now on.
Hi @brandon-holt. Just checked and think that this is not a problem with
This PR enables convenient access to the surrogate model and posterior predictive distribution via the Campaign class.