
surrogate model #77

Draft

aklawonn wants to merge 18 commits into main from surrogate_model

Collaborator

aklawonn commented Apr 27, 2022

Implements the code structure for using surrogate models (meta models, response surfaces) with probeye. This structure is up for discussion, so please feel free to comment and propose changes. To see the currently intended use of a surrogate model, check out this integration test. When finished, this fixes #78.


          added first basic structure for surrogate models

edc92c3

aklawonn marked this pull request as draft

April 27, 2022 08:39

aklawonn requested a review from joergfunger

April 27, 2022 08:39

aklawonn added the enhancement label

aklawonn requested a review from atulag0711

April 27, 2022 11:37

joergfunger reviewed


probeye/definition/inverse_problem.py Outdated

@@ -211,6 +219,11 @@ def forward_models(self) -> dict:
                       """Access self._forward_models from outside via self.forward_models."""
                       return self._forward_models
+                  @property
+                  def surrogate_models(self) -> dict:

Member

joergfunger Apr 27, 2022

Why does the surrogate model have to be stored in the inference problem? An alternative would be

my_metamodel = gp_forwardmodel(orig_forward_model, additional_parameters_of_gp)

and then the user can decide on either using the metamodel

problem.add_forward_model(my_metamodel)

or the original forward model

problem.add_forward_model(orig_forward_model)
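
A minimal sketch of this alternative, using hypothetical stand-ins rather than probeye's actual classes: `ForwardModel`, `NearestNeighbourSurrogate` and `Problem` are illustrative names only, and the `gp_forwardmodel` from the comment is replaced by a trivial nearest-neighbour lookup so that the example runs without dependencies. It only illustrates that the surrogate can expose the same `response` interface as the original model, so the problem does not need to know which of the two it received.

```python
from typing import Callable, Dict, List


class ForwardModel:
    """Hypothetical stand-in for a forward model: maps inputs to outputs."""

    def __init__(self, name: str, func: Callable[[float], float]):
        self.name = name
        self._func = func

    def response(self, inp: Dict[str, float]) -> Dict[str, float]:
        return {"y": self._func(inp["x"])}


class NearestNeighbourSurrogate(ForwardModel):
    """Surrogate with the same interface, trained on a few evaluations."""

    def __init__(self, original: ForwardModel, training_x: List[float]):
        self.name = original.name + "_surrogate"
        # evaluate the (expensive) original model once per training point
        self._x = list(training_x)
        self._y = [original.response({"x": x})["y"] for x in self._x]

    def response(self, inp: Dict[str, float]) -> Dict[str, float]:
        # return the stored output of the closest training point
        idx = min(range(len(self._x)), key=lambda i: abs(self._x[i] - inp["x"]))
        return {"y": self._y[idx]}


class Problem:
    """Hypothetical problem container; it only stores forward models."""

    def __init__(self) -> None:
        self.forward_models: Dict[str, ForwardModel] = {}

    def add_forward_model(self, model: ForwardModel) -> None:
        self.forward_models[model.name] = model


orig_forward_model = ForwardModel("quadratic", lambda x: x ** 2)
my_metamodel = NearestNeighbourSurrogate(orig_forward_model, [0.0, 1.0, 2.0, 3.0])

problem = Problem()
# the user decides which of the two models to add
problem.add_forward_model(my_metamodel)
```

With this layout the problem class needs no `surrogate_models` property, since the choice between surrogate and original is made by the caller.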

Collaborator

atulag0711 commented May 3, 2022

Thanks Alex. The integration example looks nice and makes sense to me. However, a fit method would be difficult in the probeye context. I will complete my implementation, push it here, and then we can discuss.

merijnpepijndebakker commented May 16, 2022

Thanks for this. In my opinion the structure you propose makes a lot of sense, i.e. treating the surrogate model as a separate class and as a model object. One remark is that we also have different ways to sample new datapoints to generate the surrogate. How would you connect the surrogate with the sampling methods?

Member

joergfunger commented May 16, 2022

I would think the sampling could be done separately. So we would either have a sample_from_prior functionality, or any other sampler that takes a parameter estimation problem as input and returns the samples. These samples would then be passed to the metamodel, which would evaluate them and build the metamodel. You could certainly compute the outputs (that is, the forward model output that is to be metamodeled) as a separate independent step; however, doing it inside the metamodel allows for adaptation (if the metamodel requires new samples, they could just be added whenever the model is evaluated).
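
The flow described in this comment could look roughly like the following sketch. Everything here is an assumption made for illustration: `sample_from_prior` takes plain bounds instead of a probeye problem and draws uniform samples, and `Metamodel` is a nearest-neighbour lookup instead of a real response surface. The point is only that the sampler is independent of the metamodel, while the metamodel evaluates the forward model itself and can therefore accept additional samples later.

```python
import random
from typing import Callable, Dict, List, Tuple

Point = Dict[str, float]


def sample_from_prior(
    bounds: Dict[str, Tuple[float, float]], n: int, seed: int = 0
) -> List[Point]:
    """Hypothetical sampler: n uniform draws within the given bounds."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        for _ in range(n)
    ]


class Metamodel:
    """Evaluates the forward model on the samples it is given and stores them."""

    def __init__(self, forward: Callable[[Point], float]):
        self._forward = forward
        self.inputs: List[Point] = []
        self.outputs: List[float] = []

    def add_samples(self, samples: List[Point]) -> None:
        # the expensive evaluation happens inside the metamodel
        for sample in samples:
            self.inputs.append(dict(sample))
            self.outputs.append(self._forward(sample))

    def predict(self, prm: Point) -> float:
        # nearest-neighbour prediction from the stored evaluations
        def squared_distance(stored: Point) -> float:
            return sum((stored[key] - prm[key]) ** 2 for key in prm)

        idx = min(
            range(len(self.inputs)), key=lambda i: squared_distance(self.inputs[i])
        )
        return self.outputs[idx]


bounds = {"a": (0.0, 1.0), "b": (-1.0, 1.0)}
metamodel = Metamodel(lambda p: p["a"] + 2.0 * p["b"])
metamodel.add_samples(sample_from_prior(bounds, n=20))
# adaptation: further samples can be added at any later point
metamodel.add_samples(sample_from_prior(bounds, n=5, seed=1))
```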

aklawonn and others added 16 commits

May 17, 2022 15:11


          added latin hypercube sampler class

78b0dc8


          removed compatability with Python 3.6

b0c0653


          had to do more changes to remove the Python 3.6 flag


          removed add_surrogate_model; some other small changes

c88e33e


          fixed bug in test_surrogate_model.py

650f621


          Merge branch 'main' into surrogate_model

5258ed4


          added a missing import statement in inverse_problem.py

e899df0


          added example of inference with surrogate of hartman 6D function

3ea64be


          added surrogating example and temporarily disabled mypy from pre-commit hooks

72eb782


          resolved merge conflict

68982cc


          draft example of combining probeye with harlow for surrogating and sampling

6226ca0


          worked on connecting probeye and harlow

24321af


          added .dat and .owl files to gitignore

fed4207


          temporary commit to add .dat and .owl files to gitignore

6a7dd74


          resolved merge conflicts


          implemented HarlowModelFactory class for generating surrogate forward models

b6c5816

joergfunger requested changes


Member

joergfunger left a comment

Thanks for the integration. Can we somehow decouple the harlow-specific implementations from the general surrogate interface and put them into a surrogate subdirectory? This way other metamodels could be integrated as well.

probeye/definition/surrogate_model.py Outdated

               from probeye.definition.forward_model import ForwardModelBase
+              # Harlow imports
+              from harlow.sampling import Sampler

Member

joergfunger Sep 5, 2022

Should that be in a very general definition of the interface (rather than in a specific implementation using harlow)?

Collaborator

JanKoune Sep 16, 2022

Done, I have separated the general definition of the interface (SamplerBase) from the particular implementation for harlow (HarlowSampler(SamplerBase)).

probeye/definition/surrogate_model.py Outdated

+                  def __init__(
+                      self,
+                      problem: InverseProblem,

Member

joergfunger Sep 5, 2022

Why do we need that complex interface? In particular, why do we need the InverseProblem? What about the sampler, shouldn't the samples be passed directly?

probeye/definition/surrogate_model.py Outdated

+                      problem: InverseProblem,
+                      forward_model: ForwardModelBase,
+                      sampler: Sampler,
+                      surrogate_model: Surrogate,

Member

joergfunger Sep 5, 2022

This is also Harlow specific? How would we integrate different surrogate models (apart from Harlow)?

probeye/definition/surrogate_model.py Outdated

+                      # TODO: Implement check that the expensive forward model
+                      #  has been added to the problem
+                      self.input_sensor_data = dict()
+                      for exp_name, exp in self.problem.experiments.items():

Member

joergfunger Sep 5, 2022

Why isn't all this information stored in the forward model?

probeye/definition/surrogate_model.py Outdated


		"""

		self.surrogate.fit(

Member

joergfunger Sep 5, 2022

Why do we have to store all the fit points and then call the fit routine with these internal variables, instead of providing a fit method that takes the sampling/training points as input?
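
As a sketch of what such a signature could look like: the surrogate below receives the training points as arguments of `fit` and keeps no reference to a sampler. `LinearSurrogate` is a hypothetical one-dimensional least-squares fit chosen only to keep the example dependency-free; it is neither the harlow surrogate nor part of probeye.

```python
from typing import List, Optional, Sequence


class LinearSurrogate:
    """Hypothetical surrogate: ordinary least squares for y = intercept + slope * x."""

    def __init__(self) -> None:
        self.slope: Optional[float] = None
        self.intercept: Optional[float] = None

    def fit(self, x: Sequence[float], y: Sequence[float]) -> None:
        # training points are passed in explicitly; nothing is read from a sampler
        n = len(x)
        if n != len(y) or n < 2:
            raise ValueError("x and y must have the same length (at least 2)")
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            raise ValueError("all x values are identical")
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, x: Sequence[float]) -> List[float]:
        if self.slope is None or self.intercept is None:
            raise RuntimeError("fit must be called before predict")
        return [self.intercept + self.slope * xi for xi in x]


surrogate = LinearSurrogate()
# the points may come from any sampler or from a previously generated dataset
surrogate.fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
prediction = surrogate.predict([4.0])
```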

probeye/definition/surrogate_model.py Outdated

+                          self.sampler.fit_points_x, self.sampler.fit_points_y, **kwargs
+                      )
+                  def sample(self, **kwargs):

Member

joergfunger Sep 5, 2022

Could this be done outside of the surrogate? Why is the sampler part of the surrogate?


          refactored sampling and surrogating interface and adjusted example

3960eef

joergfunger requested changes


Member

joergfunger left a comment

I did not review the complete code, but I noticed one main design decision that I would like to discuss. You are surrogating the inference problem (with all parameters), but shouldn't we surrogate each forward model and only use the inference problem to define bounds/distributions of the parameters?

examples/example_surrogate_hartman6D.py

+              # An iterative sampler. Here we pass the surrogate ForwardModel directly to the sampler. However, it is
+              # also possible to pass a surrogate model that will be included in a forward model after fitting.
+              harlow_sampler = HarlowSampler(problem, forward_model, LatinHypercube, surrogate_model)

Member

joergfunger Sep 16, 2022

What is LatinHypercube here? Could you document that somehow?

Collaborator

JanKoune Sep 16, 2022

It is the harlow implementation of LHS. I will try to improve the documentation and fix the docstrings asap.

probeye/definition/inverse_problem.py

                           value=new_value
                       )
+                  def get_latent_prior_hyperparameters(self, prm_name: str) -> list:

Member

joergfunger Sep 16, 2022

The problem definition in probeye allows a hierarchical definition, with parameter priors having hyperparameters that themselves could again have parameters, which again could have hyperparameters (though I don't think more than a single level of hierarchy is often used). So in the case of two levels of hierarchy, what would that return?
In addition, what is the difference to the existing methods here? Why do you need the hyperparameters for individual parameters?
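
To make the two-level question concrete, here is a small sketch of the two possible answers. The dictionary `prior_parameters` and both functions are hypothetical and do not reflect how probeye stores priors; they only show that "the hyperparameters of a parameter" can mean either the direct ones or all of them collected recursively.

```python
from typing import Dict, List

# Hypothetical hierarchy: each parameter maps to the parameters of its prior.
# "a" has a prior with mean_a and std_a, and std_a itself has a prior
# that is parameterized by scale_std_a (two levels of hierarchy).
prior_parameters: Dict[str, List[str]] = {
    "a": ["mean_a", "std_a"],
    "mean_a": [],
    "std_a": ["scale_std_a"],
    "scale_std_a": [],
}


def direct_hyperparameters(prm_name: str) -> List[str]:
    """Only the first level: the parameters of the prior of prm_name."""
    return list(prior_parameters.get(prm_name, []))


def all_hyperparameters(prm_name: str) -> List[str]:
    """All levels: walks down the hierarchy depth-first."""
    found: List[str] = []
    for hyper in prior_parameters.get(prm_name, []):
        found.append(hyper)
        found.extend(all_hyperparameters(hyper))
    return found


first_level = direct_hyperparameters("a")
every_level = all_hyperparameters("a")
```

Here `first_level` contains `mean_a` and `std_a`, while `every_level` additionally contains `scale_std_a`, so the docstring of such a method would need to state which of the two is returned.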

probeye/definition/inverse_problem.py

+                      Returns
+                      -------
+                      prms
+                          Contains <local parameter name> : <(global) parameter value> pairs. If a

Member

joergfunger Sep 16, 2022

Not sure if I understand that: there is a local and a global name, but the value is the same (so neither local nor global).

probeye/metamodeling/initial_sampling.py

+                      # make sure that all parameters are one-dimensional; it is not straight forward
+                      # how to do LHS for general multivariate parameters
+                      for prm_name in self.problem.latent_prms:

Member

joergfunger Sep 16, 2022

Should we loop over the latent_prms of the inverse problem, or rather of the original forward model? In case of a single forward model, that is identical, but in case of multiple models, I don't think we should surrogate all models in parallel but rather split that into surrogates for each model.
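
A sketch of what sampling per forward model could look like, under stated assumptions: `latin_hypercube` is a plain-Python Latin hypercube sampler for one-dimensional parameters with interval bounds (it is not the harlow or probeye implementation), and `problem_bounds` and `model_prms` are hypothetical names for the bounds of all latent parameters and for the parameters used by one forward model.

```python
import random
from typing import Dict, List, Tuple


def latin_hypercube(
    bounds: Dict[str, Tuple[float, float]], n_samples: int, seed: int = 0
) -> List[Dict[str, float]]:
    """Latin hypercube samples for scalar parameters with interval bounds."""
    rng = random.Random(seed)
    columns: Dict[str, List[float]] = {}
    for name, (lower, upper) in bounds.items():
        # one point per stratum, with the strata visited in random order
        strata = list(range(n_samples))
        rng.shuffle(strata)
        width = (upper - lower) / n_samples
        columns[name] = [lower + (k + rng.random()) * width for k in strata]
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


# bounds of all latent parameters of the (hypothetical) inverse problem
problem_bounds = {"a": (0.0, 2.0), "b": (-1.0, 1.0), "sigma": (0.1, 1.0)}

# only the parameters of the forward model that is being surrogated
model_prms = ["a", "b"]
model_bounds = {name: problem_bounds[name] for name in model_prms}

samples = latin_hypercube(model_bounds, n_samples=8)
```

With several forward models, this selection would be repeated per model, so that each surrogate is trained only on the parameters its own model depends on.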

probeye/metamodeling/initial_sampling.py

+                      The considered inverse problem.
+                  """
+                  def __init__(self, problem: InverseProblem):

Member

joergfunger Sep 16, 2022

This is the design question of whether we surrogate the inverse problem or each individual forward model. I would opt for the latter. The inverse problem would only provide the bounds in a somewhat automatic way.

Collaborator

JanKoune Sep 16, 2022

The newer version of LatinHypercubeSampler in sampling.py (which I simply copied from this implementation by Alex) is derived from SamplerBase. We could ensure that the sampling and surrogating is done for each individual forward model by specifying the forward model as an input argument in SamplerBase.

E.g. this part:

class SamplerBase:
    """
    Base class for probeye samplers
    """

    def __init__(self, problem: InverseProblem):

would become:

class SamplerBase:
    """
    Base class for probeye samplers
    """

    def __init__(self, problem: InverseProblem, forward_model: ForwardModelBase):

Member

joergfunger Sep 19, 2022

I see the challenge; however, it is also tricky to hide that. What do we do in case of a parametric prior (where e.g. the std. dev. is again a parameter)?

Collaborator

JanKoune commented Sep 16, 2022

Hi, thank you for the feedback. It may be best to provide a brief description of the changes and the intended layout of the sampling and surrogating interface before addressing your individual comments:

Surrogating and sampling are now separated by defining the SurrogateBase and SamplerBase classes, from which the user is expected to derive their own implementation.
Both base classes should be as simple as possible for flexibility to allow for different approaches to surrogating and sampling. E.g. in a simple case we may want to draw LHS samples from the prior and manually fit a surrogate model which we then use to define a forward model, or in a more complex case we may want to pass the surrogate model to the sampler to perform iterative fitting and sampling.
The main use of the SurrogateBase class is to copy the interface of a given forward model, and provide a template for surrogate models by implementing "empty" methods that are typically useful for surrogating (e.g. fit and predict).
The SamplerBase is meant to read the inverse problem and forward model definitions and extract the information that is needed by the sampler (e.g. parameter priors and bounds, and input/output shapes). Indeed it would be best to do the surrogating for each forward model and not for the entire inverse problem. This could be enforced by making the forward model an input argument to SamplerBase, ensuring that a sampler is tied to a specific forward model.
Most of the functionality for getting the problem and forward model information is not yet implemented in the base class. I will try to get it working asap.
The base classes are now decoupled from harlow.
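
The layout described in this comment could be sketched as follows. This is not the code from the PR: `ForwardModelStub` and `MeanSurrogate` are hypothetical, and `SamplerBase` here takes a plain dictionary of bounds instead of an `InverseProblem` so that the example is self-contained.

```python
from typing import Dict, List, Sequence, Tuple


class ForwardModelStub:
    """Hypothetical stand-in for a forward model definition."""

    def __init__(self, name: str, input_names: List[str], output_names: List[str]):
        self.name = name
        self.input_names = input_names
        self.output_names = output_names


class SurrogateBase:
    """Copies the forward model interface; fit and predict are left to subclasses."""

    def __init__(self, name: str, forward_model: ForwardModelStub):
        self.name = name
        self.input_names = list(forward_model.input_names)
        self.output_names = list(forward_model.output_names)

    def fit(self, train_x: Sequence[Sequence[float]], train_y: Sequence[float]) -> None:
        raise NotImplementedError

    def predict(self, x: Sequence[float]) -> float:
        raise NotImplementedError


class SamplerBase:
    """Extracts what a sampler needs for one specific forward model."""

    def __init__(
        self, bounds: Dict[str, Tuple[float, float]], forward_model: ForwardModelStub
    ):
        # keep only the bounds of the parameters this forward model uses
        self.bounds = {name: bounds[name] for name in forward_model.input_names}

    def sample(self, n_samples: int) -> List[Dict[str, float]]:
        raise NotImplementedError


class MeanSurrogate(SurrogateBase):
    """Trivial derived surrogate: always predicts the mean training output."""

    def fit(self, train_x, train_y):
        self.mean = sum(train_y) / len(train_y)

    def predict(self, x):
        return self.mean


model = ForwardModelStub("beam", ["E", "nu"], ["deflection"])
sampler = SamplerBase({"E": (1.0, 2.0), "nu": (0.1, 0.4), "sigma": (0.0, 1.0)}, model)
surrogate = MeanSurrogate("beam_surrogate", model)
surrogate.fit([[1.0, 0.2], [2.0, 0.3]], [0.5, 1.5])
```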

danielandresarcones requested changes


Collaborator

danielandresarcones left a comment

I would suggest developing the interface in a way that completely separates the sampler and surrogate from the inverse and forward problem. The separated base classes work for that, but still require knowledge of the requirements of the ForwardModel. I would suggest something like this, where the SurrogateAndSampler interface controls the flow of information with the inverse problem and between sampler and surrogate, while acting as the ForwardModel. This would allow adding the necessary utilities in the interface instead of having to redefine them for every surrogate. More complex interactions such as adaptive samplers can be implemented as children. The data format between the interface and the individual wrappers should be decided beforehand.

Additionally, the wrappers are to be implemented individually, allowing the user to adapt the inputs to their specific solver. If, for example, a surrogate needs information of a specific forward model, that can be requested in the wrapper and introduced in the initialization, encapsulating each of the implementations individually.
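
One possible reading of this proposal, as a sketch only: the class names `SamplerWrapper`, `SurrogateWrapper`, `DiagonalSampler` and `NearestNeighbourSurrogate` are hypothetical, and the data format (a list of name-to-value dictionaries and a list of scalar outputs) is an assumption made for the example. The wrappers know nothing about the forward model; only `SurrogateAndSampler` does.

```python
from typing import Callable, Dict, List, Tuple

Point = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


class SamplerWrapper:
    """Wrapper interface for samplers; independent of any forward model."""

    def sample(self, bounds: Bounds, n_samples: int) -> List[Point]:
        raise NotImplementedError


class SurrogateWrapper:
    """Wrapper interface for surrogates; independent of any forward model."""

    def fit(self, inputs: List[Point], outputs: List[float]) -> None:
        raise NotImplementedError

    def predict(self, point: Point) -> float:
        raise NotImplementedError


class DiagonalSampler(SamplerWrapper):
    """Places points evenly along the diagonal of the bounding box."""

    def sample(self, bounds: Bounds, n_samples: int) -> List[Point]:
        points = []
        for i in range(n_samples):
            frac = (i + 0.5) / n_samples
            points.append(
                {name: low + frac * (high - low) for name, (low, high) in bounds.items()}
            )
        return points


class NearestNeighbourSurrogate(SurrogateWrapper):
    def fit(self, inputs: List[Point], outputs: List[float]) -> None:
        self._inputs, self._outputs = list(inputs), list(outputs)

    def predict(self, point: Point) -> float:
        def dist(stored: Point) -> float:
            return sum((stored[k] - point[k]) ** 2 for k in point)

        idx = min(range(len(self._inputs)), key=lambda i: dist(self._inputs[i]))
        return self._outputs[idx]


class SurrogateAndSampler:
    """Controls the data flow and acts as the forward model towards the problem."""

    def __init__(self, forward: Callable[[Point], float], bounds: Bounds,
                 sampler: SamplerWrapper, surrogate: SurrogateWrapper):
        self.forward, self.bounds = forward, bounds
        self.sampler, self.surrogate = sampler, surrogate

    def train(self, n_samples: int) -> None:
        inputs = self.sampler.sample(self.bounds, n_samples)
        outputs = [self.forward(point) for point in inputs]
        self.surrogate.fit(inputs, outputs)

    def response(self, point: Point) -> float:
        return self.surrogate.predict(point)


interface = SurrogateAndSampler(
    lambda p: 3.0 * p["x"], {"x": (0.0, 4.0)},
    DiagonalSampler(), NearestNeighbourSurrogate(),
)
interface.train(n_samples=4)
```

An adaptive variant could subclass `SurrogateAndSampler` and override `train`, which matches the idea of implementing more complex interactions as children.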

probeye/metamodeling/sampling.py

    
              from harlow.surrogating import Surrogate

              # external imports

              from harlow.sampling import Sampler as HarlowSamplerBase

              from harlow.surrogating import Surrogate as HarlowSurrogateBase

              from harlow.utils.helper_functions import latin_hypercube_sampling

Collaborator

danielandresarcones Sep 16, 2022

I am not very fond of having external imports in the same script where the base class is defined. I would personally keep the base class separated from the derived ones, creating one new script for each group of samplers.

Collaborator

JanKoune Sep 16, 2022

This is done for convenience initially, and it would be trivial to separate the specific implementations from the base class if necessary later.

probeye/metamodeling/sampling.py

+                      self.priors = copy.deepcopy(self.problem.priors)
+                      for prior_template in self.problem.priors.values():
+                          prm_name = prior_template.ref_prm
+                          self.priors[prm_name] = translate_prior(prior_template)

Collaborator

danielandresarcones Sep 16, 2022

I don't think the sampler should have to treat the priors itself, probably better in the interface.

Member

joergfunger Sep 19, 2022

I think the challenge with the implementation is that we strictly separated the problem definition from the compute functions (such as evaluating the prior with e.g. a scipy pdf evaluation, or a pytorch pdf evaluation), i.e. the problem definition cannot itself sample from the prior. Thus, we would somehow have to create a solver just for sampling the prior using e.g. LHS. I am not sure what exactly would be the recommended option; I see two options:

The sampler just gets the distributions and performs the (initial) sampling internally. This would mean extracting the distributions with all the relevant parameters from the inference problem.
We create a dummy sampler that is able to sample from the prior and passes those samples as a first set of points to the surrogate.
In both situations, the intervals for the variables have to be transferred to the surrogate sampling, either by having a check_bounds in the inference problem, or by "copying" or extracting this information from the inference problem.
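
The first option could be sketched as below. All names are hypothetical: `prior_definitions` stands for the distribution information extracted from the inference problem as plain data (so the problem definition itself performs no computation), and `initial_samples` is a stand-in sampler built on Python's `random` module. Note that enforcing the interval by rejection turns the normal prior into a truncated normal, which is a choice of this example and not something stated in the discussion.

```python
import random
from typing import Dict, List

# Hypothetical result of extracting the priors from the inference problem:
# plain data only, including the interval each parameter must stay within.
prior_definitions: Dict[str, Dict] = {
    "a": {"type": "uniform", "low": 0.0, "high": 2.0},
    "b": {"type": "normal", "mean": 1.0, "std": 0.5, "low": 0.0, "high": 2.0},
}


def draw(prior: Dict, rng: random.Random) -> float:
    """Draws one value from a prior given as plain data."""
    if prior["type"] == "uniform":
        return rng.uniform(prior["low"], prior["high"])
    if prior["type"] == "normal":
        # rejection keeps the draw inside the interval (truncated normal)
        while True:
            value = rng.gauss(prior["mean"], prior["std"])
            if prior["low"] <= value <= prior["high"]:
                return value
    raise ValueError(f"unsupported prior type: {prior['type']}")


def initial_samples(
    priors: Dict[str, Dict], n_samples: int, seed: int = 0
) -> List[Dict[str, float]]:
    """The sampler performs the (initial) sampling internally."""
    rng = random.Random(seed)
    return [
        {name: draw(prior, rng) for name, prior in priors.items()}
        for _ in range(n_samples)
    ]


points = initial_samples(prior_definitions, n_samples=10)
```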

probeye/metamodeling/sampling.py

    
                      sampler: Sampler,

                      surrogate_model: Surrogate,

                      sampler: HarlowSamplerBase,

                      surrogate_model: HarlowSurrogateBase,

Collaborator

danielandresarcones Sep 16, 2022

Ideally a harlow sampler should be usable with a non-harlow surrogate, but it is not the case. Probably more a "harlow" thing than a "probeye" one.

Collaborator

JanKoune Sep 16, 2022

Harlow offers an abstract surrogate class which can be used to define new surrogates that work with the harlow samplers. Since most samplers need access to the surrogate (for fitting and making predictions iteratively), I don't think it's feasible to make them work with any surrogate.

Member

joergfunger Sep 19, 2022

Why exactly should that not work with other surrogates? But I somewhat agree that the sampler and the surrogate are connected, and maybe we should actually follow the suggestion from Daniel to put them into the same base class.

probeye/metamodeling/surrogating.py

		raise NotImplementedError


		class HarlowSurrogate(SurrogateModelBase):

Collaborator

danielandresarcones Sep 16, 2022

I would separate this from the base class.

probeye/metamodeling/surrogating.py

+                  """
+                  def __init__(
+                      self, name: str, surrogate: HarlowSurrogateBase, forward_model: ForwardModelBase

Collaborator

danielandresarcones Sep 16, 2022

The base class should be general instead of forcing a harlow surrogate as input.
From my point of view, this base class should just coordinate the input and output of data to the surrogate model as well as its format.

Collaborator

JanKoune Sep 16, 2022

Indeed, thanks for catching this! I will fix this in the next update.

Collaborator

JanKoune commented Sep 16, 2022

Thank you for the suggestions @danielandresarcones. Some notes:

What is the benefit of creating surrogating and sampling wrapper base classes plus an interface class? Why not use the __init__ method of SamplerBase to obtain all the information needed for surrogating and sampling from InverseProblem and ForwardModel?
Ideally, with the current structure of SamplerBase and SurrogateBase the user would not need any knowledge of the InverseProblem and ForwardModel internals since extracting the necessary information would be taken care of in the __init__ of SamplerBase, although this is not yet implemented.
I like the idea of having the interface also act as the forward model. However, it seems that there is a lot of processing of the forward model that happens internally in probeye and could potentially cause problems (e.g. the issue with deepcopy) if the forward model becomes too complex. It could also be that this is fine, but I tried this approach initially and encountered some issues which is why I opted to keep SurrogateBase as light as possible.

Collaborator

danielandresarcones commented Sep 16, 2022

My idea behind it is to separate functions and allow an easier integration with external generic samplers/surrogates, unless we decide to always go through harlow. Using the __init__ from SamplerBase means that the interactions between surrogate and sampler are controlled by the sampler, and that the sampler will have to store the information regarding datasets, formats and parameters, as well as implement the extra utilities that we may need. If we want to use a surrogate with a previously generated dataset with no sampling at all, having to define a sampler seems counterintuitive. In the same way, derived samplers shouldn't be able to modify the utilities concerning the ForwardModel, such as comparing the output of the surrogate model or generating the covariance matrices.
The derived class would not have to modify it, but it will still be passed through the __init__ and accessible, as well as having to generate a copy of it. I consider encapsulating the sampler and surrogate from the inverse and forward problems the safest option.
That is a good point, although it is something to consider.

Collaborator

JanKoune commented Sep 16, 2022

I don't see anything in the current implementation tying us to Harlow or necessitating the use of a sampler to fit a surrogate model to an existing dataset. It seems to me that the two approaches are mostly functionally equivalent, with the main differences being:

Having a separate interface class vs. having SamplerBase act as the interface
Defining the forward model as part of the interface vs. defining it as part of the surrogate model

I do not have a strong preference for either approach.


Reviewers

JanKoune left review comments

joergfunger requested changes

danielandresarcones requested changes

Awaiting requested review from atulag0711

Requested changes must be addressed to merge this pull request.
