Surrogate models #78
It would be great if probeye had the possibility to directly define and use surrogate models to reduce the computing time when working with computationally expensive forward models.
I could imagine the metamodel to be a forward model, and then only pass the metamodel to the inference problem. I would also suggest having three steps to create a metamodel, and then using the metamodel as a "normal" forward model.
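A minimal sketch of what those three steps could look like (all names here, including MetaModel and its methods, are hypothetical and not confirmed probeye API):

```python
# step 1: create the metamodel, wrapping the expensive forward model
my_metamodel = MetaModel(orig_forward_model)

# step 2: train it, e.g. on samples drawn from the prior
my_metamodel.train(num_samples=100)

# step 3: add it to the problem like any other forward model
problem.add_forward_model("MyMetaModel", my_metamodel)
```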
I think this is a good structure. However, I'm not sure if we want to have the training of the surrogate model in the definition part of the problem. In your proposal, one would have to wait (possibly a long time) until the surrogate model is ready to be added, and only then could the definition of the problem continue. Maybe it would be better to do the surrogate training in the solver routine, after the problem is fully defined.
I think this would not make a difference, because the Python engine would then wait the same amount of time in the solver. In the long term, I would probably even decompose that into two workflow steps (e.g. pydoit or nextflow), such that the result of the training of the metamodel (the outputs) can be stored and is only recomputed if needed.
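For instance, with pydoit the training could become its own cached task; a sketch with placeholder script and file names:

```python
# dodo.py -- sketch of the two-step decomposition with pydoit
def task_train_metamodel():
    return {
        "actions": ["python train_metamodel.py"],  # writes trained_metamodel.pkl
        "file_dep": ["train_metamodel.py"],
        "targets": ["trained_metamodel.pkl"],      # reused unless deps change
    }

def task_run_inference():
    return {
        "actions": ["python run_inference.py"],
        "file_dep": ["trained_metamodel.pkl"],     # retrains only if stale
    }
```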
That's true, it wouldn't make a difference in terms of computation time. And the pydoit approach makes sense, too. However, I would find the problem definition structure cleaner if no computations happen in the definition phase of the problem. All of the heavy lifting would happen after the problem is defined (and checks have made sure that the problem definition makes sense). Maybe the surrogate model could have a flag that indicates whether it has already been trained or not. This would be checked by the solver, and if no training was done yet (i.e., no respective training files are found), it would run the training before starting the inference step.
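A rough sketch of that solver-side check (is_trained, train, and the forward_models container are assumed attributes, used only to make the idea concrete):

```python
def solve(problem, solver):
    # before inference, train any surrogate that is not trained yet
    for fm in problem.forward_models.values():   # assumed container
        if not getattr(fm, "is_trained", True):  # flag from the proposal
            fm.train()  # e.g. skipped when training files already exist
    return solver.run(problem)
```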
Yes, that could also be done (so the training is internally triggered if the model has not been trained before). The only thing that is important for me is that we store the metamodel as a standard forward model in the parameter estimation problem, and not both in parallel (though the metamodel should actually have the exact forward model stored, but that should not be used within the inference problem). This would make the implementation easier and less coupled, because the metamodel is just another forward model and can be developed and tested independently of the parameter estimation problem.
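To make the decoupling concrete, a minimal sketch of a metamodel that is "just another forward model" (class and attribute names are illustrative; ForwardModel stands in for probeye's forward-model base class):

```python
class MetaModel(ForwardModel):
    """Sketch: a surrogate that behaves like any other forward model."""

    def __init__(self, forward_model):
        # the exact model is kept only for (re)training the surrogate;
        # it is never called during inference
        self.forward_model = forward_model
        self.is_trained = False

    def evaluate(self, inp):
        # inference only ever sees the cheap surrogate prediction
        return self._surrogate_predict(inp)  # hypothetical internal predictor
```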
Sure thing, I will update the code accordingly.
I think you talked about adaptive training (adaptively querying the forward model to adhere to a fixed computational budget). In that scenario, this would be difficult.
Surrogating just needs a forward model with input and output, some initial input values (maybe samples from the prior), and the bounds of the inputs. The training is performed externally (nothing to do with the probeye-based interface). Once trained, it can be added to the inference problem. If the training of the surrogate is done in probeye, it will complicate things IMO.
That is right, but those are all given already in probeye. So creating a metamodel, e.g. using a GP based on an LHS of the prior and a computation of the corresponding forward models, would mean a single line in the code. Sure, you could create your own metamodel outside and use it here, but that would be much more code to be added. At least for some standard cases, it would be nice to incorporate this directly.
As for the adaptive case, the metamodel would still be able to call additional forward model function evaluations, so even an adaptive metamodel would work (since the metamodel has the forward model stored).
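A minimal sketch of the "GP on an LHS of the prior" idea using off-the-shelf pieces (a dummy forward model and assumed prior bounds; scipy's LHS and scikit-learn's GP stand in for whatever probeye would use internally):

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor

def orig_forward_model(x):  # dummy stand-in for the expensive model
    return x[0] * np.sin(x[1])

l_bounds, u_bounds = [0.0, 0.0], [1.0, 10.0]  # assumed prior bounds
X = qmc.scale(qmc.LatinHypercube(d=2).sample(50), l_bounds, u_bounds)
y = np.array([orig_forward_model(x) for x in X])  # expensive evaluations
gp = GaussianProcessRegressor().fit(X, y)         # the metamodel
y_new = gp.predict(X[:3])                         # cheap predictions
```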
Adding to @atulag0711's comment regarding adaptive sampling, it may also be convenient to consider a separation of the metamodel (e.g. GP, NN, etc.) and the sampling approach (LHS, active/adaptive sampling, etc.), since these can be combined arbitrarily depending on the problem at hand. A modified version of the code snippet that can deal with that case (I am not sure what a better term would be for the combined metamodel + sampler):

```python
my_surrogate = Surrogate(orig_forward_model, surrogate_kwargs)
my_metamodel = Sampler(my_surrogate, sampler_kwargs)
problem.add_surrogate_model(my_metamodel)
my_metamodel.train(training_parameters)
```
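A skeleton of how this separation could compose (purely illustrative; Surrogate and Sampler are the hypothetical names from the snippet above, not existing probeye classes):

```python
class Surrogate:
    """Sketch: fits a cheap approximation of an expensive forward model."""

    def __init__(self, forward_model, regressor):
        self.forward_model = forward_model  # exact model, for training only
        self.regressor = regressor          # e.g. a GP or NN

    def fit(self, X, y):
        self.regressor.fit(X, y)

    def predict(self, X):
        return self.regressor.predict(X)


class Sampler:
    """Sketch: decides where to evaluate the exact model (LHS, adaptive, ...)."""

    def __init__(self, surrogate, propose_points):
        self.surrogate = surrogate
        self.propose_points = propose_points  # strategy: params -> design points

    def train(self, training_parameters):
        X = self.propose_points(training_parameters)
        y = [self.surrogate.forward_model(x) for x in X]  # expensive calls
        self.surrogate.fit(X, y)
```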
Separating the sampler on the script level is a good idea. I think it would probably make sense to actually include at least some basic samplers in the ParameterEstimation problem, since e.g. the prior distributions etc. are all given (e.g. a method in ParameterEstimation like sample_LHS_from_prior(num_samples=100)). This function should then return a format (dict) that we could directly use in the forward model, so essentially returning an array[num_samples] of dicts with all the parameters. And as mentioned above, I would not add the surrogate as an additional feature, but rather as a standard forward model. That said, we could still have a MetaModel base class (derived from ForwardModel) that implements these fit/predict functions, but predict would IMO be the function already implemented in ForwardModel (evaluate). What would be the metric?
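A hedged sketch of such a helper (sample_LHS_from_prior is the assumed name from above; uniform priors and the bounds format are placeholders purely for illustration):

```python
from scipy.stats import qmc

def sample_LHS_from_prior(prior_bounds, num_samples=100):
    """Return a list of parameter dicts drawn via Latin hypercube sampling.

    prior_bounds: dict mapping parameter name -> (lower, upper);
    assumes uniform priors for illustration only.
    """
    names = list(prior_bounds)
    l_bounds = [prior_bounds[n][0] for n in names]
    u_bounds = [prior_bounds[n][1] for n in names]
    unit = qmc.LatinHypercube(d=len(names)).sample(num_samples)
    scaled = qmc.scale(unit, l_bounds, u_bounds)
    # one dict per sample, directly usable as forward-model input
    return [dict(zip(names, row)) for row in scaled]

# usage: samples = sample_LHS_from_prior({"a": (0.0, 1.0), "b": (1.0, 5.0)})
```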