-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move allow_*
flags to Campaign
#423
base: main
Are you sure you want to change the base?
Conversation
2ccb684
to
f79b557
Compare
0987d10
to
99d0428
Compare
491c79d
to
88ab485
Compare
f"purely discrete spaces and " | ||
f"{fields(self.__class__).allow_recommending_pending_experiments.name}" | ||
f"=False.", | ||
f"'{self.__class__.__name__}' does not use this information.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Scienfitz: I remember us having a longer discussion about this part in one of the previous PRs. With the new changes, I think the semantics are now considerable easier but we probably still need to iterate once again to align what the desired behavior should be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait now the warning is printed whenever pending experiments is provided independent of the flag, that can be annoying
eg what happens if I pass pending experiments because I want to exclude them (flag on True), I'd always get this warning now or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your conclusion is correct but I think your expectation is wrong. That's the point I tried to convey in today's meeting but didn't succeed :D Let's go trough the cases:
Recommender Level
Here, the flags don't exist, and we need to disentangle the two concerns of
- informing the algorithm about pending points. This is done by passing
pending_experiments
torecommend
- Excluding pending points from the candidate set. This is the
filter
part that is still missing. As I said, controlling what is recommendable on the search space level now always goes via filtering, i.e. creating a new object, since there is no more internal state we could control. So in the future, we can flexibly control this (in general, not just for pending points!) viarecommender.recommend(searchspace=searchspace.filter(...))
If you consider the above, then the conclusion is that RandomRecommender(pending_experiments=df)
should always raise the error, because the algorithm can't handle the info and excluding experiments would go via the filtering route instead.
Campaign Level
Same separation of concerns:
- Informing about pending points is done in the same way, by passing
pending_experiments
torecommend
. - Excluding pending points.
The second part is probably where the misunderstanding is: Excluding the pending points by passing pending_experiments
+ setting the flag to False
is not just excluding the pending points
but also informing the algorithm
, so you'd be mixing both concerns. And in the non-predictive recommender context, one of the two concerns doesn't apply / make sense. So the warning would be justified. If you really only wanted to exclude
, you'd go via toggling, which would be fine. But really the scenario is a bit exotic anyway because if you have pending points, you wouldn't want to use a nonpredictive recommender in the first place. And this is exactly what the warning tells you!
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I get your points here right, I'd also favor the variant where warnings are always being printed. If people are bothered by this, people can still globally deactivate warnings.
In general, I think that we should als re-visit in general how we handle warnings as I think there are quite some places where people might want to deactivate one type of warning while keeping another one. But this should not be done here :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think the words pending experimetns
were mentioned once in the meeting, so not sure what you refer to...
Also, the argument that something is only possible if the imagined future with a filter method need to be discarded. We already filter now, just call it toggling, there is no need to defer anything into that imagined future because of it
If we look at the defacto state of the code now, non-pred recommenders can be passed pending experiments and could create this perma warning - This is not good, I remind you of the situation where the simulation module raised thousands of warnings because of the intial data. While its true that it can simply be discarded, users are confused and wonder rightfully whether they did something wrong, but in reality it was just us overly eager raising warnings.
So it seems simply to me the cleanest solution is pending experiments arg needs to be removed for non-pred recommenders (in the raising error sense to not meddle witht he protocol). The warning in recommenders becomes obsolete then. The campaign needs to decide whether it passes pending experiments or uses toggling or raises a warning if the respective combo appears - this shifting of responsibilities is akin to what you did with the other flag warnings - and again I dont see how that should be affected by further changes to renaming/moving our toggling/fitlering in an imaginary future, these imaginations relate to seachspace changes, my args are purely based on cmapaign-recommender topics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we do not trade error for warnings
Yes, we would. Because your argument was:
non-pred recommenders can be passed pending experiments and could create this perma warning - This is not good
So I can literally just do RandomRecommender.recommend(batch_size=..., searchspace=..., pending_experiments=...)
as a single call without any context and would get 1) a warning in my version 2) an error in your version.
While changing the protocol is something to consider, I think it's out-of-scope for this PR. Hence, let's think about a clean intermediate solution that does not break later.
Can you give me a concrete suggestion in terms of the three options you mentioned above and then we see if we can get it running in a reasonable way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont want to suggest a verison where an error is triggered
this is achieved by the campaign taking care of providing pending_experiments
or not, something that doesnt seem to appear in the suggestions above
I think that should be independent of the protocol change. Is the protocol actually a big deal? is it ok if a class implementing the protocol "adds" an argument to the method? then is should be rather trivial
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to you?
- let non-predictive recommenders throw an error if provided pending expeirments
- make campaign take care that no such error is actually thrown if
recommend
is called via campaign. the campaign can decide its warning/error logic based on the old flags just as the logic was inside recommenders before - the protocol change is not necessary for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need an additional opinion here? This discussion goes on for quite some time already. Or do we want to discuss this in our Meeting this week?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's align once and for all in the dev meeting 🙃
88ab485
to
818741b
Compare
Fixes #371 by making `SubspaceDiscrete` stateless. ### Current Approach * The `SubspaceDiscrete.metadata` attribute gets deprecated and the responsibility of metadata handling is shifted to `Campaign` * The new mechanism is not yet final (see out of scope below) but designed in a way that allows to implement upcoming changes in a non-breaking manner. In particular: * The metadata handling mechanism is redesigned in that the actual metadata representation is completely hidden from the user, i.e. campaign manages the data in form of private attributes. This is to avoid further lock-in into `pandas` as our search space backend and prepares for future search space improvements by abstracting away the specific implementation details, enabling us to easily offer other backends (polars, databases, etc) in the future. * The `allow_*` flags are not yet migrated to the `Campaign` class, but the `AnnotatedSubspaceDiscrete` allows to migrate them in a follow-up PR (#423) without causing much friction * A new user-facing method `Campaign.toggle_discrete_candidates` now allows convenient and dynamic control over the discrete candidate set, avoiding any fiddling with the backend dataframes and index manipulations. The positive effect can be seen in the much cleaner code parts of the simulation package. ### Out of scope / (potentially) coming next * Migration of `allow_*` flags in order to make `Campaign` the unique place where the concept of metadata exists, i.e. campaigns will be the only stateful objects. A PR taking care of this should follow soon because the `get_candidates` signature of `SubspaceDiscrete` currently makes not much sense, as it expects these flags in a context where metadata does not exist. * Once the flags are migrated, the `AnnotatedSubspaceDiscrete` might become obsolete since the `Campaign` class can then theoretically filter down the space before passing it to the recommender. This however requires an efficient implementation that does not cause unnecessary dataframe copies. * Actually turning the state space classes `frozen`. There a few other things that should be addressed at the same time (i.e. general cleanup of the classes).
818741b
to
1d23a9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very brief set of initial comments. More to come.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some minor questions/comments
kw_only=True, | ||
eq=cmp_using(eq=np.array_equal), | ||
) | ||
"""The filtering mask. ``True`` denote elements to be kept.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
denoteS
class FilteredSubspaceDiscrete(SubspaceDiscrete): | ||
"""A filtered search space representing a reduced candidate set.""" | ||
|
||
mask: npt.NDArray[np.bool_] = field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe rename it to something like mask_for_keeping
? Would make it a bit clearer in other contexts. Can however also understand if this is to cumbersome and you want to keep mask
due to its simplicity. If that is the case, feel free to just resolve :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in other places we also have mask_todrop
etc so why not mask_tokeep
/mask_keep
or mask_todrop
again, depending on the purpose
f"purely discrete spaces and " | ||
f"{fields(self.__class__).allow_recommending_pending_experiments.name}" | ||
f"=False.", | ||
f"'{self.__class__.__name__}' does not use this information.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait now the warning is printed whenever pending experiments is provided independent of the flag, that can be annoying
eg what happens if I pass pending experiments because I want to exclude them (flag on True), I'd always get this warning now or not?
only take pending points into consideration if the recommender flag | ||
[allow_recommending_pending_experiments](baybe.recommenders.pure.nonpredictive.base.NonPredictiveRecommender.allow_recommending_pending_experiments) | ||
is set to `False`. In that case, the candidate space is stripped of pending experiments | ||
that are exact matches with the search space, i.e. they will not even be considered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fact that pending exps can still be excluded for those recommenders via the allow flags of the campaign needs to be mentioned here (which is roughly the updated equivalent of what you've deleted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting for the outcome of this discussion first
b7a3cea
to
1766f8e
Compare
The original name allow_repeated_recommendations is not accurate since it also suggests that repeated configurations within a batch would be excluded, which is not the case.
Simply using `None` to represent an unspecified flag would be problematic since it evaluates to `False` in an if context, which is unintented. Instead, an unset flag occurring in such a context indicates a misconfiguration and should throw an error.
1766f8e
to
048d5eb
Compare
048d5eb
to
e366077
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since most if my wishes have been implemented, here you go with your approve :)
This PR builds upon #412 and finalizes the metadata migration from search spaces / recommenders to
Campaign
, makingCampaign
the only stateful class:allow_*
flags are now passed directly toCampaign
, strengthening the role ofCampaign
as the class for metadata handling / tracking the progress of an experimentation path.allow_repeated_recommendations
is renamed toallow_recommending_already_recommended
because i) the original name was imprecise in that it also suggested that repetitions are disallowed within a batch and ii) the new follows the same pattern as the other two flags.AnnotatedSubspaceDiscrete
with a new private classFilteredSubspaceDiscrete
taking over the original role but without the necessity of being metadata aware.Coming next
The changes introduced here and in #423 bring significant improvements from a user interface perspective in that:
Constraint
objects and candidates dataframes.pandas
part of the current discrete search space implementation is less entangled in the code baseThis paves the way for upcoming enhancements:
SubspaceDiscreteProtocol
is already in sight, which allows seamless integration of other backends (polars
, databases, etc) and will help us complete the ongoingpolars
transition.SubspaceDiscrete.filter(constraints)
, which can also become the backbone of the current constraints logic