
Refactoring out-of-range and tolerance mechanism #428

Open
AdrianSosic opened this issue Nov 13, 2024 · 4 comments
Labels
enhancement Expand / change existing functionality refactor

Comments

@AdrianSosic
Collaborator

AdrianSosic commented Nov 13, 2024

This issue is an attempt to summarize the problems with our current out-of-range / tolerance approach, which have been present for a long time and should be addressed at some point.

Problem description

The core of the problem is that both aspects are currently handled via the same mechanism, although they are two inherently different things.

  • Out-of-range value handling itself covers two things:
    1. To potentially warn the user / reject measurement input when parameter / target values are "far outside" their expected ranges, which can cause problems with scaling.
      For example: a discrete parameter is defined with values [0, 1, 2] but the user tries to add a measurement with parameter value 100. This can potentially be due to a mistake on the user side, e.g. copy-paste error in an Excel sheet.
    2. Defining a reasonable way to process values that are not even defined.
      For example: a categorical parameter is defined with values ["a", "b", "c"] but the user tries to add a measurement with parameter value "d". The latter could be a valid choice, but since it's missing from the parameter definition, its encoding is undefined.
  • Then, there is the completely independent aspect of mapping measurements with inaccurate parameter values (e.g. due to difficulties of calibrating the physical experiment to the exact parameter values) to the closest defined value, for the purpose of metadata tracking, i.e. to avoid recommending the same parameter configuration again.
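To make the matching aspect concrete, here is a minimal plain-Python sketch (not BayBE API; the function name is hypothetical) of mapping a measured value to the closest defined parameter value within a tolerance, with "no match" instead of an error when nothing is close enough:

```python
def match_to_closest(measured: float, defined: list[float], tolerance: float):
    """Return the closest defined value within the tolerance, else None.

    Illustrative sketch of the matching aspect only: matching is used for
    metadata tracking and never rejects the measurement itself.
    """
    closest = min(defined, key=lambda v: abs(v - measured))
    return closest if abs(closest - measured) <= tolerance else None

# A measurement of 1.1 matches the defined value 1 with tolerance 0.5:
print(match_to_closest(1.1, [1, 2, 5, 10], 0.5))  # -> 1
# A measurement of 8 has no defined value within 0.5 -> no match:
print(match_to_closest(8, [1, 2, 5, 10], 0.5))    # -> None
```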

Current limitations

  • For numerical discrete parameters, both mechanisms are implemented via the tolerance flag. This causes the problem that there is no way to add measurements that lie outside the tolerance intervals centered around the defined parameter values.
    For example: for defined values [1, 2, 5, 10] it is impossible to add a measurement with value 8 or 12 (i.e. both inside and outside the convex hull), because the maximum allowed tolerance is determined via the minimal parameter value difference: (2 - 1) / 2 = 0.5
    The upper limit on the tolerance is unnecessary in itself, but even with it removed, the matching aspect should be decoupled from the data addition aspect. The current ugly workaround is to extend the parameter range to whatever is being measured (which must be done on the fly since the values are unknown a priori) and then to exclude the new values from recommendation.
    Update: There is also the possibility to entirely disable the tolerance check via numerical_measurements_must_be_within_tolerance when adding measurements, providing a partial workaround. Still, a clean fix should follow.

  • For task parameters, there already exists the active_values attribute, but there is no such alternative for categorical parameters. This means out-of-range measurements can only be added using the same ugly workaround described above.
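As a numerical illustration of the limitation above (assuming the cap is half the minimal pairwise difference between defined values, as described), the maximum allowed tolerance for [1, 2, 5, 10] works out as:

```python
def max_allowed_tolerance(values: list[float]) -> float:
    """Illustrative computation of the tolerance upper limit described above:
    half the smallest difference between adjacent defined values."""
    ordered = sorted(values)
    return min(b - a for a, b in zip(ordered, ordered[1:])) / 2

print(max_allowed_tolerance([1, 2, 5, 10]))  # -> 0.5, from (2 - 1) / 2
# With the tolerance capped at 0.5, a measurement of 8 or 12 lies outside
# every tolerance interval and hence cannot be matched or added.
```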

Suggested fix

The suggested fix is to decouple both aspects and design separate handling mechanisms:

  • The tolerance attribute should only be responsible for controlling the data matching aspect and not affect data addition itself. The upper limit of the tolerance should be removed. Instead, the data point should simply be matched to the closest defined parameter value within the defined tolerance range. If there is no such value, no error should be thrown. Instead, there should simply be no match reported for metadata tracking.
  • Numerical parameters (both discrete and continuous) can get a separate mechanism to warn users about / reject out-of-range values. The exact rules need to be fleshed out, but ideally this becomes controllable during the data addition step itself, e.g. something like campaign.add_measurements(df, allow_out_of_range=True).
  • All our "other" categorical parameters (i.e. TaskParameter, SubstanceParameter, CustomParameter) could become subclasses of CategoricalParameter since they really are categorical parameters and a corresponding active_values mechanism can be implemented for all of them at the base class level.
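A rough sketch of how the suggested decoupling could look from the user side, following the allow_out_of_range idea above. The standalone function, its signature, and the range rule (outside the [min, max] of the defined values) are illustrative assumptions, not existing BayBE API:

```python
import pandas as pd

def add_measurements(df: pd.DataFrame, defined: list[float],
                     allow_out_of_range: bool = False) -> None:
    """Hypothetical sketch of the suggested decoupling: the data addition
    step only checks for out-of-range values, while closest-value matching
    for metadata tracking would happen separately and never reject data."""
    lo, hi = min(defined), max(defined)
    out_of_range = df["param"][(df["param"] < lo) | (df["param"] > hi)]
    if len(out_of_range) and not allow_out_of_range:
        raise ValueError(f"Out-of-range values: {list(out_of_range)}")
    # ... store the measurements; matching happens independently ...

# A value of 12 lies outside the defined range [1, 10] and would be
# rejected by default, but can be accepted explicitly:
df = pd.DataFrame({"param": [8, 12]})
add_measurements(df, [1, 2, 5, 10], allow_out_of_range=True)
```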
@AdrianSosic AdrianSosic added the enhancement Expand / change existing functionality label Nov 13, 2024
@AdrianSosic AdrianSosic self-assigned this Nov 13, 2024
@Scienfitz
Collaborator

Please consolidate this duplicate with #375.

@Scienfitz
Collaborator

@AdrianSosic
As suggested, you can simply set numerical_measurements_must_be_within_tolerance to False in add_measurements to disable all tolerance checks that potentially block your project. Did that work?

@Scienfitz
Collaborator

@AdrianSosic In the absence of any response, I will close this issue soon unless it is properly amended:

  1. I suspect the issue described here as urgent is neither urgent nor blocking; the possibility of using numerical_measurements_must_be_within_tolerance was likely simply forgotten -> change the text so it does not give the impression that this is an urgent issue

  2. Please consolidate the contents with Refactor Tolerance Handling for Numerical Parameters #375 and close one of the two issues

@AdrianSosic
Collaborator Author

Thanks for the reminder. The last two weeks were just crazy 🙈 Have just updated everything 👍🏼
