This issue is an attempt to summarize the problems with our current out-of-range / tolerance approach, which have been present for a long time and should be addressed at some point.
Problem description
The core of the problem is that two aspects — out-of-range value handling and tolerance-based matching — are currently treated via the same mechanism, although they are really two inherently different things.
Out-of-range value handling itself covers two things:
Potentially warning the user / rejecting measurement input when parameter / target values are "far outside" their expected ranges, which can cause problems with scaling. For example: a discrete parameter is defined with values [0, 1, 2], but the user tries to add a measurement with parameter value 100. This can be due to a mistake on the user's side, e.g. a copy-paste error in an Excel sheet.
Defining a reasonable way to process values that are not part of the parameter definition at all. For example: a categorical parameter is defined with values ["a", "b", "c"], but the user tries to add a measurement with parameter value "d". The latter could be a valid choice, but since it is missing from the parameter definition, its encoding is undefined.
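To make both cases concrete, here is a minimal sketch of the offending inputs, assuming the standard parameter classes (`NumericalDiscreteParameter`, `CategoricalParameter`) with illustrative parameter names; import paths may differ between versions:

```python
from baybe.parameters import CategoricalParameter, NumericalDiscreteParameter

# Case 1: value far outside the defined range -- likely a user mistake
temperature = NumericalDiscreteParameter(name="temperature", values=[0, 1, 2])
# ... a measurement with temperature = 100 should at least trigger a warning

# Case 2: value missing from the definition -- its encoding is simply undefined
solvent = CategoricalParameter(name="solvent", values=["a", "b", "c"])
# ... a measurement with solvent = "d" cannot be encoded at all
```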
Then, there is the completely independent aspect of mapping measurements with inaccurate parameter values (e.g. due to difficulties in calibrating the physical experiment to the exact parameter values) to the closest defined value, for the purpose of metadata tracking, i.e. to avoid recommending the same parameter configuration again.
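As a library-independent sketch (not the actual implementation), the matching step can be thought of roughly like this:

```python
def match_to_closest(measured: float, defined: list[float], tolerance: float) -> float | None:
    """Return the closest defined value if it lies within the tolerance, else None."""
    closest = min(defined, key=lambda v: abs(v - measured))
    return closest if abs(closest - measured) <= tolerance else None

# A slightly miscalibrated measurement (1.9 instead of 2) still maps to the defined
# value 2, so the configuration is recognized and not recommended again.
match_to_closest(1.9, [1, 2, 5, 10], tolerance=0.5)  # -> 2
```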
Current limitations
For numerical discrete parameters, both mechanisms are implemented via the tolerance flag. This causes the problem that there is no way to add measurements that lie outside the tolerance intervals centered around the defined parameter values. For example: for defined values [1, 2, 5, 10], it is impossible to add a measurement with value 8 or 12 (i.e. both inside and outside the convex hull), because the maximum allowed tolerance is determined by half the minimal difference between parameter values: (2 - 1) / 2 = 0.5.
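Spelled out as a quick illustrative calculation:

```python
values = [1, 2, 5, 10]

# The maximum allowed tolerance is half the smallest gap between neighboring values,
# so that the tolerance intervals around the defined values cannot overlap.
max_tolerance = min(b - a for a, b in zip(values, values[1:])) / 2  # (2 - 1) / 2 = 0.5

# With any tolerance <= 0.5, a measured value of 8 or 12 falls outside every interval
# [v - 0.5, v + 0.5] and can therefore not be added as a measurement at all.
```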
The upper limit on the tolerance itself is unnecessary, but even with it removed, the matching aspect should be decoupled from the data addition aspect. The current ugly workaround is to extend the parameter range to whatever is being measured (which must be done on the fly, since the measured values are unknown a priori) and then to exclude the new values from recommendation. Update: It is also possible to disable the tolerance check entirely via numerical_measurements_must_be_within_tolerance when adding measurements, which provides a partial workaround. Still, a clean fix should follow.
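For reference, the partial workaround mentioned in the update could look roughly like the following sketch. It assumes import paths and signatures of a recent BayBE version (e.g. `SearchSpace.from_product`, `SingleTargetObjective`); exact names may differ between versions, and the data frame here is purely illustrative:

```python
import pandas as pd

from baybe import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import NumericalDiscreteParameter
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

parameter = NumericalDiscreteParameter(name="x", values=[1, 2, 5, 10], tolerance=0.4)
searchspace = SearchSpace.from_product([parameter])
objective = SingleTargetObjective(target=NumericalTarget(name="y", mode="MAX"))
campaign = Campaign(searchspace=searchspace, objective=objective)

# Measured value 8 lies outside every tolerance interval around the defined values.
measurements = pd.DataFrame({"x": [8.0], "y": [0.42]})

# Partial workaround: skip the tolerance check entirely when adding the data.
# Matching for metadata tracking is then effectively given up for these points.
campaign.add_measurements(measurements, numerical_measurements_must_be_within_tolerance=False)
```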
For task parameters, there already exists the active_values attribute, but there is no such alternative for categorical parameters. This means that out-of-range measurements can only be added using the same ugly workaround described above.
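For comparison, the task-side mechanism already looks like this (a sketch assuming the current TaskParameter signature, with illustrative values):

```python
from baybe.parameters import TaskParameter

# Values outside `active_values` may appear in measurements (e.g. data from related
# tasks) but are excluded from recommendations.
task = TaskParameter(name="task", values=["A", "B", "C"], active_values=["A"])

# CategoricalParameter has no comparable attribute, so a measurement carrying an
# undefined category can only be added via the range-extension workaround above.
```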
Suggested fix
The suggested fix is to decouple both aspects and design separate handling mechanisms:
The tolerance attribute should only be responsible for controlling the data matching aspect and not affect data addition itself. The upper limit of the tolerance should be removed. Instead, the data point should simply be matched to the closest defined parameter value within the defined tolerance range. If there is no such value, no error should be thrown. Instead, there should simply be no match reported for metadata tracking.
Numerical parameters (both discrete and continuous) can get a separate mechanism to warn users / reject out-of-range values. The exact rules need to be fleshed out, but ideally this becomes controllable during the data addition step itself, e.g. something like campaign.add_measurements(df, allow_out_of_range=True).
All our "other" categorical parameters (i.e. TaskParameter, SubstanceParameter, CustomParameter) could become subclasses of CategoricalParameter, since they really are categorical parameters, and a corresponding active_values mechanism could then be implemented for all of them at the base class level (see the sketch below).
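A rough sketch of what that hierarchy could look like (proposed design only, not the current implementation; attribute handling is heavily simplified):

```python
class CategoricalParameter:
    """Base class carrying the active_values mechanism for all categorical-like parameters."""

    def __init__(self, name: str, values: tuple[str, ...], active_values: tuple[str, ...] | None = None):
        self.name = name
        self.values = values
        # By default, all defined values are active, i.e. eligible for recommendation;
        # inactive values may still appear in measurements.
        self.active_values = active_values if active_values is not None else values


class TaskParameter(CategoricalParameter): ...        # existing active_values behavior, inherited
class SubstanceParameter(CategoricalParameter): ...   # would add chemistry-specific encodings on top
class CustomParameter(CategoricalParameter): ...      # would add user-provided encodings on top
```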
@AdrianSosic as suggested, you can simply set numerical_measurements_must_be_within_tolerance to False in add_measurements to disable all tolerance checks that could potentially block your project. Did that work?
@AdrianSosic In the absence of any response, I will close this issue soon unless it is properly amended.
I suspect the issue described here as urgent is actually neither urgent nor blocking; the possibility of using numerical_measurements_must_be_within_tolerance was likely simply forgotten -> change the text so it does not give the impression that this is an urgent issue.