This issue is an attempt to summarize the problems with our current out-of-range / tolerance approach, which have been present for a long time and should be addressed at some point.
Problem description
The core of the problem is that two aspects — out-of-range value handling and tolerance-based matching — are currently treated via the same mechanism, although they are really two inherently different things.
Out-of-range value handling itself covers two things:
Potentially warning the user / rejecting measurement input when parameter / target values are "far outside" their expected ranges, which can cause problems with scaling. For example: a discrete parameter is defined with values [0, 1, 2], but the user tries to add a measurement with parameter value 100. This can be due to a mistake on the user's side, e.g. a copy-paste error in an Excel sheet.
Defining a reasonable way to process values that are not part of the parameter definition at all. For example: a categorical parameter is defined with values ["a", "b", "c"], but the user tries to add a measurement with parameter value "d". The latter could be a valid choice, but since it is missing from the parameter definition, its encoding is undefined.
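To make both cases concrete, here is a minimal sketch of the offending inputs, assuming the standard parameter classes (`NumericalDiscreteParameter`, `CategoricalParameter`) with illustrative parameter names; import paths may differ between versions:

```python
from baybe.parameters import CategoricalParameter, NumericalDiscreteParameter

# Case 1: value far outside the defined range -- likely a user mistake
temperature = NumericalDiscreteParameter(name="temperature", values=[0, 1, 2])
# ... a measurement with temperature = 100 should at least trigger a warning

# Case 2: value missing from the definition -- its encoding is simply undefined
solvent = CategoricalParameter(name="solvent", values=["a", "b", "c"])
# ... a measurement with solvent = "d" cannot be encoded at all
```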
Then, there is the completely independent aspect of mapping measurements with inaccurate parameter values (e.g. due to difficulties in calibrating the physical experiment to the exact parameter values) to the closest defined value, for the purpose of metadata tracking, i.e. to avoid recommending the same parameter configuration again.
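As a library-independent sketch (not the actual implementation), the matching step can be thought of roughly like this:

```python
def match_to_closest(measured: float, defined: list[float], tolerance: float) -> float | None:
    """Return the closest defined value if it lies within the tolerance, else None."""
    closest = min(defined, key=lambda v: abs(v - measured))
    return closest if abs(closest - measured) <= tolerance else None

# A slightly miscalibrated measurement (1.9 instead of 2) still maps to the defined
# value 2, so the configuration is recognized and not recommended again.
match_to_closest(1.9, [1, 2, 5, 10], tolerance=0.5)  # -> 2
```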
Current limitations
For numerical discrete parameters, both mechanisms are implemented via the tolerance flag. This causes the problem that there is no way to add measurements that lie outside the tolerance intervals centered around the defined parameter values. For example: for defined values [1, 2, 5, 10], it is impossible to add a measurement with value 8 or 12 (i.e. both inside and outside the convex hull), because the maximum allowed tolerance is determined by half the minimal difference between parameter values: (2 - 1) / 2 = 0.5.
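Spelled out as a quick illustrative calculation:

```python
values = [1, 2, 5, 10]

# The maximum allowed tolerance is half the smallest gap between neighboring values,
# so that the tolerance intervals around the defined values cannot overlap.
max_tolerance = min(b - a for a, b in zip(values, values[1:])) / 2  # (2 - 1) / 2 = 0.5

# With any tolerance <= 0.5, a measured value of 8 or 12 falls outside every interval
# [v - 0.5, v + 0.5] and can therefore not be added as a measurement at all.
```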
The upper limit on the tolerance itself is unnecessary, but even with it removed, the matching aspect should be decoupled from the data addition aspect. The current ugly workaround is to extend the parameter range to whatever is being measured (which must be done on the fly, since the measured values are unknown a priori) and then to exclude the new values from recommendation. Update: It is also possible to disable the tolerance check entirely via numerical_measurements_must_be_within_tolerance when adding measurements, which provides a partial workaround. Still, a clean fix should follow.
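For reference, the partial workaround mentioned in the update could look roughly like the following sketch. It assumes import paths and signatures of a recent BayBE version (e.g. `SearchSpace.from_product`, `SingleTargetObjective`); exact names may differ between versions, and the data frame here is purely illustrative:

```python
import pandas as pd

from baybe import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import NumericalDiscreteParameter
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

parameter = NumericalDiscreteParameter(name="x", values=[1, 2, 5, 10], tolerance=0.4)
searchspace = SearchSpace.from_product([parameter])
objective = SingleTargetObjective(target=NumericalTarget(name="y", mode="MAX"))
campaign = Campaign(searchspace=searchspace, objective=objective)

# Measured value 8 lies outside every tolerance interval around the defined values.
measurements = pd.DataFrame({"x": [8.0], "y": [0.42]})

# Partial workaround: skip the tolerance check entirely when adding the data.
# Matching for metadata tracking is then effectively given up for these points.
campaign.add_measurements(measurements, numerical_measurements_must_be_within_tolerance=False)
```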
For task parameters, there already exists the active_values attribute, but there is no such alternative for categorical parameters. This means that out-of-range measurements can only be added using the same ugly workaround described above.
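For comparison, the task-side mechanism already looks like this (a sketch assuming the current TaskParameter signature, with illustrative values):

```python
from baybe.parameters import TaskParameter

# Values outside `active_values` may appear in measurements (e.g. data from related
# tasks) but are excluded from recommendations.
task = TaskParameter(name="task", values=["A", "B", "C"], active_values=["A"])

# CategoricalParameter has no comparable attribute, so a measurement carrying an
# undefined category can only be added via the range-extension workaround above.
```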
Suggested fix
The suggested fix is to decouple both aspects and design separate handling mechanisms:
The tolerance attribute should only be responsible for controlling the data matching aspect and not affect data addition itself. The upper limit of the tolerance should be removed. Instead, the data point should simply be matched to the closest defined parameter value within the defined tolerance range. If there is no such value, no error should be thrown. Instead, there should simply be no match reported for metadata tracking.
Numerical parameters (both discrete and continuous) can get a separate mechanism to warn users / reject out-of-range values. The exact rules need to be fleshed out, but ideally this becomes controllable during the data addition step itself, e.g. something like campaign.add_measurements(df, allow_out_of_range=True).
All our "other" categorical parameters (i.e. TaskParameter, SubstanceParameter, CustomParameter) could become subclasses of CategoricalParameter, since they really are categorical parameters, and a corresponding active_values mechanism could then be implemented for all of them at the base class level (see the sketch below).
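A rough sketch of what that hierarchy could look like (proposed design only, not the current implementation; attribute handling is heavily simplified):

```python
class CategoricalParameter:
    """Base class carrying the active_values mechanism for all categorical-like parameters."""

    def __init__(self, name: str, values: tuple[str, ...], active_values: tuple[str, ...] | None = None):
        self.name = name
        self.values = values
        # By default, all defined values are active, i.e. eligible for recommendation;
        # inactive values may still appear in measurements.
        self.active_values = active_values if active_values is not None else values


class TaskParameter(CategoricalParameter): ...        # existing active_values behavior, inherited
class SubstanceParameter(CategoricalParameter): ...   # would add chemistry-specific encodings on top
class CustomParameter(CategoricalParameter): ...      # would add user-provided encodings on top
```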
@AdrianSosic as suggested, you can simply set numerical_measurements_must_be_within_tolerance to False in add_measurements to disable all tolerance checks that could potentially block your project. Did that work?
@AdrianSosic In the absence of any response, I will close this issue soon unless it is properly amended.
I suspect the issue described here as urgent is actually neither urgent nor blocking; the possibility of using numerical_measurements_must_be_within_tolerance was likely simply forgotten -> change the text so it does not give the impression that this is an urgent issue.