There's evidence that the recent addition of full `PerformanceEvaluation` objects to `TunedModel` histories is blowing up memory requirements in real use cases.
I propose that we create two `PerformanceEvaluation` objects - a detailed one (as we have now) and a new `CompactPerformanceEvaluation` object. The `evaluate` method gets a new keyword argument `compact=false` and `TunedModel` gets a new hyperparameter `compact_history=true`. (This default would technically break MLJTuning, but I doubt this would affect more than one or two users - and the recent change is not actually documented anywhere yet.)
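For concreteness, here's a rough sketch of how the proposal might look at call sites. Nothing below is implemented: `compact`, `compact_history` and `CompactPerformanceEvaluation` are just the names proposed above, and the model and data are arbitrary placeholders:

```julia
using MLJ

# small synthetic regression task, using MLJ's built-in generator:
X, y = make_regression(100, 3)
model = ConstantRegressor()

# current behaviour: a full PerformanceEvaluation, carrying per-observation
# measurements, plus fitted params and a report for every train/test pair:
e = evaluate(model, X, y, resampling=CV(nfolds=5), measure=rms)

# proposed (not implemented): the same call returns a
# CompactPerformanceEvaluation with the heavyweight fields dropped:
e_small = evaluate(model, X, y, resampling=CV(nfolds=5), measure=rms, compact=true)

# proposed (not implemented): TunedModel stores compact evaluations in its
# history by default:
tuned = TunedModel(
    models=[ConstantRegressor()],  # tuning by explicit model list
    resampling=CV(nfolds=5),
    measure=rms,
    compact_history=true,
)
```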
This would also allow us to ultimately address #575, which was shelved for fear of making evaluation objects too big.

Further thoughts anyone?

cc @CameronBieganek, @OkonSamuel
Below are the fields of the current struct. I've ticked off suggested fields for the compact case (see the sketch after the field list below). I suppose the only one that might be controversial is `observations_per_fold`. This was always included in `TunedModel` histories previously, so it seems less disruptive to include it.
Fields
These fields are part of the public API of the `PerformanceEvaluation` struct.
- `model`: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.
- `measure`: vector of measures (metrics) used to evaluate performance.
- `measurement`: vector of measurements, one for each element of `measure`, aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure `m` is `StatisticalMeasuresBase.external_aggregation_mode(m)` (commonly `Mean()` or `Sum()`).
- `operation` (e.g., `predict_mode`): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: $PREDICT_OPERATIONS_STRING.
- `per_fold`: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.
- `per_observation`: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation `e`, `e.per_observation[m][f][i]` is the measurement for the `i`th observation in the `f`th test fold, evaluated using the `m`th measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure `measure` is repeated across all observations in a fold if `StatisticalMeasures.can_report_unaggregated(measure) == false`. If `e` has been computed with the `per_observation=false` option, then `e.per_observation` is a vector of `missing`s.
- `fitted_params_per_fold`: a vector containing `fitted_params(mach)` for each machine `mach` trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.
- `report_per_fold`: a vector containing `report(mach)` for each machine `mach` trained during resampling - one machine per train/test pair.
- `train_test_rows`: a vector of tuples, each of the form `(train, test)`, where `train` and `test` are vectors of row (observation) indices for training and evaluation respectively.
- `resampling`: the resampling strategy used to generate the train/test pairs.
- `repeats`: the number of times the resampling strategy was repeated.
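And here is one possible shape for the compact struct. This is only a sketch: the field selection is my suggestion, not final, and I'm assuming `observations_per_fold` above corresponds to what the current struct calls `train_test_rows`:

```julia
# Sketch only: a possible CompactPerformanceEvaluation. Apart from
# train_test_rows (kept for continuity, as discussed above), every field
# retained here is small; the dropped fields are those whose size scales
# with the number of observations or with model size.
struct CompactPerformanceEvaluation
    model            # in the tuning case, the best model found
    measure          # vector of measures (metrics)
    measurement      # one aggregated measurement per measure
    operation        # operation applied per measure (e.g. predict_mode)
    per_fold         # per-fold measurements; enough for rough variance estimates
    train_test_rows  # row indices per fold; kept for continuity with old histories
    resampling       # resampling strategy used
    repeats          # number of resampling repeats
    # dropped: per_observation, fitted_params_per_fold, report_per_fold
end
```

The three dropped fields are exactly the ones blowing up memory: `per_observation` grows with the number of observations, and `fitted_params_per_fold` and `report_per_fold` each hold output from one trained machine per train/test pair.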