
CyclicConfigSuggestor #855

Merged: 18 commits merged into microsoft:main on Nov 13, 2024

Conversation

@eujing (Contributor) commented on Sep 4, 2024

Currently this is used to validate a default configuration against some other specific configuration by alternating between them on the same MySQL instance. This sets up the collected metrics as pairs for something like a matched-pair t-test, without the noise of making comparisons across different MySQL instances.

More generally, multiple configs can be specified via the cycle_tunable_values list, which is cycled through during optimization.

EDIT: Originally implemented as a Scheduler; now implemented as a "ManualOptimizer" instead.
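
For illustration, a minimal sketch of the matched-pair idea above, assuming a hypothetical run_trial helper that executes one trial on the instance and returns a score, and using scipy for the paired t-test (this is not the code in this PR):

```python
from typing import Any, Callable, Dict, List, Tuple

from scipy import stats  # provides the paired (matched-pair) t-test


def collect_matched_pairs(
    run_trial: Callable[[Dict[str, Any]], float],  # hypothetical: runs one trial, returns a score
    default_config: Dict[str, Any],
    candidate_config: Dict[str, Any],
    num_pairs: int,
) -> List[Tuple[float, float]]:
    """Alternate default/candidate trials on the same instance, pairing their scores."""
    pairs = []
    for _ in range(num_pairs):
        pairs.append((run_trial(default_config), run_trial(candidate_config)))
    return pairs


def paired_t_test(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Matched-pair t-test: each pair comes from the same instance, back to back."""
    default_scores = [d for d, _ in pairs]
    candidate_scores = [c for _, c in pairs]
    result = stats.ttest_rel(default_scores, candidate_scores)
    return result.statistic, result.pvalue
```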

@eujing requested a review from a team as a code owner on September 4, 2024 23:08
@motus (Member) commented on Sep 13, 2024

I am not sure that this functionality belongs in the Scheduler. The Scheduler determines when a trial should be executed, but not the configuration of that trial. The latter belongs to the Optimizer, and we already have a one-shot optimizer (effectively, a benchmark) and a grid search optimizer that both, just like your "cyclic scheduler", ignore the trial results and just go on with a pre-determined sequence of configurations. I think your code should actually be an Optimizer, and its implementation will be much shorter and simpler that way.

P.S. Also, maybe we should use a better word for the Optimizers, since not all of them actually optimize stuff. A ConfigurationProposer, maybe? ConfigExplorer? @bpkroth do you have any suggestions?

@bpkroth (Contributor) commented on Sep 20, 2024

> I am not sure that this functionality belongs in the Scheduler. The Scheduler determines when a trial should be executed, but not the configuration of that trial. The latter belongs to the Optimizer, and we already have a one-shot optimizer (effectively, a benchmark) and a grid search optimizer that both, just like your "cyclic scheduler", ignore the trial results and just go on with a pre-determined sequence of configurations. I think your code should actually be an Optimizer, and its implementation will be much shorter and simpler that way.

I actually think it's a fine fit for the Scheduler.

This isn't determining which config might improve confidence or scores, that's still handled by the Optimizer.

Instead, it's adjusting the method of trial execution in order to try and come up with an adjusted score to feed back to the Optimizer when evaluating the suggestion, one that is hopefully easier for the Optimizer to learn from.

Our other efforts on noisy tuning do something similar.

> P.S. Also, maybe we should use a better word for the Optimizers, since not all of them actually optimize stuff. A ConfigurationProposer, maybe? ConfigExplorer? @bpkroth do you have any suggestions?

That's true I suppose, but I'm not certain I want to tackle that change atm. Let's take it up separately.

@motus (Member) commented on Sep 26, 2024

@bpkroth says:

> This isn't determining which config might improve confidence or scores, that's still handled by the Optimizer.
>
> Instead, it's adjusting the method of trial execution in order to try and come up with an adjusted score to feed back to the Optimizer when evaluating the suggestion, one that is hopefully easier for the Optimizer to learn from.

I am afraid I can't agree. I see the Environment, Optimizer, and Scheduler as three orthogonal components:

  • Optimizer: suggests configurations and puts them on the queue
  • Scheduler: schedules execution of each configuration extracted from the queue
  • Environment: executes trials, one config at a time

In that design, any instance of the Environment can be used with any instance of Optimizer and Scheduler. CyclicScheduler breaks this: it does not make sense to combine it with any Optimizer (Grid Search, for example); worse yet, since it replaces the Scheduler, it cannot be combined with other schedulers (e.g., for parallel execution).

Note that we don't need to feed the scores to the (Bayesian) optimizer immediately after we receive them from the environment - it will pick that data from Storage after we restart the experiment, e.g., switching from CyclicOptimizer to MlosCoreOptimizer.

Finally, nothing prevents us from implementing a CompositeOptimizer just like we did for the Environment. To the rest of the framework, it will look like an Optimizer that generates a sequence of the configurations; internally, it can be, e.g., a sequence of grid search for warm-up, followed by a certain number of FLAML iterations, followed by a few iterations on baseline and best configs. All that can be combined with ParallelScheduler, once we implement it.
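
For illustration, a rough sketch of such a CompositeOptimizer, using a simplified, hypothetical Optimizer interface rather than mlos_bench's actual API:

```python
from typing import Any, Dict, List, Protocol, Tuple


class Optimizer(Protocol):
    """Simplified, hypothetical optimizer interface (not mlos_bench's actual API)."""

    def suggest(self) -> Dict[str, Any]: ...
    def register(self, config: Dict[str, Any], score: float) -> None: ...


class CompositeOptimizer:
    """Chains sub-optimizers: e.g. grid-search warm-up, then some FLAML
    iterations, then a few repeats of baseline/best configs."""

    def __init__(self, phases: List[Tuple[Optimizer, int]]) -> None:
        # Each phase is (sub-optimizer, number of suggestions to draw from it).
        self._phases = phases
        self._phase_idx = 0
        self._used_in_phase = 0
        self._last: Optimizer = phases[0][0]

    def _current(self) -> Optimizer:
        # Advance to the next phase once the current one's budget is spent.
        while (self._phase_idx < len(self._phases) - 1
               and self._used_in_phase >= self._phases[self._phase_idx][1]):
            self._phase_idx += 1
            self._used_in_phase = 0
        return self._phases[self._phase_idx][0]

    def suggest(self) -> Dict[str, Any]:
        # To the rest of the framework this looks like a single Optimizer
        # producing one long sequence of configurations.
        self._last = self._current()
        self._used_in_phase += 1
        return self._last.suggest()

    def register(self, config: Dict[str, Any], score: float) -> None:
        # Feed the result back to whichever sub-optimizer made the suggestion.
        self._last.register(config, score)
```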

@bpkroth (Contributor) commented on Sep 26, 2024

> @bpkroth says:
>
> > This isn't determining which config might improve confidence or scores, that's still handled by the Optimizer.
> > Instead, it's adjusting the method of trial execution in order to try and come up with an adjusted score to feed back to the Optimizer when evaluating the suggestion, one that is hopefully easier for the Optimizer to learn from.
>
> I am afraid I can't agree. I see the Environment, Optimizer, and Scheduler as three orthogonal components:
>
> • Optimizer: suggests configurations and puts them on the queue
> • Scheduler: schedules execution of each configuration extracted from the queue
> • Environment: executes trials, one config at a time
>
> In that design, any instance of the Environment can be used with any instance of Optimizer and Scheduler. CyclicScheduler breaks this: it does not make sense to combine it with any Optimizer (Grid Search, for example); worse yet, since it replaces the Scheduler, it cannot be combined with other schedulers (e.g., for parallel execution).
>
> Note that we don't need to feed the scores to the (Bayesian) optimizer immediately after we receive them from the environment - it will pick that data from Storage after we restart the experiment, e.g., switching from CyclicOptimizer to MlosCoreOptimizer.
>
> Finally, nothing prevents us from implementing a CompositeOptimizer just like we did for the Environment. To the rest of the framework, it will look like an Optimizer that generates a sequence of the configurations; internally, it can be, e.g., a sequence of grid search for warm-up, followed by a certain number of FLAML iterations, followed by a few iterations on baseline and best configs. All that can be combined with ParallelScheduler, once we implement it.

I think there was some additional discussion lost here. I agree with everything you've said above and corrected myself in a different thread.

To summarize here: I was mistaken in my original understanding of the intent of this PR.
From previous discussions, I thought @eujing was attempting to create a method for "denoising" the signal produced by running a Trial by "normalizing" it against a run of the default configs on that same instance.

That idea (which isn't what he actually implemented here) could very well be implemented by a Scheduler: whenever a config (produced by an Optimizer, manually, or otherwise) is pulled off the queue, we execute both it and the default and compute a normalized score before submitting the results back to the Storage layer (a rough sketch follows at the end of this comment).

But I agree that you're right about what is actually implemented: it's much more of a "config suggestor" and should be handled in the Optimizer layer instead, given the current abstractions.
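
For illustration, a rough sketch of that Scheduler-side normalization idea, using a hypothetical run_trial helper (again, not what this PR implements):

```python
from typing import Any, Callable, Dict


def run_normalized_trial(
    run_trial: Callable[[Dict[str, Any]], float],  # hypothetical: runs one trial, returns a score
    suggested_config: Dict[str, Any],
    default_config: Dict[str, Any],
) -> float:
    """Run the suggested config and the default back to back on the same instance
    and report the suggested score normalized against the default, before the
    results would go back to the Storage layer."""
    suggested_score = run_trial(suggested_config)
    default_score = run_trial(default_config)
    # Ratio-based normalization; a difference (suggested - default) would work too.
    return suggested_score / default_score
```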

@motus (Member) commented on Sep 26, 2024

Oh, I see now. We're on the same page, then!

@eujing changed the title from "Cyclic scheduler" to "Manual Optmizer" on Oct 2, 2024
@eujing changed the title from "Manual Optmizer" to "Manual Optimizer" on Oct 2, 2024
@eujing (Contributor, Author) commented on Oct 2, 2024

@motus @bpkroth I have re-implemented this as a ManualOptimizer instead, where an array of explicit tunable values can be provided via the optimizer's config or overridden via globals.

For example:

"cycle_tunable_values": [
            {
                "require_secure_transport": "OFF",
                "sql_generate_invisible_primary_key": "OFF",
                "innodb_change_buffer_max_size": 25,
                "innodb_change_buffering": "all"
            },
            {
                "require_secure_transport": "OFF",
                "sql_generate_invisible_primary_key": "OFF",
                "innodb_change_buffer_max_size": 50,
                "innodb_change_buffering": "inserts"
            }
        ]
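
For illustration, a hypothetical sketch of how such a list could be cycled through (not the actual ManualOptimizer implementation):

```python
import itertools
from typing import Any, Dict, Iterator, List

cycle_tunable_values: List[Dict[str, Any]] = [
    {
        "require_secure_transport": "OFF",
        "sql_generate_invisible_primary_key": "OFF",
        "innodb_change_buffer_max_size": 25,
        "innodb_change_buffering": "all",
    },
    {
        "require_secure_transport": "OFF",
        "sql_generate_invisible_primary_key": "OFF",
        "innodb_change_buffer_max_size": 50,
        "innodb_change_buffering": "inserts",
    },
]

# Each call to next() yields the next set of tunable values, wrapping around,
# so trials alternate between the two configs above.
suggestions: Iterator[Dict[str, Any]] = itertools.cycle(cycle_tunable_values)
first = next(suggestions)   # config with innodb_change_buffer_max_size == 25
second = next(suggestions)  # config with innodb_change_buffer_max_size == 50
third = next(suggestions)   # wraps back around to the first config
```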

bpkroth added a commit that referenced this pull request Oct 4, 2024
# Pull Request

## Title

Allows an empty dictionary (object) to represent the default tunable
values for tunable params.

---

## Description

This should make it easier to maintain configs that loop over a chosen
set of tunable values, where some of the tunable values are the default
values, without needing to copy/paste those values from the tunable
params definitions.

- See Also: discussion in #855 (comment)
---

## Type of Change

- ✨ New feature

---

## Testing

Extends existing testing infrastructure to check that this works.

---
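
For illustration, a small sketch of that feature, assuming an empty entry simply expands to the full set of default tunable values (the helper name is hypothetical):

```python
from typing import Any, Dict


def resolve_cycle_entry(
    entry: Dict[str, Any],
    default_tunable_values: Dict[str, Any],
) -> Dict[str, Any]:
    """An empty dict in cycle_tunable_values stands in for the default tunable
    values, so defaults don't need to be copy/pasted from the tunable params
    definitions."""
    return dict(default_tunable_values) if not entry else dict(entry)


# Example: the first entry means "run with the default tunable values".
cycle_tunable_values = [{}, {"innodb_change_buffer_max_size": 50}]
```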
@motus added the enhancement, ready for review, and mlos-bench labels on Oct 4, 2024
@motus (Member) commented on Oct 4, 2024

In all, I am fine with the implementation, but I really don't like the name ManualOptimizer. Can we call it FewShotOptimizer or ConstOptimizer instead? @bpkroth, WDYT?

Also, can we remove a corresponding TODO comment from OneShotOptimizer?

@bpkroth (Contributor) commented on Oct 7, 2024

> In all, I am fine with the implementation, but I really don't like the name ManualOptimizer. Can we call it FewShotOptimizer or ConstOptimizer instead? @bpkroth, WDYT?

Works for me.

Or, we can use your other terminology for now: CyclicConfigSuggestor, and consider refactoring/renaming the rest of the package later on.

> Also, can we remove a corresponding TODO comment from OneShotOptimizer?

👍

@bpkroth changed the title from "Manual Optimizer" to "CyclicConfigSuggestor" on Oct 9, 2024
@motus (Member) left a review comment

I think we can merge this PR already. I am not quite happy with the ManualOptimizer name (it does not tell exactly what the class does), but I am not ready to rename all our optimizers to Suggestors yet. Let's merge it as ManualOptimizer and then have another PR to rename it to something more sensible, like RepeatTrialsOptimizer.

P.S. Also, can we run a formatter on all JSON files, just like we do for Python? It looks like the formatting of some of the new JSON files is a bit off.

@eujing requested a review from bpkroth on November 13, 2024 21:30
@bpkroth (Contributor) commented on Nov 13, 2024

> I think we can merge this PR already. I am not quite happy with the ManualOptimizer name (it does not tell exactly what the class does), but I am not ready to rename all our optimizers to Suggestors yet. Let's merge it as ManualOptimizer and then have another PR to rename it to something more sensible, like RepeatTrialsOptimizer.

The thing is, we can already repeat trials via the Scheduler.

I thought we'd settled on FewShotOptimizer for now, and on removing the TODO comment in the OneShotOptimizer.
#855 (comment)

Anyways, I agree, we may want to take up renaming later on, but not right now - it'd be a large change.

> P.S. Also, can we run a formatter on all JSON files, just like we do for Python? It looks like the formatting of some of the new JSON files is a bit off.

Can you file an Issue for this? We can take it up separately (e.g., via prettier or some such like we used to do).

@bpkroth (Contributor) left a review comment

See prior comments about naming, but good enough for me for now.

@bpkroth merged commit 6b9bbd9 into microsoft:main on Nov 13, 2024 (16 of 17 checks passed)
Labels: enhancement, mlos-bench, ready for review
Participants: 3