Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] make deterministic parameter more thorough? #6731

Open
jameslamb opened this issue Nov 27, 2024 · 6 comments
Open

[RFC] make deterministic parameter more thorough? #6731

jameslamb opened this issue Nov 27, 2024 · 6 comments

Comments

@jameslamb
Copy link
Collaborator

Proposal

When deterministic = true in parameters, other configuration values should be overwritten with values that will reduce randomness, even if those values make LightGBM slower or more resource-intensive.

Specifically, deterministic = true should result in:

  • force_row_wise = true
  • gpu_use_dp = true
  • seed = 708 (or some other fixed, positive, non-0 number)

Motivation

In my experience responding to issues here and on Stack Overflow, LightGBM's sources of non-determinism are often misunderstood. It'd be simpler (for both users and maintainers) if it was possible to get deterministic results by just setting {"deterministic": true}.

Examples of the discussions I'm referring to:

Description

There are plenty of implementation details that would need to be worked out, but in short I'm envisioning this updating would happen only on the C++ side, not in wrappers.

And that a new documentation section should be added (not in the deterministic parameter's part of https://lightgbm.readthedocs.io/en/latest/Parameters.html), explaining the limitations of this approach and how to further improve reproducibility.

Other Questions

How to handle multithreading with OpenMP?

Proposing we do the following:

  • audit all OpenMP pragmas and confirm that any where ordering matters (e.g. gradient accumulation) turn off parallelism when deterministic = true (as in Support deterministic  #3494)
  • add test(s) that deterministic = true is sufficient to get completely deterministic results for the CPU version

But alternatively, we could do something simpler:

  • set num_threads = 1 whenever deterministic = true

Which is preferable?

Reviewers

@shiyu1994 @guolinke @StrikerRUS @borchero @jmoralez @btrotta I'd like to hear your thoughts on this. Thanks for your consideration!

@jameslamb jameslamb changed the title [RFC] make deterministic parameter more impactful? [RFC] make deterministic parameter more thorough? Nov 27, 2024
@borchero
Copy link
Collaborator

I generally like this! In my experience, it is always slightly annoying to get ML libraries to behave purely deterministically. Having this feature built-in would be pretty nice.

That being said, I find it slightly weird to fix seed to a certain value inside the library code. I don't see how to achieve the desired behavior without it though...

@StrikerRUS
Copy link
Collaborator

I agree that setting deterministic = true should involve setting other required params for reproducibility implicitly. But I'm not sure which particular approach would be better...

@jameslamb
Copy link
Collaborator Author

I'm not sure which particular approach would be better...

Does this comment just refer to the 2 different approaches I mentioned for dealing with multi-threading? If so, I think we should start with the other sources of non-determinism (like setting force_row_wise = true) and come back to multi-threading separately.

I find it slightly weird to fix seed to a certain value inside the library code

To be clear, I'm proposing that passing deterministic = true set seed = 708 (or some other non-0 value) only if params does not already set seed to some non-0 value.

That's the simplest way I can think of to achieve this behavior, and I strongly suspect that most users setting deterministic = true would not care about which particular random seed is used, just that the same one is used on every run.

@StrikerRUS
Copy link
Collaborator

@jameslamb

Does this comment just refer to the 2 different approaches I mentioned for dealing with multi-threading?

Yes, my comment was about those ones.

@shiyu1994
Copy link
Collaborator

Fully agree with this proposal. But for CUDA version, we may still encounter nondeterministic issues due to nondeterministic order of atomic operations in adding up the histograms. So far, I have now idea to solve this case yet, without sacrificing too much training speed.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Dec 1, 2024

Makes sense to me, I definitely think it is acceptable for the CUDA version to have weaker guarantees about determinism in exchange for being much faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants