Better Sampling Interface #44
Comments
This makes a lot of sense. |
About making the API evolve, I had a few questions/comments:
I agree with all of these, and I think they'd be great additions! I was planning on making PRs for most of these myself. I think the Sobol Sample should just be centered, though--every point should be placed in the middle of the appropriate box, rather than at the start. So the first two points would be 1/4 and 3/4 instead of 0 and 1/2, for example. This is what |
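To make the centering idea concrete, here is a minimal one-dimensional sketch in base 2. It uses the van der Corput (radical inverse) sequence as a stand-in for a Sobol net; the function names `radical_inverse2`, `uncentered`, and `centered` are hypothetical and are not part of Sobol.jl or QuasiMonteCarlo.jl.

```julia
# Radical inverse of k in base 2 (the van der Corput sequence).
function radical_inverse2(k::Integer)
    x = 0.0
    f = 0.5
    while k > 0
        x += f * (k & 1)  # add the current binary digit, mirrored around the point
        k >>= 1
        f /= 2
    end
    return x
end

# First 2^m points as generated: the sequence starts at 0.
uncentered(m) = [radical_inverse2(k) for k in 0:(2^m - 1)]

# Same points shifted by half a box width, so each point sits mid-box.
centered(m) = uncentered(m) .+ 1 / 2^(m + 1)
```

With `m = 1`, `uncentered(1)` gives 0 and 1/2, while `centered(1)` gives 1/4 and 3/4, matching the example in the comment above.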
100% agreed. It's something one of the students did that I never ended up correcting. It's one of the reasons I wouldn't make a v1.0 on this library yet 😅.
Yes, though the
The interface should be type-matching. If you use
Or just make it an option? |
With regards to names, I actually think we should drop
Yep, that makes sense! Although actually, it makes me think of a possibly-better interface. We might want users to specify what kind of region they want (e.g.
I don't think that's a huge deal; we can add an optional type argument that defaults to
I'm not sure we should include an option--there's really no reason to drop the initial 0 when we can use the centered Sobol samples instead, which have better discrepancy and don't have the problem of an initial 0. Including an option would just be a "shoot self in foot" argument. |
It can cause namespacing issues when exported. And
Passing DataTypes is almost always a bad idea because of how that plays with specialization heuristics. I mean, you can do it, but any function that isn't careful will lose inference.
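A small sketch of the specialization concern, as I understand it from the Julia performance tips: when a type is only passed through to another function, Julia's heuristics may skip specializing on it, whereas capturing it as a method type parameter forces specialization. Both function names below are hypothetical and only illustrate the two calling styles.

```julia
# Type is passed straight through: Julia may choose not to specialize on T here.
make_points_untyped(T, n) = zeros(T, n)

# Type is captured as a method parameter: the method is specialized for each T.
make_points_typed(::Type{T}, n) where {T} = zeros(T, n)
```

Both return the same values; the difference is only in what the compiler can infer at the call site, which can be inspected with `@code_warntype` in a larger calling function.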
Hmm, maybe, but would that come up all that often? The majority of users probably don't have 2 separate packages for doing QMC. And using As for Sobol, that shouldn't be a problem because we don't reexport it, right?
Hmm, that's strange; I haven't had any problems with it before. Is
Yep! |
Another suggestion--maybe it's better to use |
I agree |
@ParadaCarleton Hey! I'm not sure I feel qualified to implement this. As I noted in that other issue on Sobol.jl, I looked at the implementations in scipy.stats.qmc.Sobol and Art Owen's R package. Superficially it didn't LOOK complicated, but I could not figure out what the code was doing in either case. I think I just don't understand the Sobol algorithm well enough. I mentioned in that other issue that the R package is BSD licensed, so someone who understands it could translate it to Julia. I just had another look at the R code: https://artowen.su.domains/code/rsobol.R I don't know R, but I can see that there are two ways to scramble the points: a "nested uniform" ("nestu") scramble and "Matousek's linear scramble" ("mato"). A minimal Julia version only needs one way to scramble the points. I had a look, but I don't really understand how they work.
@ChrisRackauckas do you think you or someone else could make a draft PR to Surrogates.jl to swap to the "iterator over points" interface? Now that |
Yes sorry JuliaCon delays. I talked with @sharanry about it at JuliaCon. @sharanry did my description of the issue make sense? I'd like to get this prioritized a bit because right now it's the only big major bound on any SciML library (Surrogates.jl still doesn't allow the latest major on QuasiMonteCarlo, 6 months later) and it's somewhat a ticking timebomb. |
I meant @thazhemadam |
If @thazhemadam wants to talk on Slack about this I’d be happy to. |
The sampling interface is a bit of a mess at the moment. The way users are asked to specify sample sizes is clunky and unnatural, demanding that they perform unusual calculations to avoid shooting themselves in the foot. Sobol nets, for example, only exist for powers of two, but users can request any sample size. At the moment we just give users these sample sizes by truncating the sequence, even though these truncated sequences are not guaranteed to converge to the correct answer, and will typically perform worse than Monte Carlo point sets. For Faure nets, the situation is safer (we error if the sample size is inappropriate) but even more annoying for users (they have to calculate multiples of powers of prime numbers).
For digital nets, a better approach is outlined here, and used by Art Owen in his QMC packages. Rather than request a sample size, users can provide a parameter for the point set's stratification properties (`m`), the number of independent replications they'd like (`λ`), and a base (`b`), and then receive `λ*b^m` points. I'd also suggest a feature where users can set an approximate sample size by using a keyword, then receive the smallest possible net that is at least as large as the requested sample size.

This would be a breaking change for sure.
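As a rough sketch of the arithmetic behind this proposal (not a proposed API), the two helpers below compute the `λ*b^m` point count and the smallest `m` whose net covers a requested sample size. The names `net_size` and `smallest_m` are hypothetical, and "covers" is assumed here to mean at least as many points as requested.

```julia
# Number of points in λ independent replications of a base-b net with parameter m.
net_size(m::Integer, λ::Integer, b::Integer) = λ * b^m

# Smallest m such that b^m >= n, found with integer arithmetic to avoid
# floating-point rounding in logarithms.
function smallest_m(n::Integer, b::Integer)
    n >= 1 || throw(ArgumentError("n must be positive"))
    b >= 2 || throw(ArgumentError("base must be at least 2"))
    m = 0
    npoints = 1
    while npoints < n
        npoints *= b
        m += 1
    end
    return m
end
```

For example, a request for roughly 100 points in base 2 would map to `smallest_m(100, 2)`, and `net_size` of that `m` with `λ = 1` gives the size of the net actually returned.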