CreateMeasurements4 proposal #347
Comments
@AlexanderYastrebov @gunnarmorling could you check please? Thanks!
If you're going this way, you should also add some tests where a few cities only appear once at random places in the input.
Please don't take it the wrong way 🥲 I really want to see what creative solutions you come up with for the hash map, that's why I set it to 500M exactly instead of using
Oh don't worry :) Another idea is to split the input into K chunks, and use each city only in one / a few of the chunks. Or of course you could make it completely linear, first 1000 lines for city 1, then 1000 lines for city 2, ...
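For anyone who wants to try the chunked idea, here is a minimal sketch, not part of the proposed generator. The function name `chunked_lines`, its parameters, and the `name;temperature` line format with one decimal are assumptions made for illustration. Each station is assigned to exactly one of `num_chunks` consecutive chunks of the output.

```python
import random


def chunked_lines(total_lines, station_names, num_chunks, seed=0):
    """Yield 'name;temp' lines where each station appears in only one chunk.

    Hypothetical helper: station i is assigned to chunk i % num_chunks.
    """
    if num_chunks < 1 or len(station_names) < num_chunks:
        raise ValueError("need at least one station per chunk")
    rng = random.Random(seed)
    # Disjoint groups of stations, one group per chunk.
    groups = [station_names[c::num_chunks] for c in range(num_chunks)]
    # Ceiling division so that every line index maps to a valid chunk.
    chunk_size = -(-total_lines // num_chunks)
    for i in range(total_lines):
        name = rng.choice(groups[i // chunk_size])
        temp = rng.uniform(-99.9, 99.9)
        yield f"{name};{temp:.1f}"


# Small demo: 12 stations spread over 4 chunks of 100 lines each.
demo_names = [f"city{i}" for i in range(12)]
demo_lines = list(chunked_lines(400, demo_names, 4))
```

Setting `num_chunks` equal to the number of stations gives the completely linear layout, with one block of consecutive lines per city.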
I can do that but the goal isn't to kill creativity 🥳 So I will stick to 500M exactly (please don't read the file in reverse 😟)
Hey @lehuyduc, no problem with adding this generator, though I probably wouldn't use it for any "official" run, so as to keep things somewhat focused (two experiments I do want to do in addition to the original data set are running on all 32 cores / 64 threads of the machine, and running with 10K different station names rather than the ~400 of the original example data set). Still, this could be a nice tool for people experimenting on their own. PR welcome.
Use case: simulate new data appearing (for example, if a new weather station is built) to test whether a program handles newly appearing keys correctly.
Summary: the first 500M lines contain only 2500 keys; the remaining 500M lines use the full 10K keys.
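The layout in the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual CreateMeasurements4 code: the function name `generate_lines`, its parameters, the uniform temperature range, and the `name;temperature` line format are all hypothetical, and the demo uses tiny sizes instead of 500M lines and 10K keys.

```python
import random


def generate_lines(total_lines, station_names, early_key_count, seed=0):
    """Yield 'name;temp' lines with a restricted key set in the first half.

    Hypothetical helper: lines in the first half draw only from the first
    early_key_count stations; lines in the second half draw from all of them.
    """
    rng = random.Random(seed)
    half = total_lines // 2
    early_pool = station_names[:early_key_count]
    for i in range(total_lines):
        pool = early_pool if i < half else station_names
        name = rng.choice(pool)
        temp = rng.uniform(-99.9, 99.9)
        yield f"{name};{temp:.1f}"


# Scaled-down demo: 100 stations, 25 of them visible in the first half.
names = [f"station{i}" for i in range(100)]
lines = list(generate_lines(1000, names, 25))
```

With the numbers from the summary, the call would use 1,000,000,000 lines, a list of 10,000 station names, and `early_key_count=2500`, so keys outside the first 2500 can only show up after line 500M.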