-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fatroom's initial attempt #15
Conversation
Hey @fatroom I would offer the same suggestions that were made here: https://twitter.com/lobaorn/status/1742256906993799554 If using ZGC why not the generational one? Did you compare? |
@lobaorn generational gives slower results on my config (approx 7 to 10 sec overhead). I haven't deeply tuned GC, just run with different GC to check if something better than default one. Actually parallel one gives a slightly better (approx 1 second faster) results, but fluctuated between runs, so picked ZGC as more stable. |
Hey @fisk if you would have the time, would you mind providing insights on how to better use the Generational variation of ZGC for this challenge? Or also providing the insight that given the way the values are being read and computed, the Generational variation would not be of use. |
@lobaorn do you really think you can nerd snipe me like that with benchmarking games? Having said that... I implemented some experimental leyden support for generational ZGC, with some object streaming shenanigans allowing loading of archived objects, and support for archived code from the JIT cache so we can run with compiled code immediately and dodge most of the warmup costs. On my machine with 256 cores... real 0m2.318s ...beat that! Oh and here are the JVM flags: -XX:+UseLargePages -XX:+UseZGC -XX:+ZGenerational -Xms8G -Xmx8G -XX:+AlwaysPreTouch -XX:ConcGCThreads=4 -XX:+UnlockDiagnosticVMOptions -XX:-ZProactive -XX:ZTenuringThreshold=1 -Xlog:gc*:file=fatroom_gc.log -XX:SharedArchiveFile=Fatroom-dynamic.jsa -XX:+ReplayTraining -XX:+ArchiveInvokeDynamic -XX:+LoadCachedCode -XX:CachedCodeFile=Fatroom-dynamic.jsa-sc Here is my experimental Generational ZGC leyden branch: https://github.com/fisk/jdk/tree/1brc_genz_leyden Takes about 1/3 of the time compared to normal ZGC on my machine. |
I apologize for being so shameless in my sniping, but I am glad it worked! :P So now I will shamelessly share the link of your comment in the other PRs, so contestants can evaluate the importance of some tuning. I am not sure if anyone will be able/capable to use your experimental branch (and the Leyden flags), but at least a combination of the tuning with perhaps @shipilev build of Lilliput, who knows what difference could it make on the final implementations? |
To be clear about the GC tuning... I reduced the heap size to be rather small, mostly because on my large machine it's going so fast that the cost of materializing a larger heap starts to become noticeable. I also reduced the number of max concurrent GC threads as throwing more threads at the particular object graph created by this program doesn't seem to scale very well. Usually, ZGC is geared more towards low latency than it is towards maximum throughput. However, as you throw more cores at the JVM, you get to a point where concurrent GC starts to win in throughput when the object graph being traced simply has a topology that doesn't allow fully utilizing all cores on the machine. That becomes a throughput bottleneck for STW tracing GCs, which doesn't really matter for a concurrently tracing GC. |
Thanks for reaching out again, and I got it from the context of your original comment, and in another PR I did that contextualization as well, it is not a one-size-fits-all approach or configuration. Rather, it was just my perception that most submissions did not do any argument's tweaking, and I was trying to thought-provoke the submitters to check if any or some different configurations could be a game changer. So AppCDS, Heap, GC, Native... there are many flavors of optimizations that could happen after the code is finished, so it would also share some light on these new (even not new as AppCDS) or unfamiliar possibilities, since the challenge is gaining attention all over the place and in other stacks and etc. |
@gunnarmorling thanks for bringing this to my attention. Corrected parsing, so output matches the RI now |
93f1f93
to
a260f0a
Compare
@gunnarmorling I saw that on some PR you've asking to remove label once test suit pass. Seems as external contributor I don't have authority to manipulate labels here, so pinging you via comments |
00:26.576. Thanks for participating! |
(sorry for the noise on test failures, mistake on my end) |
Changes:
The name of your implementation class:
dev.morling.onebrc.CalculateAverage_fatroom
The JDK build to use (of not specified, the latest OpenJDK 21 upstream build will be used).
OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (build 21.0.1+12-LTS, mixed mode)
The execution time of the program (Apple M1 Max, 32Gb):
real 0m15.303s
user 1m9.326s
sys 0m23.102s