fatroom's initial attempt #15

fatroom · 2024-01-02T18:44:06Z

Changes:

use mmap buffers
get some memory
Parallel stream
Long arythmetic

The name of your implementation class:
dev.morling.onebrc.CalculateAverage_fatroom

The JDK build to use (of not specified, the latest OpenJDK 21 upstream build will be used).
OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (build 21.0.1+12-LTS, mixed mode)

The execution time of the program (Apple M1 Max, 32Gb):
real 0m15.303s
user 1m9.326s
sys 0m23.102s

lobaorn · 2024-01-02T18:56:02Z

Hey @fatroom I would offer the same suggestions that were made here: https://twitter.com/lobaorn/status/1742256906993799554

If using ZGC why not the generational one? Did you compare?

fatroom · 2024-01-02T19:21:12Z

@lobaorn generational gives slower results on my config (approx 7 to 10 sec overhead). I haven't deeply tuned GC, just run with different GC to check if something better than default one. Actually parallel one gives a slightly better (approx 1 second faster) results, but fluctuated between runs, so picked ZGC as more stable.

lobaorn · 2024-01-02T20:08:31Z

Hey @fisk if you would have the time, would you mind providing insights on how to better use the Generational variation of ZGC for this challenge? Or also providing the insight that given the way the values are being read and computed, the Generational variation would not be of use.

gunnarmorling · 2024-01-03T14:14:56Z

Hey @fatroom, thanks a lot for this entry. It looks like the output differs from the one of the reference implementation (some rounding issue, see the discussion at #5 for more context). Could you make sure that the output matches that one of the RI? Thanks!

fisk · 2024-01-03T14:51:11Z

@lobaorn do you really think you can nerd snipe me like that with benchmarking games?

Having said that... I implemented some experimental leyden support for generational ZGC, with some object streaming shenanigans allowing loading of archived objects, and support for archived code from the JIT cache so we can run with compiled code immediately and dodge most of the warmup costs.

On my machine with 256 cores...

real 0m2.318s
user 1m24.229s
sys 0m2.009s

...beat that!

Oh and here are the JVM flags: -XX:+UseLargePages -XX:+UseZGC -XX:+ZGenerational -Xms8G -Xmx8G -XX:+AlwaysPreTouch -XX:ConcGCThreads=4 -XX:+UnlockDiagnosticVMOptions -XX:-ZProactive -XX:ZTenuringThreshold=1 -Xlog:gc*:file=fatroom_gc.log -XX:SharedArchiveFile=Fatroom-dynamic.jsa -XX:+ReplayTraining -XX:+ArchiveInvokeDynamic -XX:+LoadCachedCode -XX:CachedCodeFile=Fatroom-dynamic.jsa-sc

Here is my experimental Generational ZGC leyden branch: https://github.com/fisk/jdk/tree/1brc_genz_leyden

Takes about 1/3 of the time compared to normal ZGC on my machine.

lobaorn · 2024-01-03T15:33:32Z

I apologize for being so shameless in my sniping, but I am glad it worked! :P

So now I will shamelessly share the link of your comment in the other PRs, so contestants can evaluate the importance of some tuning. I am not sure if anyone will be able/capable to use your experimental branch (and the Leyden flags), but at least a combination of the tuning with perhaps @shipilev build of Lilliput, who knows what difference could it make on the final implementations?

fisk · 2024-01-03T18:11:05Z

To be clear about the GC tuning... I reduced the heap size to be rather small, mostly because on my large machine it's going so fast that the cost of materializing a larger heap starts to become noticeable. I also reduced the number of max concurrent GC threads as throwing more threads at the particular object graph created by this program doesn't seem to scale very well.

Usually, ZGC is geared more towards low latency than it is towards maximum throughput. However, as you throw more cores at the JVM, you get to a point where concurrent GC starts to win in throughput when the object graph being traced simply has a topology that doesn't allow fully utilizing all cores on the machine. That becomes a throughput bottleneck for STW tracing GCs, which doesn't really matter for a concurrently tracing GC.

lobaorn · 2024-01-03T18:23:47Z

Thanks for reaching out again, and I got it from the context of your original comment, and in another PR I did that contextualization as well, it is not a one-size-fits-all approach or configuration. Rather, it was just my perception that most submissions did not do any argument's tweaking, and I was trying to thought-provoke the submitters to check if any or some different configurations could be a game changer.

So AppCDS, Heap, GC, Native... there are many flavors of optimizations that could happen after the code is finished, so it would also share some light on these new (even not new as AppCDS) or unfamiliar possibilities, since the challenge is gaining attention all over the place and in other stacks and etc.

fatroom · 2024-01-03T21:15:01Z

@gunnarmorling thanks for bringing this to my attention. Corrected parsing, so output matches the RI now

fatroom · 2024-01-04T21:17:41Z

@gunnarmorling I saw that on some PR you've asking to remove label once test suit pass. Seems as external contributor I don't have authority to manipulate labels here, so pinging you via comments

gunnarmorling · 2024-01-05T10:29:36Z

00:26.576. Thanks for participating!

gunnarmorling · 2024-01-05T10:30:00Z

(sorry for the noise on test failures, mistake on my end)

fatroom changed the title ~~Initial attempt~~ fatroom's initial attempt Jan 2, 2024

gunnarmorling added the output differs label Jan 3, 2024

fatroom added 6 commits January 4, 2024 22:07

Initial attempt

66f71b6

Fixed temperature parsing

f62f7cd

Switched to memory mapped files

f0db2e3

Fixed rounding issues

9921a8a

Inline of temperature reading

88d33dd

Fixed output rounding

a260f0a

fatroom force-pushed the attempt_fatroom branch from 93f1f93 to a260f0a Compare January 4, 2024 21:14

gunnarmorling added test failure and removed output differs test failure labels Jan 5, 2024

gunnarmorling merged commit 15cceae into gunnarmorling:main Jan 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fatroom's initial attempt #15

fatroom's initial attempt #15

fatroom commented Jan 2, 2024 •

edited

Loading

lobaorn commented Jan 2, 2024

fatroom commented Jan 2, 2024

lobaorn commented Jan 2, 2024

gunnarmorling commented Jan 3, 2024

fisk commented Jan 3, 2024

lobaorn commented Jan 3, 2024

fisk commented Jan 3, 2024

lobaorn commented Jan 3, 2024

fatroom commented Jan 3, 2024

fatroom commented Jan 4, 2024

gunnarmorling commented Jan 5, 2024

gunnarmorling commented Jan 5, 2024

fatroom's initial attempt #15

fatroom's initial attempt #15

Conversation

fatroom commented Jan 2, 2024 • edited Loading

lobaorn commented Jan 2, 2024

fatroom commented Jan 2, 2024

lobaorn commented Jan 2, 2024

gunnarmorling commented Jan 3, 2024

fisk commented Jan 3, 2024

lobaorn commented Jan 3, 2024

fisk commented Jan 3, 2024

lobaorn commented Jan 3, 2024

fatroom commented Jan 3, 2024

fatroom commented Jan 4, 2024

gunnarmorling commented Jan 5, 2024

gunnarmorling commented Jan 5, 2024

fatroom commented Jan 2, 2024 •

edited

Loading