Updates to CalculateAverage_spullara #56

spullara · 2024-01-03T19:32:37Z

Making another PR to push progress in the open. I've inlined a couple of methods so far. My attempts at SIMD acceleration are still failing. I'm starting to wonder whether the VM already vectorizes Arrays.equals on its own.

gunnarmorling · 2024-01-03T19:33:52Z

Instead of re-opening, could you please create a new PR, based on current upstream/main? Thx!

spullara · 2024-01-03T20:01:46Z

I have SIMD changes in https://github.com/spullara/1brc/tree/simd but they are much slower.

gunnarmorling · 2024-01-03T20:31:18Z

Let me know when you're ready for another run. With 21.0.1-graalce again, I suppose?

spullara · 2024-01-03T20:52:54Z

I don't think enough has changed honestly and I'm not even sure the inlining helps that much. I'll also update to "sdk use" in my script.

spullara · 2024-01-03T22:13:28Z

Should be about 10% faster than before.

spullara · 2024-01-03T22:48:20Z

I've been running it with 21.0.1-graal from sdkman, added it to the script.

spullara · 2024-01-03T22:49:56Z

Ready for testing. 1780ms on my machine :)

yemreinci · 2024-01-04T15:08:06Z

Hi @spullara, I was able to make the original version perform 25% better on my machine with a few changes. Feel free to use the ideas: #86

gunnarmorling · 2024-01-04T17:04:10Z

Hey @spullara, could you rebase this one to latest main and then run this:

./test.sh spullara

This is a simple test suite I've added to make sure implementations are compliant, and there's one failure (I clarified some value ranges over the course of the day, so this may be related). Once it's good, please remove the "test failure" label again. Thanks!

spullara · 2024-01-04T17:51:25Z

Updated and passes. BTW, having test.sh delete my 1B row file that I now need to regenerate is quite the troll :)

gunnarmorling · 2024-01-04T17:56:59Z

> BTW, having test.sh delete my 1B row file that I now need to regenerate is quite the troll :)

Hum, hum, interesting point. I'd consider it volatile data, so 🤷 . The root issue is that we should have made the data set name an argument to the programs under test. Next time.

That said, something seems wrong with this PR now, way too many unrelated changes. Can you rebase/squash so that it only contains your changes over the latest main? Thx!

spullara · 2024-01-04T18:02:39Z

> Hum, hum, interesting point. I'd consider it volatile data, so 🤷 . The root issue is that we should have made the data set name an argument to the programs under test. Next time.

Yeah, mine also takes a file as an argument so I could test locally with a smaller file when it was slower.

gunnarmorling · 2024-01-04T18:13:51Z

Nice, looking good now. 00:12.063, super-close #2 behind @filiphr with 00:12.027!

spullara · 2024-01-04T18:27:23Z

> Nice, looking good now. 00:12.063, super-close #2 behind @filiphr with 00:12.027!

His is using the hash code of the city as a key which is against the rules? Otherwise mine would be even faster.

https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_filiphr.java#L173

gunnarmorling · 2024-01-04T18:31:49Z

Argh, again?! Seems I really need to add a test with 10K distinct keys. It's so easy to miss otherwise. Gnarf.

spullara · 2024-01-04T18:33:37Z

> Argh, again?! Seems I really need to add a test with 10K distinct keys. It's so easy to miss otherwise. Gnarf.

You would have to construct a test with names that purposefully have the same hash code to discover these bugs with a test. You have to do it by inspection.
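To illustrate what "names that purposefully have the same hash code" could look like, here is a minimal sketch. It assumes a hypothetical hash, the classic 31-multiplier polynomial over the UTF-8 bytes; the formula actually used in any given entry may differ, which is exactly why such collisions cannot be fabricated generically. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollidingNames {

    // Assumption: a 31-multiplier polynomial hash over the UTF-8 bytes.
    // This is NOT taken from any specific entry; it only illustrates the idea.
    public static int hash(String name) {
        int h = 0;
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b;
        }
        return h;
    }

    // "Aa" and "BB" collide under this formula (65*31 + 97 == 66*31 + 66 == 2112),
    // so concatenating n such blocks yields 2^n distinct names sharing one hash.
    public static List<String> collidingNames(int blocks) {
        List<String> names = new ArrayList<>();
        names.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : names) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            names = next;
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = collidingNames(3);
        // A map keyed only by the int hash cannot tell these names apart.
        Map<Integer, Integer> countsByHash = new HashMap<>();
        for (String name : names) {
            countsByHash.merge(hash(name), 1, Integer::sum);
        }
        // prints "8 distinct names, 1 distinct hashes"
        System.out.println(names.size() + " distinct names, "
                + countsByHash.size() + " distinct hashes");
    }
}
```

An aggregation keyed only by such a hash would merge all eight stations into one result row, and a test file without deliberately colliding names would never reveal it.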

gunnarmorling · 2024-01-04T18:37:06Z

Actually, I'm not sure. Seems the key is derived straight from the characters and it is used as a key in a map, not index in an array. I.e. I don't think there could be collisions?

spullara · 2024-01-04T18:40:24Z

If they aren't comparing the actual string value of the keys you can construct keys that collide with their formula. You allow city names up to 100 bytes of UTF-8 and this compresses that to just 4 bytes. It is impossible to construct a 4 byte (int) hash that represents 100 bytes of UTF-8 uniquely.

gunnarmorling · 2024-01-04T18:48:07Z

True. It would just overflow at some point. Gnarf, that stuff is really tough to spot, as I can't generically fabricate colliding values without knowing what's the hash function. @filiphr, I'll remove your entry from the leaderboard for now, until this issue has been fixed. @spullara, thanks for paying attention that closely!

spullara · 2024-01-04T18:50:20Z

You should see how fast mine is if I use the same "optimization" :) The single largest chunk of time is spent comparing keys to ensure that they match.
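For context, the key comparison being described could look roughly like the sketch below: a linear-probing table that uses the hash only to pick a starting slot and then confirms the match byte by byte with Arrays.equals. This is an illustration under assumptions, not the code from this PR; the class name, table size, and hash formula are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class VerifiedLookup {
    // Assumption: power-of-two capacity comfortably larger than the number of keys.
    private final byte[][] keys = new byte[1 << 14][];
    private final int[] counts = new int[1 << 14];

    // Hypothetical hash over the name bytes in buf[from, to).
    public static int hash(byte[] buf, int from, int to) {
        int h = 0;
        for (int i = from; i < to; i++) {
            h = 31 * h + buf[i];
        }
        return h;
    }

    // Returns the slot holding the name in buf[from, to), inserting it if absent.
    public int slotFor(byte[] buf, int from, int to) {
        int mask = keys.length - 1;
        int slot = hash(buf, from, to) & mask;
        while (true) {
            byte[] existing = keys[slot];
            if (existing == null) {
                keys[slot] = Arrays.copyOfRange(buf, from, to);
                return slot;
            }
            // The costly but necessary step: equal hashes do not imply equal names.
            if (Arrays.equals(existing, 0, existing.length, buf, from, to)) {
                return slot;
            }
            slot = (slot + 1) & mask; // collision: probe the next slot
        }
    }

    public void add(String name) {
        byte[] b = name.getBytes(StandardCharsets.UTF_8);
        counts[slotFor(b, 0, b.length)]++;
    }

    // Note: looking up an unseen name inserts it with a count of 0.
    public int count(String name) {
        byte[] b = name.getBytes(StandardCharsets.UTF_8);
        return counts[slotFor(b, 0, b.length)];
    }
}
```

With this shape, two names that hash identically (such as "Aa" and "BB" under the assumed formula) still end up in separate slots, at the price of one range comparison per lookup.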

filiphr · 2024-01-04T18:58:54Z

I was reading the discussions around the hashing today. Sorry about it, I didn't have time to adapt it today.

I have a similar idea: also use Arrays.equals for this. Let's see what that leads to.

gunnarmorling added the test failure label Jan 4, 2024

squashed commit

86f0a4e

spullara force-pushed the main branch from e41d20d to 86f0a4e on January 4, 2024 18:01

gunnarmorling merged commit 4af3253 into gunnarmorling:main Jan 4, 2024
1 check passed
