-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High contention under many reader threads on hot keys #159
Comments
This is a big problem for us too. Any feedback on this topic would be appreciated. |
Hi @serkanturgut and @thompsonbry, thank you for bringing this back up, and for contributing an implementation! I will try to take a closer look over the next few weeks. To answer a few of your questions:
The reason we might mutate the table during a find is due to the "lazy rehashing" feature of the table. Meaning when we resize the table due to a too-high load factor, we don't immediately move all the elements in the table from the old array to the new one. Instead, the first acquirer of each lock is responsible for completing the transfer for all buckets on that lock. This helps amortize the cost of resizing across more operations and unlocks the table more quickly. I guess in theory we could defer migrating a particular lock to only the write operations on the table, but it would complicate the implementation a fair bit, since now the read operations on unmigrated locks would have to look in
Indeed this was a similar issue and my only hesitation was allowing users to use arbitrary locking implementations. Since the lock implementation is so important to performance, I was worried about losing the ability to keep track of performance by giving up control of the locking implementation. I'm definitely open to adding another lock implementation that favors read-mostly workloads. It would be great to see if we can show the new lock improving performance on the |
Thank you for the clarifications @manugoyal
Instead, we can perhaps preserve same design but we could probably start the read-only APIs with shared lock and only upgrade the lock to exclusive access if the read path decides a mutation is needed. It should be easy to add a
Thank you for the pointer, I'll try this soon and get back with the results. |
Yeah that sounds great. Only thing is I'd worry about deadlock in case multiple readers acquire the lock and then one tries to upgrade. Another approach would be to peek at the |
Hi,
In our application, we are observing high spin times inside
spinlock
. Typical use case that we observe the contention is many threads (64+ threads running on a 32 psychical cores machine with hyper-threading enabled, the contention starts the show up after 16 threads) and all of them contenting onfind
API trying to acquire the spinlock for the same keys from the hashtable.We tried adding Exclusive/Shared locking semantics to the spinlock implementation (which we have tested internally and confirmed it helped reducing the spinlock wait times significantly under high concurrency mostly with read-only threads), we'd like to get your opinions on this.
I created a PR for your assessment a67ae05
We are not currently using this change in production (therefore it's not battle-tested so to speak), we'd like to understand the original reasoning behind the single mode spinlock implementation and also the implementation in the PR whether it's not violating assumptions of the hashtable.
Please also note that, there were 2 lines I had to remove from
lock_two
function fornormal_mode_shared
specialization as following;Because these 2 lines were calling mutating functions downstream even when
find
API used. To be honest, I couldn't understand why they were exercised even for read path so please advise whether it was ok to remove these lines for read-onlyfind
API (If it was intentional forfind
APIs to make these calls when callinglock_two
, can you help us to understand why?)Note, I ran existing unit tests and stress tests, they seem to be passing. If you think the PR is reasonable, I can add more stress testing focusing on locking mode aspect of different API calls.
Similar issue I found was #146 but the proposal was different, so I wanted to post this separately.
Thank you in advance,
Serkan
The text was updated successfully, but these errors were encountered: