Recommended design for ingress-networking for multi-threaded server? #951
RedBeard0531 asked this question in Q&A (unanswered)
I've seen a ton of examples (including on your wiki) for using io_uring for a single-threaded IO-bound server, but none for a multi-threaded, compute-bound server¹ that is expected to handle 10s of thousands of long-lived, mostly-idle connections. One of our constraints is that requests have a wide variety of compute requirements, from sub-100 microseconds up to minutes or even hours, and it can be difficult/impossible to predict in advance how long a request will take to process. For the fast requests, any attempt to dispatch from a networking thread to a compute thread pool will kill us with the overhead of context switching. For the slow requests, by the time we know they are slow we are already deep in the call stack of application code, and it can be tricky to switch threads at that point. So we've found that it is best to have the thread/core that runs the `recv()` also process the request and `send()` the reply. This of course also has a side benefit of having the request and reply buffers already hot in that core's cache, which we wouldn't necessarily have if we dispatched to another thread.

I think an ideal loop for our use case would look something like this (very much pseudo-code, eg all error handling is elided):
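Roughly, in liburing terms; `process_request()`, `queue_send_reply()`, and `ensure_enough_waiters()` are stand-ins for application logic, and whether many threads may legally wait on one ring like this is exactly what the questions below are about:

```c
/* Per-thread event loop sketch (liburing). */
#include <liburing.h>

extern struct io_uring ring;                       /* one ring shared by all threads */
extern void ensure_enough_waiters(void);           /* keep some, but not too many, threads waiting */
extern void process_request(void *conn, int res);  /* may run for 100us or for hours */
extern void queue_send_reply(struct io_uring *r, void *conn);

static void worker_loop(void)
{
    for (;;) {
        struct io_uring_cqe *cqe;

        /* Block until at least one completion is available
         * (internally an io_uring_enter() with min_complete == 1). */
        io_uring_wait_cqe(&ring, &cqe);

        /* Take exactly one event: copy what we need out of the CQE and
         * release it so other threads can consume the rest. */
        void *conn = io_uring_cqe_get_data(cqe);
        int res = cqe->res;
        io_uring_cqe_seen(&ring, cqe);

        /* Before we possibly disappear into a long-running request,
         * make sure enough threads remain parked in this loop. */
        ensure_enough_waiters();

        /* Run the request on this thread/core: the recv'd buffer is
         * already hot in cache, and the reply is sent from here too. */
        process_request(conn, res);
        queue_send_reply(&ring, conn);
    }
}
```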
This is essentially a single-threaded server design, but with two main changes: each thread consumes only a single queued event at a time so that other threads can consume the rest, and there is some additional logic to ensure we have some threads, but not too many, ready to process new incoming requests. Both are there to ensure that fast requests can be serviced with reasonable latency (assuming available cores), without getting stuck behind slow requests.
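For concreteness, the "some but not too many waiters" bookkeeping might be nothing more than a couple of atomic counters plus a spawn check; the thresholds here are made up and `worker_main` is the loop sketched above:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Made-up thresholds: keep at least MIN_WAITERS threads parked in the
 * event loop, and never create more than MAX_THREADS in total. */
#define MIN_WAITERS 4
#define MAX_THREADS 256

static atomic_int num_waiters;   /* maintained around the blocking wait in the loop */
static atomic_int num_threads;   /* total worker threads created so far */

extern void *worker_main(void *arg);   /* runs the per-thread loop sketched above */

static void ensure_enough_waiters(void)
{
    /* Called after taking an event, before running the (possibly very long)
     * request: if too few threads are still waiting and we're under the cap,
     * start another worker.  Races here just mean an extra thread or two. */
    if (atomic_load(&num_waiters) >= MIN_WAITERS)
        return;
    if (atomic_load(&num_threads) >= MAX_THREADS)
        return;

    pthread_t t;
    if (pthread_create(&t, NULL, worker_main, NULL) == 0) {
        pthread_detach(t);
        atomic_fetch_add(&num_threads, 1);
    }
}
```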
However, I have a few questions/concerns:
- Should all threads wait on a single shared ring, or should each thread have its own ring and issue the `recv`s on a subset of connections? The extreme version with a thread per connection would work, but is wasteful when the connection is idle, and probably no better than just using blocking IO. I'm also not sure whether the per-thread-ring option would mean `IORING_SETUP_ATTACH_WQ`, or allowing separate submission queues with a common completion queue (see the sketch after this list).
- Is there a guarantee that when multiple threads are blocked in `io_uring_enter()` exactly 1 will be woken for each event that becomes ready? The docs seem unclear to me about whether passing 1 for `min_complete` is sufficient for this. The use of "min" implies that a thread may be expected to consume multiple CQEs. Alternatively, if a single event becomes ready, I don't see any guarantee that it won't wake all threads blocked on the ring (eg if it independently checks `n_complete >= min_complete` for each thread without decrementing `n_complete` between checks).
- Relatedly, if I have 1000 threads blocked in `io_uring_enter` (eg after a burst of slow ops that has cooled off), I don't want them all to wake up all the time just for 999 to go right back to sleep.
- The loop above is essentially how we would use `epoll`, but adapted to work with completions rather than readiness (which is better for us anyway). But maybe a radically different approach is better with io_uring?
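On the first question, my understanding of `IORING_SETUP_ATTACH_WQ` is that it only shares the kernel's async worker (io-wq) backend between rings; each ring still has its own submission and completion queue, so by itself it doesn't give multiple threads a shared completion queue to wait on. Setting it up would look roughly like this (sketch, error handling elided):

```c
/* Per-thread rings that share one async worker backend via
 * IORING_SETUP_ATTACH_WQ.  Each ring keeps its own SQ and CQ. */
#include <liburing.h>
#include <string.h>

static struct io_uring main_ring;   /* created first with io_uring_queue_init(); owns the worker pool */

static int init_attached_ring(struct io_uring *ring)
{
    struct io_uring_params p;

    memset(&p, 0, sizeof(p));
    p.flags = IORING_SETUP_ATTACH_WQ;
    p.wq_fd = main_ring.ring_fd;    /* attach to the first ring's backend */

    return io_uring_queue_init_params(256, ring, &p);
}
```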
This is half a question, half a discussion opener. If there is already an obviously correct way to use io_uring for our use case and you could just explain it or link to some docs, that would be great, even if we need to change our design. If not, I'd love to discuss potential improvements to io_uring that would make it work for us. I'd also understand if this use case is out of scope for what io_uring can reasonably be expected to work well for, and we should continue using alternatives such as `epoll`.

¹ In this specific case I am working on a database server, but I believe this question would apply to many types of multi-threaded RPC servers, including http servers serving dynamic content.