Why do search threads spend so much time waiting, especially with >1 shards #370
Replies: 1 comment
-
There was some good discussion here: https://discuss.elastic.co/t/one-es-container-with-n-search-threads-is-slower-than-two-containers-with-n-2-search-threads/256965
-
Trying to prepare some multi-shard, multi-threaded benchmarks, but running into confusing performance results. Putting it out there in case others have some insight. The essence of the confusion is as follows:
Here are the search threads in VisualVM when running the SIFT benchmark (1M vectors) with 8 shards, 8 search threads and 1 parallel search request:
The threads spend a lot of time in the "parked" state. JProfiler shows the same result, except it calls this the "waiting" state.
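For context, "parked" is how profilers render a thread blocked in `LockSupport.park()`, and an idle thread-pool worker parks inside its work queue's `take()`. A minimal sketch (plain `java.util.concurrent`, not Elasticsearch internals; the class and method names are made up for illustration) that reproduces the state and prints where an idle worker is actually parked:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ParkSiteDemo {

    // Start 8 idle workers and return the top stack frames of one of them.
    static String parkSite() throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.prestartAllCoreThreads(); // workers immediately block on take()
        Thread.sleep(200);             // give them time to park
        StringBuilder sb = new StringBuilder();
        for (var e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getName().startsWith("pool-") && t.getState() == Thread.State.WAITING) {
                sb.append(t.getName()).append(" parked at:\n");
                for (int i = 0; i < Math.min(6, e.getValue().length); i++)
                    sb.append("  ").append(e.getValue()[i]).append('\n');
                break; // one worker is enough
            }
        }
        pool.shutdownNow();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parkSite());
    }
}
```

The printed stack shows `LockSupport.park` beneath `LinkedBlockingQueue.take`, which is exactly the signature VisualVM renders as "parked" and JProfiler as "waiting": the worker has no task and is waiting for the queue to hand it one.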
Here's the same benchmark but making 8 parallel search requests:
The threads spend a lot less time in the "parked" state (i.e. much more green).
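The difference between the two screenshots can be mimicked with a plain thread pool (an assumption-laden sketch, not Elasticsearch code): with 8 workers and 1 in-flight task, 7 workers sit parked, while with 8 in-flight tasks all 8 workers are runnable ("green"):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ParallelismDemo {
    static volatile boolean running;

    // With `parallel` spin tasks in flight on an 8-worker pool,
    // count how many workers are actually RUNNABLE (the "green" threads).
    static long runnableWorkers(int parallel) throws Exception {
        running = true;
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.prestartAllCoreThreads();
        for (int i = 0; i < parallel; i++)
            pool.submit(() -> { while (running) { /* simulate a busy search */ } });
        Thread.sleep(200); // let the remaining workers park
        long busy = Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith("pool-"))
                .filter(t -> t.getState() == Thread.State.RUNNABLE)
                .count();
        running = false;                            // stop the spin tasks
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS); // so repeat calls don't overlap
        return busy;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 request:  " + runnableWorkers(1) + "/8 runnable");
        System.out.println("8 requests: " + runnableWorkers(8) + "/8 runnable");
    }
}
```

This only shows that a pool sized for 8 concurrent tasks looks mostly parked when fed 1 task at a time; it doesn't by itself explain what the Elasticsearch workers are waiting for.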
The question is: what are they waiting on? This is almost certainly more of a general Elasticsearch performance question.
My best guess is that they're waiting on Lucene's IO, which is hitting disk and then the file system cache, to do things like read postings lists. I've yet to find the tooling to really check this guess.

Update: at this point I'm pretty certain it's not related to IO. The IO profiling in JProfiler and Java Mission Control doesn't show any significant file access. I also confirmed that each shard produces roughly the same number of results for each query, that each query is executed exactly once on each of the search threads (i.e., once per shard, all in parallel), and that the part of the Elastiknn code that fetches the field mapping is not the bottleneck (removing it changes nothing). So my best remaining guess is that the time spent waiting is the time spent serializing and transferring the query and then the query results. This would align with the observation that the time spent waiting drops once you run parallel queries.
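One cheap way to quantify "time spent waiting" without a full profiler is to compare a thread's CPU time against wall-clock time via `ThreadMXBean`: if CPU time is a small fraction of wall time, the thread was mostly off-CPU (parked, blocked, or waiting on IO/transfer). A hedged sketch, with two hypothetical stand-in tasks rather than real search work:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class WaitShareDemo {

    // Fraction of wall time the calling thread spends on-CPU while
    // running `task`; the remainder is time spent waiting.
    static double cpuShare(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long wall0 = System.nanoTime();
        long cpu0 = mx.getCurrentThreadCpuTime();
        task.run();
        long cpu = mx.getCurrentThreadCpuTime() - cpu0;
        long wall = System.nanoTime() - wall0;
        return (double) cpu / wall;
    }

    public static void main(String[] args) {
        // Mostly-sleeping task: low CPU share, i.e. mostly "parked"/waiting.
        System.out.printf("sleepy: %.2f%n", cpuShare(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) {}
        }));
        // Spin task: CPU share near 1.0.
        System.out.printf("busy:   %.2f%n", cpuShare(() -> {
            long end = System.nanoTime() + 200_000_000L;
            while (System.nanoTime() < end) { /* burn CPU */ }
        }));
    }
}
```

Applied to a search thread over the span of a query, a low CPU share would confirm the waiting is real scheduling/transfer delay rather than slow computation, though it still wouldn't say *what* the thread is waiting on.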
Some more good discussion here: https://discuss.elastic.co/t/one-es-container-with-n-search-threads-is-slower-than-two-containers-with-n-2-search-threads/256965/2
A benchmark notebook demonstrating the question posed on the Elasticsearch forums: https://github.com/alexklibisz/elastiknn/blob/shards-question/examples/parallel-shards-question/benchmark.ipynb