-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use search_pool to control iterator->Next() #1008
base: main
Are you sure you want to change the base?
Use search_pool to control iterator->Next() #1008
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: alwayslove2013 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@alwayslove2013 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:
For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”. Thanks for your efforts and contribution to the community!. |
ee16112
to
3c70c4b
Compare
@@ -202,7 +203,7 @@ class IndexNode : public Object { | |||
return GenResultDataSet(nq, std::move(range_search_result)); | |||
} | |||
|
|||
auto its_or = AnnIterator(dataset, std::move(cfg), bitset); | |||
auto its_or = AnnIterator(dataset, std::move(cfg), bitset, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment about why this parameter is false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
The range_search function has utilized the search_pool to concurrently handle various queries.
To prevent potential deadlocks, the iterator for a single query no longer requires additional thread control over the next() call.
include/knowhere/index/index_node.h
Outdated
// If use_knowhere_search_pool is True (the default), the iterator->Next() will be scheduled by the | ||
// knowhere_search_thread_pool. | ||
// If False, will Not involve thread scheduling internally, so please take caution. | ||
template <bool use_knowhere_search_pool = true> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the justifications for adding use_knowhere_search_pool
as a template parameter instead of making a regular IndexIterator::use_knowhere_search_pool
field? I'm not sure I follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to reduce the judgment of use_knowhere_search_pool
in each iterator->next()
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've re-read the code again and I don't believe I see a reason for making use_knowhere_search_pool
a template parameter. Basically, it leads to increase of the size of the compiled code, also this parameter is not generic enough (like T
in std::vector<T>
) and it does not have a critical-performance impact. Please consider demoting use_knowhere_search_pool
to a variable. Or I can accept the code as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion~ I will use use_knowhere_search_pool
as an iterator variable rather than a template param.
distances_id.id = distances_id.id == -1 ? -1 : distances_id.id + xb_id_offset; | ||
if (xb_id_offset != 0) { | ||
for (auto& distances_id : distances_ids) { | ||
distances_id.id = distances_id.id == -1 ? -1 : distances_id.id + xb_id_offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to confirm: this is evaluated as distances_id.id = a ? b : c
, not `distances_id.id = (a ? b : c) + xb_id_offset'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cqy123456 help check~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, -1
or distances_id.id + xb_id_offset
b331267
to
98048fc
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1008 +/- ##
=========================================
+ Coverage 0 73.96% +73.96%
=========================================
Files 0 82 +82
Lines 0 6972 +6972
=========================================
+ Hits 0 5157 +5157
- Misses 0 1815 +1815 |
try { | ||
for (int i = 0; i < nq; ++i) { | ||
auto compute_dist_func = [=]() -> std::vector<DistId> { | ||
auto xb = base_dataset->GetTensor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add comments on why we cannot do it outside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
also, please squash the commits into a single one. Thanks. |
7a5a27d
to
6ce1b7f
Compare
Signed-off-by: min.tian <[email protected]>
6ce1b7f
to
92f234b
Compare
futs.emplace_back( | ||
ThreadPool::GetGlobalSearchThreadPool()->push([&, idx = i]() { task_with_ordered_iterator(idx); })); | ||
futs.emplace_back(ThreadPool::GetGlobalSearchThreadPool()->push([&, idx = i]() { | ||
ThreadPool::ScopedSearchOmpSetter setter(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rewrite the push to include this omp setter instead of calling it everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cqy123456 Is this feasible? Are there any scenarios where we do not need the OmpSetter
?
sign_(larger_is_closer ? -1 : 1) { | ||
} | ||
|
||
std::pair<int64_t, float> | ||
Next() override { | ||
if (!initialized_) { | ||
throw std::runtime_error("Next should not be called before initialization"); | ||
initialize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curios why we need this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing error is a bit too strict?
// dims out of 30000 total dims) sharing at least 1 common non-zero dimension. | ||
results_.reserve(distances.size() * 0.3); | ||
for (size_t i = 0; i < distances.size(); i++) { | ||
if (distances[i] != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we do this? I think this PrecomputedDistanceIterator
is used for both dense/sparse BF
@zhengbuqian
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submitted a new commit to reduce misunderstanding.
Signed-off-by: min.tian <[email protected]>
465f005
to
cf9d25d
Compare
issue: #997