Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Commit

Permalink
Merge #776
Browse files Browse the repository at this point in the history
776: Reduce incremental indexing time of `words_prefix_position_docids` DB r=curquiza a=loiclec

Fixes partially #605

The `words_prefix_position_docids` can easily contain millions of entries. Thus, iterating
over it can be very expensive. But we do so needlessly for every document addition tasks.

It can sometimes cause indexing performance issues when :
- a user sends many `documentAdditionOrUpdate` tasks that cannot be all batched together (for example if they are interspersed with `documentDeletion` tasks)
- the documents contain long, diverse text fields, thus increasing the number of entries in `words_prefix_position_docids`
- the index has accumulated many soft-deleted documents, further increasing the size of `words_prefix_position_docids`
- the machine running Meilisearch does not have great IO performance (e.g. slow SSD, or quota-limited by the cloud provider)

Note, before approving  the PR: the only changed file should be `milli/src/update/words_prefix_position_docids.rs`.

Co-authored-by: Loïc Lecrenier <[email protected]>
  • Loading branch information
bors[bot] and loiclec authored Jan 31, 2023
2 parents a4e8158 + a2690ea commit 758b4ac
Showing 1 changed file with 11 additions and 7 deletions.
18 changes: 11 additions & 7 deletions milli/src/update/words_prefix_position_docids.rs
Original file line number Diff line number Diff line change
Expand Up @@ -140,16 +140,20 @@ impl<'t, 'u, 'i> WordPrefixPositionDocids<'t, 'u, 'i> {

// We remove all the entries that are no more required in this word prefix position
// docids database.
let mut iter =
self.index.word_prefix_position_docids.iter_mut(self.wtxn)?.lazily_decode_data();
while let Some(((prefix, _), _)) = iter.next().transpose()? {
if del_prefix_fst_words.contains(prefix.as_bytes()) {
unsafe { iter.del_current()? };
// We also avoid iterating over the whole `word_prefix_position_docids` database if we know in
// advance that the `if del_prefix_fst_words.contains(prefix.as_bytes()) {` condition below
// will always be false (i.e. if `del_prefix_fst_words` is empty).
if !del_prefix_fst_words.is_empty() {
let mut iter =
self.index.word_prefix_position_docids.iter_mut(self.wtxn)?.lazily_decode_data();
while let Some(((prefix, _), _)) = iter.next().transpose()? {
if del_prefix_fst_words.contains(prefix.as_bytes()) {
unsafe { iter.del_current()? };
}
}
drop(iter);
}

drop(iter);

// We finally write all the word prefix position docids into the LMDB database.
sorter_into_lmdb_database(
self.wtxn,
Expand Down

0 comments on commit 758b4ac

Please sign in to comment.