Searching

Search expression

The search field takes a search expression as input. A search expression is one of:

Search phrase
?expression (optional)
(expression)
expression | expression (or), e.g. go ( left | right )

To use literal ?, (, ) or |, a backslash (\) needs to be placed in front of the character.

A search phrase is a sequence of tokens where a token is one of:

Unannotated (fuzzy match, e.g. hello means fuzzy match hello)
'tok (' prefix means exact match the token)
^tok (^ prefix means prefix match the token)
tok$ ($ suffix means suffix match the token)
~ (explicit spaces, i.e. contiguous sequence of spaces, tabs, etc)

Tokens that are not separated by spaces, operators, or parentheses are treated specially, we call these linked tokens. For example, 12, :, 30 are linked in 12:30, but not in 12 : 30. Linked tokens have a much stricter search distance by default, e.g. in 12:30, Docfd will search for : only up to a few tokens away from 12, and so on. This allows user to state intention of reduced fuzziness.

To link spaces to tokens, one needs to be make use of ~. For example, to search for "John Smith" ("John" and "Smith" separated by some number of spaces), one can use John~Smith to establish linkage.

For ', ^, $ to be considered annotation markers, there cannot be space between the marker and token, e.g. ^abc means "prefix match abc", but ^ abc means "fuzzy match ^ and fuzzy match abc".

Annotated linked tokens are also treated specially:

^12:30 is equivalent to '12 ': ^30
'12:30 is equivalent to '12 ': '30
12:30$ is equivalent to 12$ ': '30

But with even stricter search restriction than the normal linked tokens, namely the next matching token must follow immediately from the current match, e.g. ^12:3 will not match 12 : 30 but will match 12:30

Search is asynchronous, specifically:

Editing of search field is not blocked by search progress
Updating/clearing the search field cancels the current search and starts a new search immediately

Optional operator handling specifics

For a phrase with optional operator, such as ?word0 word1 ..., the first word is grouped implicitly, i.e. it is treated as (?word0) word1 ....

Search phrase and search procedure

Document content and user input in the search field are tokenized/segmented in the same way, based on:

Contiguous alphanumeric characters
Individual symbols
Individual UTF-8 characters
Spaces

A search phrase is a list of said tokens.

Search procedure is a DFS through the document index, where the search range for a word is fixed to a configured range surrounding the previous word (when applicable).

A token in the index matches a token in the search phrase if they fall into one of the following cases:

They are a case-insensitive exact match
They are a case-insensitive substring match (token in search phrase being the substring)
They are within the configured case-insensitive edit distance threshold

Search results are then ranked using a heuristic.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Searching

Search expression

Optional operator handling specifics

Search phrase and search procedure

Clone this wiki locally