Full Text Search
Couchbase Lite has some fairly simple but useful support for full-text search, i.e. the kind of search you do in Spotlight or Google.
Any view can index text instead of the regular JSON keys. To do so, you make its map block emit a special key created by the CBLTextKey() function, whose parameter is the string to be indexed.
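[[db viewNamed: @"blogText"] setMapBlock: MAPBLOCK({
    if ([doc[@"type"] isEqual: @"blog"]) {
        // stripHTMLTags() stands for your own helper that removes markup before indexing
        NSString* body = stripHTMLTags(doc[@"body"]);
        // Key: the text to index. Value: the post title, returned with each matching row.
        emit(CBLTextKey(body), doc[@"title"]);
    }
}) reduceBlock: NULL version: @"1"];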
Note: Don't emit both full-text keys and regular JSON keys in the same view! Use separate views instead.
CBLQuery has some special properties for full-text searches; they're declared in the header CBLQuery+FullTextSearch.h (which is already included by CouchbaseLite.h).
The most important one is fullTextQuery, an NSString containing the search term(s). Setting this to a non-nil value changes the query to full-text.
NOTE: Always set fullTextQuery when querying full-text views, and never set it when querying other types of views. Otherwise you'll get undefined (i.e. bogus) results.
The query language is defined by the SQLite Full-Text Search (FTS) extension, and is documented on the SQLite website. The gist of it is:
- Search terms are either individual words, or phrases delimited by double-quotes.
- Appending a * to a search term denotes a prefix search that matches any word beginning with that term.
- When multiple search terms are separated by spaces, all of them have to match -- it's an implicit "AND" conjunction.
- You can also put the words AND or OR (in all caps) between terms.
- The word NOT (in all caps) before a term negates it: only rows that don't include it will be returned.
- The word NEAR (in all caps) between terms is like AND but also requires that the matches be near each other.
- Multiple terms or expressions can be wrapped in parentheses for grouping.
NOTE: When using ForestDB storage, Couchbase Lite implements full-text search itself instead of using SQLite, and doesn't support the fancy search syntax. Instead, the search terms are individual words, with implicit "AND" conjunctions.
CBLQuery* query = [[db viewNamed: @"blogText"] query];
query.fullTextQuery = @"Couchbase NEAR (Lite OR mobile)";
query.fullTextSnippets = YES; // enables snippets; see next example
A full-text CBLQuery returns its results as instances of CBLFullTextQueryRow, a subclass of CBLQueryRow with some extra accessors.
- The fullText property returns the text that was indexed.
- The matchCount property returns the number of matches that were found in the text.
- -textRangeOfMatch: returns an NSRange giving the character range in the fullText of a match.
- -termIndexOfMatch: indicates which term in the query was matched. The terms in the query are numbered, left to right, starting at 0. (Terms that have the NOT operator applied are ignored.)
- -snippetWithWordStart:wordEnd: returns a brief substring of the full text that includes the matched terms (or as many as fit). It's intended to be shown in a compact search-results list in your app's UI. The wordStart and wordEnd strings can be used to highlight the matched terms: they're inserted before and after every appearance of a matched term. For instance, you could use [ and ], or <b> and </b> if you're displaying results as HTML. (Note: To enable snippets, you have to set the query's fullTextSnippets property.)
By default, query rows are returned in descending order of relevance (by a fairly simple/naïve definition of "relevance"). If you don't care about this ranking, you can make the search a bit faster by setting the query's fullTextRanking property to NO.
for (CBLFullTextQueryRow* row in [query rows]) {
    NSLog(@"Title: %@", row.value);  // the map fn emits the post title as the value
    NSLog(@"Text: %@", [row snippetWithWordStart: @"[" wordEnd: @"]"]);
}
To perform a full-text query using the REST API, add the search string as the query parameter ?full_text to the view query's URL. (Remember to URL-encode any special characters and punctuation, including spaces.) Additionally, if you want snippets in the result, add the query parameter ?snippets=true.
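Since the REST API can be called from any language, here is a minimal client-side sketch in Python of how such a URL could be assembled. The host, port, database name, and design document name are placeholders invented for illustration, and full_text_query_url is a hypothetical helper; only the full_text and snippets parameters come from the description above.

```python
from urllib.parse import quote, urlencode


def full_text_query_url(view_url, search, snippets=False):
    """Return view_url with the full-text query parameters appended."""
    params = {"full_text": search}
    if snippets:
        params["snippets"] = "true"
    # quote_via=quote percent-encodes spaces as %20 instead of "+"
    return view_url + "?" + urlencode(params, quote_via=quote)


# Hypothetical listener address, database, and design document names.
VIEW_URL = "http://localhost:59840/blog/_design/blog/_view/blogText"
url = full_text_query_url(VIEW_URL, "Couchbase NEAR (Lite OR mobile)", snippets=True)
print(url)
```

Note that when both parameters are present they are joined with & in the usual way, so only the first one is preceded by ?.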
The rows in the response will not have a key property but will have a matches property that indicates where the matches occurred in the indexed text. Its value is an array of objects; each object has a range property of the form [byte_offset, byte_length] and a term property that's a number indicating which search term matched.
Note: The range values are measured in bytes, not characters. They assume UTF-8 encoding.
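Because the offsets are in bytes, a client that works with character-indexed strings has to convert them before slicing or highlighting. The following Python sketch shows one way to do that, assuming the client already holds the indexed text as a string; byte_range_to_char_range is a hypothetical helper, not part of any Couchbase Lite API.

```python
def byte_range_to_char_range(text, byte_offset, byte_length):
    """Convert a UTF-8 [byte_offset, byte_length] pair to (char_offset, char_length)."""
    data = text.encode("utf-8")
    # Decode the bytes before the match and the match itself, then count characters.
    prefix = data[:byte_offset].decode("utf-8")
    match = data[byte_offset:byte_offset + byte_length].decode("utf-8")
    return len(prefix), len(match)


text = "naïve search"  # "ï" occupies two bytes in UTF-8
start, length = byte_range_to_char_range(text, 7, 6)
print(text[start:start + length])  # prints "search"
```

Python counts Unicode code points here. A client whose strings are indexed in UTF-16 code units (NSString, for example) would need to count those instead.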
If you specified the ?snippets option, each row will also have a snippet property holding a snippet of the full text that contains the match(es).
SQLite's full-text search engine (FTS4) supports a simple query language using AND, OR and NOT keywords, and prefix matching by appending * to a word. But when Couchbase Lite is using ForestDB for storage it has to implement full-text search itself, and those features haven't been implemented yet. Instead there is an implicit AND conjunction between every word in the search string.
JavaScript map functions don't have an equivalent of the CBLTextKey() function that they can call, so they can't create a view with a full-text index. (Although they can query such a view, if the index exists.)
Full-text search relies heavily on tokenizing -- breaking text into words -- and the tokenizer we're using only supports languages that use space characters between words; that means it can't find word breaks in Asian languages, which typically aren't written with spaces. The result is that indexing won't work, unless spaces are explicitly inserted into the text to be indexed.
(There do exist tokenizers for CJK languages; here's a good discussion.)
Stemming means ignoring grammatical variations in words, like pluralization and verb tenses, for purposes of matching, so that a query for "dog" can match "dogs", and "searching" can match "searches". The current tokenizer we use has language-specific stemming rules for many Western languages, but the language has to be specified when the FTS data table is created, and Couchbase Lite has no API for that yet.
- You can't combine key-based and full-text queries in the same view. A view's emit calls should either emit regular keys or the special text objects, not some of each.
- For this reason, the key-based properties of CBLQuery have no effect in a full-text search: startKey, endKey, startKeyDocID, endKeyDocID, keys.
- Full-text queries don't support reducing. They don't call the reduce block, and the reduce-based properties have no effect: mapOnly, groupLevel.