SwiftAnnoy provides an interface (bindings) to the Annoy (Approximate Nearest Neighbors Oh Yeah) C++ library.
- Float16 (half-precision)
- Float (32 bit)
- Double (64 bit)
- angular
- dotProcut
- euclidean
- hamming
- manhattan
Nearest neighbors search (NNS) algorithms are used in a variety of tasks that include classification, clustering, image retreval, and recommendation. The fundamental task in NNS is to look for points (examples) in a data set that are closest to a given query using some metric of distance (manhattan, euclidean, etc.). Brute force search quickly becomes inefficient as the size of a dataset and number of dimensions in the data grow (see the curse of dimensionality). Searching can be made more efficient by intelligent indexing (k-d trees, ball tree, etc.), but even here large datasets quickly lead to large indexes, memory constraints, and suboptimal search speed.
Approximate nearest neighbors algorithms make a trade-off in reduced accuracy for improved query performance. Annoy offers benefits in both speed and memory footprint (indexes are memory mapped) that make it an attractive alternative to brute force search and exact nearest neighbor methods.
Swift Package Manager (SPM) If you are using Xcode you can do the following:
- Select File > Swift Packages > Add Package Dependency.
- copy and paste https://github.com/jbadger3/SwiftAnnoy into the search URL
- Select SwiftAnnoy and click next.
- Choose a rule for dependency management. Click next.
- Click Finish.
Using SwiftAnnoy is fairly straitforward. Follow the code snippets below to get started or have a look at the Jazzy generated docs.
First, create an AnnoyIndex<T>
as in:
let index = AnnoyIndex<Double>(itemLength: 2, metric: .euclidean)
Currently supported types are Float
and Double
.
Next, add some data to your index. There are two functions that can be used to populate an index: addItem
and addItems
.
var item0 = [1.0, 1.0]
var item1 = [3.0, 4.0]
var item2 = [6.0, 8.0]
var items = [item0, item1, item2]
// add one item
try? index.addItem(index: 0, vector: &item0)
// add multple items
try? index.addItems(items: &items)
Note: Annoy expects indices in chronological order from 0...n-1. If you need/intend to use some other id numbering system create your own mapping as memory will be allocated for max(id) + 1 items.
In order to run queries on an AnnoyIndex
the index must first be built.
try? index.build(numTrees: 1)
This is a one-time operation. You can't call index.build
a second time.
The parameter numTrees
specifies the number of trees you want Annoy to use to construct the index. The more trees you include in the index the more accurate the search results will be, but it will take longer to build, take up more space, and require more time to search. Experiment with this parameter to optimize the tradeoffs.
An AnnoyIndex
can be queried using either an item index or a vector.
// by item
let results = index.getNNsForItem(item: 3, neighbors: 3)
print(results)
"Optional((indices: [3, 2, 0], distances: [0.0, 5.0, 8.602325267042627]))"
// by vector
var vector = [3.0, 4.0]
let results2 = index.getNNsForVector(vector: &vector, neighbors: 3)
print(results2)
"Optional((indices: [2, 0, 1], distances: [0.0, 3.605551275463989, 3.605551275463989]))"
let fileURL = FileManager.default.temporaryDirectory.appending(path: "index.annoy")
try? index.save(fileURL)
let fileURL = FileManager.default.temporaryDirectory.appending(path: "index.annoy")
try? index.load(fileURL)
Improvements and suggestions are welcome. Have a look at the TODO list below or open up an issue for other areas of improvement.
- hamming distance
- bounds checking for getItem, getNNsForItem, and getNNsForVector
- half-precision support
- build
- unbuild
- unload
- setVerbose
- getItem
- setSeed
- onDiskBuild
- test for memory leaks
- verify mmap and prefault function properly
- macOS
- iOS
- Linux
SwiftAnnoy is released with an MIT license. If you use SwiftAnnoy in another project, please add a link to this repository in your acknowledgements.