-
Notifications
You must be signed in to change notification settings - Fork 116
Optimization Examples Using Cpp
Further improvement of the search performance can be achieved by additional processing as follows.
The default number of initial edges for each node in a default graph (ANNG) is a fixed value 10. To optimize the number, first, the objects should be inserted without building indexes as follows.
// construct-optimized-anng.cpp
#include "NGT/Index.h"
using namespace std;
int main(int argc, char **argv)
{
ifstream is("objects.tsv");
string line;
getline(is, line);
stringstream linestream(line);
NGT::Property property;
size_t n;
linestream >> n;
linestream >> property.dimension;
property.objectType = NGT::ObjectSpace::ObjectType::Float;
property.distanceType = NGT::Index::Property::DistanceType::DistanceTypeCosine;
string path("fasttext.anng");
NGT::Index::create(path, property);
NGT::Index index(path);
while (getline(is, line)) {
stringstream linestream(line);
vector<float> object;
while (!linestream.eof()) {
float value;
linestream >> value;
object.push_back(value);
}
index.append(object);
}
index.save();
return 0;
}
Next, execute the optimization script as follows. This script extracts the optimal number of initial edges for the inserted objects and just set the number into the index profile.
// optimize-anng.cpp
#include "NGT/Index.h"
#include "NGT/GraphOptimizer.h"
using namespace std;
int main(int argc, char **argv)
{
NGT::GraphOptimizer::ANNGEdgeOptimizationParameter parameter;
NGT::GraphOptimizer graphOptimizer(false);
graphOptimizer.optimizeNumberOfEdgesForANNG("fasttext.anng", parameter);
}
Then, build indexes as follows.
// build-optimized-anng.cpp
#include "NGT/Index.h"
using namespace std;
int main(int argc, char **argv)
{
NGT::Index index("fasttext.anng");
index.createIndex(16); // 16 is the number of threads to construct.
index.save();
return 0;
}
The next script finally optimizes the search parameters about the explored edges and memory prefetch for the existing indexes. This script does not modify the index data structure.
// optimize-search-parameters.cpp
#include "NGT/Index.h"
#include "NGT/GraphOptimizer.h"
using namespace std;
int main(int argc, char **argv)
{
NGT::GraphOptimizer graphOptimizer(false);
graphOptimizer.setProcessingModes(false, true, true, true);
graphOptimizer.optimizeSearchParameters("fasttext.anng");
}
Since an accuracy table which converts expected accuracies into epsilons is generated as well, an expected accuracy value instead of an epsilon value can be specified for search as follows after this optimization.
searchQuery.setExpectedAccuracy(0.9); // instead of setEpsilon()
Above-mentioned ANNG optimization will bring adequate search performance improvement for ANNG. Refining ANNG (RANNG) also improves the search performance, because it improves accuracy of neighboring nodes for each node by searching with each node. However, it is optional because it needs a long processing time. The following command to refine an ANNG must be executed after building the indexes.
// refine-anng.cpp
#include "NGT/Index.h"
#include "NGT/GraphOptimizer.h"
using namespace std;
int main(int argc, char **argv)
{
NGT::Index index("fasttext.anng");
NGT::GraphReconstructor::refineANNG(index, false);
index.save("fasttext.ranng");
return 0;
}
Since further more improvement of the search performance by using an ONNG can be expected, how to generate ONNG is described.
ONNG generation requires an ANNG with more edges than default initial edges as the excess edges are removed to optimize the graph. ONNG requires an ANNG at least with edges that are optimized as mentioned above. As a result, an ONNG from the ANNG can provide almost the best performance. However, if the best performance is needed, a larger number than the optimized number may be specified. The following example shows the creation of an ANNG with 100 edges.
// construct-fasttext-100.cpp
#include "NGT/Index.h"
using namespace std;
int main(int argc, char **argv)
{
ifstream is("objects.tsv");
string line;
getline(is, line);
stringstream linestream(line);
NGT::Property property;
size_t n;
linestream >> n;
linestream >> property.dimension;
property.objectType = NGT::ObjectSpace::ObjectType::Float;
property.distanceType = NGT::Index::Property::DistanceType::DistanceTypeCosine;
property.edgeSizeForCreation = 100;
string path("fasttext.anng-100");
NGT::Index::create(path, property);
NGT::Index index(path);
while (getline(is, line)) {
stringstream linestream(line);
vector<float> object;
while (!linestream.eof()) {
float value;
linestream >> value;
object.push_back(value);
}
index.append(object);
}
index.createIndex(16); // 16 is the number of threads to construct.
index.save();
return 0;
}
An ONNG can be converted from an ANNG with the NGT::GraphOptimizer as follows.
// build-onng.cpp
#include "NGT/Index.h"
#include "NGT/GraphOptimizer.h"
using namespace std;
int main(int argc, char **argv)
{
NGT::GraphOptimizer graphOptimizer(false);
int numOfOutgoingEdges = 10;
int numOfIncomingEdges = 100;
int numOfQueries = 200;
int numOfResultantObjects = 20; // k
graphOptimizer.set(numOfOutgoingEdges, numOfIncomingEdges, numOfQueries, numOfResultantObjects);
graphOptimizer.execute("fasttext.anng-100", "fasttext.onng");
return 0;
}
execute() executes degree adjustment, shortcut reduction (pass adjustment), search and prefetch parameter adjustment and accuracy table generation. numOfOutgoingEdges and numOfIncomingEdges specify the number of outgoing edges and incoming edges, respectively. numOfQueries is the number of query objects to optimize. numOfResultantObjects is the number of resultant objects to be searched. In this example, the number of incoming edges is the same as the number of edges for ANNG creation, i.e., 100. However, since the number of real edges of an ANNG is more than the specified number, a larger number of incoming edges can be specified.
The search-fasttext.cpp above must be modified to use the ONNG by replacing 'fasttext.anng' with 'fasttext.onng' on the following line.
NGT::Index index("fasttext.onng");
Even if the search time increases compared with the result of the first ANNG, the search accuracy of the ONNG should be higher. Therefore, since the epsilon value for an ANNG needs to be increased to acquire higher accuracies, the search time of an ANNG increases more than that of an ONNG.
Command line tool
Python
C++