Skip to content

Latest commit

 

History

History
289 lines (178 loc) · 11.4 KB

README.md

File metadata and controls

289 lines (178 loc) · 11.4 KB

QBG

Command-line interface for NGT with Quantization for indexing high-dimensional data

Command

qbg - proximity search for high dimensional data with quantization

  $ qbg command [option] index [data]

Note:

When the environment variable POSIXLY_CORERECT is set on some platforms such as Cygwin, you should specifiy options before the command as follows.

  $ qbg [option] command index [data]

qbg handles two types of graphs with quantization: Quantized Graph (QG) and Quantized Blob Graph (QBG).

command for the quantized graph is one of:

command for the quantized blob graph is one of:

CREATE-QG

Make and initialize a QG directory for the quantized graph in the specified NGT index directory, and insert the data in the NGT index into the QG index.

  $ qbg create-qg [-P number_of_extended_dimensions] [-Q number_of_subvector_dimensions] index

index
Specify the name of the directory for the existing index such as ANNG or ONNG to be quantized. The index only with L2 distance and normalized cosine similarity distance can be quantized. You should build the ANNG or ONNG with normalized cosine similarity in order to use cosine similarity for the quantized graph.

-P number_of_extended_dimensions
Specify the number of the extended dimensions. The number should be greater than or equal to the number of the genuine dimensions, and also should be a multiple of 4. When this option is not specified, the smallest multiple of 4 that is greater than the dimension is set to the number of the extended dimensions.

-Q number_of_subvector_dimension
Specify the number of the subvector dimensions. The number should be less than or equal to the the number of the extended dimensions, and also should be a divisor of the number of the extended dimensions. When this option is not specified, one is set to the number of the subvector dimensions.

BUILD-QG

Quantize the objects of the specified index and build a quantized graph into the index.

  $ qbg build-qg [-o number_of_objects_for_quantization] [-E max_number_of_edges] [-M number_of_trials] index

index
Specify the name of the directory for the existing index such as ANNG or ONNG to be quantized. The index only with L2 distance and normalized cosine similarity distance can be quantized. You should build the ANNG or ONNG with normalized cosine similarity in order to use cosine similarity for the quantized graph.

-o number_of_objects_for_quantization
Specify the number of object for quantization and optimization. The number should be less than or equal to the number of the registered objects.

-E max_number_of_edges
Specify the maximum number of edges to build a qunatized graph. Since every 16 objects that are associated with edges of each node are processed, the number should be a multiple of 16.

-M number_of_trials
Specify the number of trials to optimize the subvector quantization.

SEARCH-QG

Search the index using the specified query data.

  $ qbg search-qg [-n number_of_search_objects] [-e search_range_coefficient] [-p result_expansion]
      [-r search_radius] index query_data

index
Specify the path of the existing quantized index.

query_data
Specify the path of the file containing query data. This file shall consist of one item of query data per line and each dimensional element of that data item shall be delimited by a space or tab. Each search shall be sequentially performed when providing multiple queries.

-n number_of_search_objects (default: 20)
Specify the number of search objects.

-e search_range_coefficient (default = 0.02)
Specify the magnification coefficient (epsilon) of the search range. A larger value means higher accuracy but slower searching, while a smaller value means a drop in accuracy but faster searching. While it is desirable to adjust this value within the range of 0 - 0.1, a negative value (> -1.0) may also be specified.

-p result_expansion (default = 3.0)
Specify the expansion ratio of the number of approximate inner search objects to the number of search objects. For example, when the ratio is 10 and the number of search objects is 20, the number of the approximate search objects is set to 200 inside the search processing. A larger value brings higher accuracy but slower searching.

CREATE

Make and initialize a QBG directory for the quantized blob graph.

  $ qbg create [-d number_of_dimension] [-P number_of_extended_dimensions] [-O object_type] [-D distance_function] [-C cluster_data_type] index

index
Specify the name of the directory for QBG.

-d number_of_dimensions
Specify the number of dimensions of registration data.

-P number_of_extended_dimensions
Specify the number of the extended dimensions. The number should be greater than or equal to the number of the genuine dimensions, and also should be a multiple of 4. When this option is not specified, the smallest multiple of 4 that is greater than the dimension is set to the number of the extended dimensions.

-O object_type
Specify the data object type.

  • c: 1 byte unsigned integer
  • f: 4 byte floating point number (default)

-D distance_function
Specify the distance function as follows.

  • 2: L2 distance (default)
  • c: Cosine similarity

-C cluster_data_type
Specify the cluster data type.

  • pq4: 4 bit product quantization (default)
  • sq8: 1 byte scalar quantization

-N number_of_subvectors
Specify the number of subvectors that should be a divisor of the number of the extended dimensions.

APPEND

Append the specified data to the specified index.

  $ qbg append index registration_data

BUILD

Quantize the objects of the specified index and build a quantized graph into the index.

  $ qbg build [-o number_of_objects_for_quantization] [-E max_number_of_edges] [-M number_of_trials] [-P rotation] index

index
Specify the name of the directory for the existing index such as ANNG or ONNG to be quantized. The index only with L2 distance and normalized cosine similarity distance can be quantized. You should build the ANNG or ONNG with normalized cosine similarity in order to use cosine similarity for the quantized graph.

-o number_of_objects_for_quantization
Specify the number of object for quantization and optimization. The number should be less than or equal to the number of the registered objects.

-P rotation
Specify the transform matrix type for the inserted and query object to optimize the subvector quantization.

  • r: Rotation matrix.
  • R: Rotation and repositioning matrix.
  • p: Repositioning matrix.
  • n: No matrix.

-M number_of_trials
Specify the number of trials to optimize the subvector quantization.

SEARCH

Search the index using the specified query data.

  $ qbg search [-n number_of_search_objects] [-e search_range_coefficient] [-p result_expansion]
      index query_data

index
Specify the path of the existing quantized index.

query_data
Specify the path of the file containing query data. This file shall consist of one item of query data per line and each dimensional element of that data item shall be delimited by a space or tab. Each search shall be sequentially performed when providing multiple queries.

-n number_of_search_objects (default: 20)
Specify the number of search objects.

-e search_range_coefficient (default = 0.02)
Specify the magnification coefficient (epsilon) of the search range. A larger value means higher accuracy but slower searching, while a smaller value means a drop in accuracy but faster searching. While it is desirable to adjust this value within the range of 0 - 0.1, a negative value (> -1.0) may also be specified.

-B blob_search_range_coefficient (default = 0.0)
Specify the magnification coefficient (epsilon) of the search range for the quantized blob graph.

-N number_of_explored_nodes (default = 256)
Specify the number of the explored nodes in the graph. When the number of the explored nodes reached the specified number, the search is terminated.

-p result_expansion (default = 0.0)
Specify the expansion ratio of the number of approximate inner search objects to the number of search objects. For example, when the ratio is 10 and the number of search objects is 20, the number of the approximate search objects is set to 200 inside the search processing. A larger value brings higher accuracy but slower searching.

Examples of using the quantized graph

Setup data

  $ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean.tsv
  $ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean_query.tsv
  $ head -1 sift-128-euclidean_query.tsv > query.tsv

Build the quantized graph

Build an ANNG for 128-dimensional, floating point data:

  $ ngt create -d 128 -o f -D 2 anng sift-128-euclidean.tsv
  Data loading time=15.4804 (sec) 15480.4 (msec)
  # of objects=1000000
  Processed 100000 objects. time= 4.26452 (sec)
  ...
  Processed 1000000 objects. time= 7.06745 (sec)
  Index creation time=63.3504 (sec) 63350.4 (msec)

Create and initialize the quantized graph:

  $ qbg create-qg anng
  creating...
  appending...

Build the quantized graph:

  $ qbg build-qg anng
  optimizing...
  building the inverted index...
  building the quantized graph... 

Search with the quantized graph

Search k nearest neighbors with the quantized graph:

  $ qbg search-qg -n 20 -e 0.02 anng query.tsv
  Query No.1
  Rank	ID	Distance
  1	932086	232.871
  2	934877	234.715
  3	561814	243.99
  ...
  20	2177	276.781
  Query Time= 0.0005034 (sec), 0.5034 (msec)
  Average Query Time= 0.0005034 (sec), 0.5034 (msec), (0.0005034/1)

Examples of building the quantized graph for higher performance

Build an ANNG having more edges for higher performance:

  $ ngt create -d 128 -o f -D 2 -E 40 anng-40 sift-128-euclidean.tsv

Build an ONNG:

  $ ngt reconstruct-graph -m S -E 64 -o 64 -i 120 anng-40 onng-40

Create and initialize the quantized graph:

  $ qbg create-qg onng-40

Build the quantized graph:

  $ qbg build-qg onng-40

Search k nearest neighbors with the quantized graph:

  $ qbg search -n 20 -e 0.02 onng-40 query.tsv

Examples of using the quantized blob graph

Setup data

  $ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean.tsv
  $ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean_query.tsv
  $ head -1 sift-128-euclidean_query.tsv > query.tsv

Build the quantized blob graph

Create and initialize the quantized blob graph:

  $ qbg create -d 128 -D 2 -N 128 qbg-index

Append objects:

  $ qbg append qbg-index sift-128-euclidean.tsv

Build the quantized graph:

  $ qbg build qbg-index

Search with the quantized blob graph

Search k nearest neighbors with the quantized blob graph:

  $ qbg search -n 20 -e 0.02 qbg-index query.tsv