
bench: Comparison with pgvectorscale #125

Open
gaocegege opened this issue Dec 6, 2024 · 5 comments

@gaocegege

No description provided.

@cutecutecat

cutecutecat commented Dec 12, 2024

Comparison with pgvectorscale

Dataset: laion-5m-768dim
Arguments: defaults from the pgvectorscale README

Advantages of pgvectorscale

✅ Double the capacity

To store the whole dataset, pgvectorscale costs 17 GB on disk while VectorChord costs 34 GB; this might be related to:

num_bits_per_dimension: Number of bits used to encode each dimension when using SBQ, 2 for less than 900 dimensions, 1 otherwise

Our scaler8 type might solve it.
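For context, a minimal sketch of where that parameter lives when building a pgvectorscale index; the table and column names are hypothetical, and the exact storage-parameter syntax should be double-checked against the pgvectorscale README:

```sql
-- Hypothetical table/column; num_bits_per_dimension is the SBQ encoding
-- width quoted above (2 for < 900 dimensions by default, 1 otherwise).
CREATE INDEX laion_embedding_diskann_idx ON laion_items
USING diskann (embedding vector_cosine_ops)
WITH (num_bits_per_dimension = 1);
```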

✅ Fantastic cold start

VectorChord and pgvectorscale have similar query speed in the warm state.

For VectorChord, cold start is much slower than the warm state: QPS goes from 29 (cold) to 201 (warm) at recall 0.95, about a 7x speedup. For that reason, prewarming is really important for us.

However, we observed only about a 2x speedup from cold to warm for pgvectorscale, so it hardly needs prewarming at all.
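Since prewarming matters so much for VectorChord here, a minimal sketch of one way to warm an index using the stock pg_prewarm contrib module; the index name is hypothetical, and VectorChord may also provide its own prewarm helper:

```sql
-- pg_prewarm ships with PostgreSQL as a contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Read the whole index into the buffer cache before benchmarking
-- so queries start from a warm state.
SELECT pg_prewarm('laion_embedding_idx');
```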

Disadvantages of pgvectorscale

❌ Slower index build speed

  • VectorChord external build: 1240s on 4 cores
  • VectorChord internal build: 9239s on 4 cores (parallel-build settings are sketched after this list)
  • pgvectorscale build: 11540s on 1 core; it cannot use multiple cores
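A sketch of the session settings typically used to give a PostgreSQL index build multiple cores, assuming the internal build goes through the standard parallel maintenance path; the values are illustrative:

```sql
-- Allow up to 4 parallel workers for CREATE INDEX in this session.
SET max_parallel_maintenance_workers = 4;
-- A large maintenance memory budget also helps big index builds
-- (value is illustrative).
SET maintenance_work_mem = '8GB';
```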

❌ No better query performance

With the default arguments and both the dot and L2 metrics on our dataset, we could not reach recall > 0.8 by tuning the query-time parameter diskann.query_search_list_size.

Even though the dataset is dot-based, the dot metric performs even worse than the L2 metric.
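For reference, the query-time tuning described above amounts to something like the following; the table name and query are hypothetical, while diskann.query_search_list_size is the parameter named above:

```sql
-- Widen the DiskANN search list for this session (default is 100).
SET diskann.query_search_list_size = 200;

-- Hypothetical top-10 query; $1 is the 768-dim query vector.
-- <#> is negative inner product, <-> would be L2 distance.
SELECT id
FROM laion_items
ORDER BY embedding <#> $1
LIMIT 10;
```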

L2 metric:

|             | top 10 cold | top 10 warm | top 100 cold | top 100 warm |
|-------------|-------------|-------------|--------------|--------------|
| Recall      | 0.7744      | 0.7744      | 0.6838       | 0.6838       |
| QPS         | 159.72      | 242.71      | 73.54        | 148.10       |
| P50 latency | 5.13ms      | 4.04ms      | 12.12ms      | 6.64ms       |
| P99 latency | 19.15ms     | 7.30ms      | 24.91ms      | 10.93ms      |

Dot metric:

|             | top 10 cold | top 10 warm | top 100 cold | top 100 warm |
|-------------|-------------|-------------|--------------|--------------|
| Recall      | 0.6569      | 0.6569      | 0.6723       | 0.6723       |
| QPS         | 245.06      | 265.28      | 73.58        | 145.48       |
| P50 latency | 3.93ms      | 3.75ms      | 13.52ms      | 6.82ms       |
| P99 latency | 9.89ms      | 6.25ms      | 23.74ms      | 10.77ms      |

For VectorChord, the typical recall target is 0.95; the full results can be found at #42 (comment).

Update: Dot metric with more rerank (default = 500):

Changing only diskann.query_search_list_size has little effect; increasing diskann.query_rescore helps much more.

The defaults are diskann.query_search_list_size=100 and diskann.query_rescore=50; rerank=300 means setting both diskann.query_search_list_size and diskann.query_rescore to 300.
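Concretely, rerank=300 in the table below corresponds to setting both session parameters to 300:

```sql
-- rerank=300: raise both values from their defaults (100 and 50) to 300.
SET diskann.query_search_list_size = 300;
SET diskann.query_rescore = 300;
```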

|                         | Recall | QPS    | P50 latency | P99 latency |
|-------------------------|--------|--------|-------------|-------------|
| top 10 rerank=200       | 0.9471 | 144.79 | 6.69ms      | 12.82ms     |
| top 10 rerank=250       | 0.9611 | 117.41 | 8.10ms      | 17.00ms     |
| top 10 rerank=250 cold  | /      | 61.03  | 11.87ms     | 84.18ms     |
| top 100 rerank=300      | 0.9088 | 67.17  | 14.13ms     | 29.73ms     |
| top 100 rerank=350      | 0.9402 | 54.93  | 16.90ms     | 41.26ms     |
| top 100 rerank=400      | 0.9601 | 37.06  | 24.30ms     | 71.01ms     |
| top 100 rerank=400 cold | /      | 26.16  | 30.71ms     | 116.25ms    |

@xieydd

xieydd commented Dec 12, 2024

What is the memory usage after prewarm?

@VoVAllen

Without the recall numbers, the speed figures alone are meaningless.

@VoVAllen

Does more rerank help in pgvectorscale?

@cutecutecat

cutecutecat commented Dec 13, 2024

Does more rerank help in pgvectorscale?

It helps a lot; I have updated the results above.

What is the memory usage after prewarm?

About 8 GB for top 10 and 11 GB for top 100.
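For what it's worth, a hedged sketch of one way to approximate how much of an index sits in PostgreSQL's shared buffers after prewarming; the index name is hypothetical, and this does not count the OS page cache:

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Approximate shared-buffer footprint of a given index.
SELECT pg_size_pretty(count(*) * current_setting('block_size')::bigint)
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname = 'laion_embedding_idx';
```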
