sqlite-vec now supports metadata columns in vec0 virtual tables! Check out the announcement blogpost (Nov 2024) for more information.
You can now declare metadata columns, partition keys, and auxiliary columns in a vec0 virtual table:
create virtual table vec_articles using vec0(
  article_id integer primary key,

  -- Vector text embedding of the `headline` column, with 384 dimensions
  headline_embedding float[384],

  -- Partition key, internally shards the vector index on article published year
  year integer partition key,

  -- Metadata columns, can appear in the `WHERE` clause of KNN queries
  news_desk text,
  word_count integer,
  pub_date text,

  -- Auxiliary columns, unindexed but fast lookups
  +headline text,
  +url text
);
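To populate a table like this, each row supplies the vector together with its partition key, metadata, and auxiliary values. The following is a minimal Python sketch, not an official recipe: it assumes the `sqlite-vec` Python package is installed and that your Python's `sqlite3` build permits loading extensions. The helper names (`to_float32_blob`, `open_db`, `insert_article`) are hypothetical and exist only for illustration.

```python
import sqlite3
import struct


def to_float32_blob(vector):
    """Pack a list of floats into a compact float32 BLOB, one of the
    vector input formats sqlite-vec accepts (JSON text is the other)."""
    return struct.pack(f"{len(vector)}f", *vector)


def open_db(path=":memory:"):
    """Open a connection with sqlite-vec loaded.
    Assumption: `pip install sqlite-vec` has been run."""
    import sqlite_vec  # imported lazily so the helper above stays stdlib-only

    db = sqlite3.connect(path)
    db.enable_load_extension(True)
    sqlite_vec.load(db)
    db.enable_load_extension(False)
    return db


def insert_article(db, article_id, embedding, year, news_desk,
                   word_count, pub_date, headline, url):
    """Insert one row into vec_articles.
    Auxiliary columns are referenced without the leading `+`,
    mirroring how the SELECT in this README names them."""
    db.execute(
        """
        insert into vec_articles(
          article_id, headline_embedding, year,
          news_desk, word_count, pub_date, headline, url
        ) values (?, ?, ?, ?, ?, ?, ?, ?)
        """,
        (article_id, to_float32_blob(embedding), year,
         news_desk, word_count, pub_date, headline, url),
    )
```

The embedding passed to `insert_article` must have exactly 384 elements to match the `float[384]` declaration; how you compute it (which embedding model) is up to you.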
And perform KNN queries with extra WHERE clauses:
select
  article_id,
  headline,
  news_desk,
  word_count,
  url,
  pub_date,
  distance
from vec_articles
where headline_embedding match lembed('pandemic')
  and k = 8
  and year = 2020
  and news_desk in ('Sports', 'Business')
  and word_count between 500 and 1000;
┌────────────┬──────────────────────────────────────────────────────────────────────┬───────────┬────────────┬─────────────────────────────┬──────────────────────────┬───────────┐
│ article_id │ headline │ news_desk │ word_count │ url │ pub_date │ distance │
├────────────┼──────────────────────────────────────────────────────────────────────┼───────────┼────────────┼─────────────────────────────┼──────────────────────────┼───────────┤
│ 2911716 │ The Pandemic’s Economic Damage Is Growing │ Business │ 910 │ https://www.nytimes.com/... │ 2020-07-07T18:12:40+0000 │ 0.8928120 │
│ 2892929 │ As Coronavirus Spreads, Olympics Face Ticking Clock and a Tough Call │ Sports │ 987 │ https://www.nytimes.com/... │ 2020-03-06T01:34:36+0000 │ 0.9608180 │
│ 2932041 │ The Pandemic Is Already Affecting Next Year’s Sports Schedule │ Sports │ 620 │ https://www.nytimes.com/... │ 2020-11-11T13:56:25+0000 │ 0.9802038 │
│ 2915381 │ The Week in Business: Getting Rich Off the Pandemic │ Business │ 814 │ https://www.nytimes.com/... │ 2020-08-02T11:00:03+0000 │ 1.0064692 │
│ 2896043 │ The Coronavirus and the Postponement of the Olympics, Explained │ Sports │ 798 │ https://www.nytimes.com/... │ 2020-03-25T17:45:58+0000 │ 1.0115833 │
│ 2898566 │ Robots Welcome to Take Over, as Pandemic Accelerates Automation │ Business │ 871 │ https://www.nytimes.com/... │ 2020-04-10T09:00:27+0000 │ 1.019637 │
│ 2898239 │ The Pandemic Feeds Tech Companies’ Power │ Business │ 784 │ https://www.nytimes.com/... │ 2020-04-08T16:43:13+0000 │ 1.0200014 │
│ 2929224 │ In M.L.S., the Pandemic Changes the Playoff Math │ Sports │ 859 │ https://www.nytimes.com/... │ 2020-10-29T17:09:10+0000 │ 1.0238885 │
└────────────┴──────────────────────────────────────────────────────────────────────┴───────────┴────────────┴─────────────────────────────┴──────────────────────────┴───────────┘
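Note that `lembed()` in the query above is not part of sqlite-vec itself; it comes from a separate embedding extension (sqlite-lembed). If you compute the query embedding in application code instead, you can bind it as a parameter. The sketch below shows one way to do that from Python, assuming `db` is a connection with sqlite-vec already loaded and that bound parameters behave the same as the literals used above; `build_knn_query` and `search_articles` are hypothetical helper names.

```python
import struct


def build_knn_query(news_desks):
    """Return SQL for a filtered KNN search, with one `?` per news desk.
    Assumes `news_desks` is a non-empty list."""
    placeholders = ", ".join("?" for _ in news_desks)
    return f"""
        select article_id, headline, news_desk, word_count,
               url, pub_date, distance
        from vec_articles
        where headline_embedding match ?
          and k = ?
          and year = ?
          and news_desk in ({placeholders})
          and word_count between ? and ?
    """


def search_articles(db, query_vector, k, year, news_desks,
                    min_words, max_words):
    """Run the filtered KNN query and return all matching rows."""
    # Pack the query vector as a compact float32 BLOB.
    blob = struct.pack(f"{len(query_vector)}f", *query_vector)
    params = (blob, k, year, *news_desks, min_words, max_words)
    return db.execute(build_knn_query(news_desks), params).fetchall()
```

With these helpers, the query shown earlier would correspond to a call such as `search_articles(db, query_vector, 8, 2020, ["Sports", "Business"], 500, 1000)`, where `query_vector` is a 384-element embedding of the search text.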
Consult the sqlite-vec vec0 documentation for additional info.