v0.1.6 - Metadata columns, Partition keys, and Auxiliary column support

@asg017 asg017 released this 20 Nov 16:38
· 2 commits to main since this release

sqlite-vec now supports metadata columns in vec0 virtual tables! Check out the announcement blog post (Nov 2024) for more information.

You can now declare metadata columns, partition keys, and auxiliary columns in a vec0 virtual table:

create virtual table vec_articles using vec0(

  article_id integer primary key,

  -- Vector text embedding of the `headline` column, with 384 dimensions
  headline_embedding float[384],

  -- Partition key, internally shards the vector index by the article's publication year
  year integer partition key,

  -- Metadata columns, can appear in `WHERE` clause of KNN queries
  news_desk text,
  word_count integer,
  pub_date text,

  -- Auxiliary columns, unindexed but fast lookups
  +headline text,
  +url text
);

And perform KNN queries with extra WHERE clauses:

select
  article_id,
  headline,
  news_desk,
  word_count,
  url,
  pub_date,
  distance
from vec_articles
where headline_embedding match lembed('pandemic')
  and k = 8
  and year = 2020
  and news_desk in ('Sports', 'Business')
  and word_count between 500 and 1000;
┌────────────┬──────────────────────────────────────────────────────────────────────┬───────────┬────────────┬─────────────────────────────┬──────────────────────────┬───────────┐
│ article_id │ headline                                                             │ news_desk │ word_count │ url                         │ pub_date                 │ distance  │
├────────────┼──────────────────────────────────────────────────────────────────────┼───────────┼────────────┼─────────────────────────────┼──────────────────────────┼───────────┤
│    2911716 │ The Pandemic’s Economic Damage Is Growing                            │ Business  │        910 │ https://www.nytimes.com/... │ 2020-07-07T18:12:40+0000 │ 0.8928120 │
│    2892929 │ As Coronavirus Spreads, Olympics Face Ticking Clock and a Tough Call │ Sports    │        987 │ https://www.nytimes.com/... │ 2020-03-06T01:34:36+0000 │ 0.9608180 │
│    2932041 │ The Pandemic Is Already Affecting Next Year’s Sports Schedule        │ Sports    │        620 │ https://www.nytimes.com/... │ 2020-11-11T13:56:25+0000 │ 0.9802038 │
│    2915381 │ The Week in Business: Getting Rich Off the Pandemic                  │ Business  │        814 │ https://www.nytimes.com/... │ 2020-08-02T11:00:03+0000 │ 1.0064692 │
│    2896043 │ The Coronavirus and the Postponement of the Olympics, Explained      │ Sports    │        798 │ https://www.nytimes.com/... │ 2020-03-25T17:45:58+0000 │ 1.0115833 │
│    2898566 │ Robots Welcome to Take Over, as Pandemic Accelerates Automation      │ Business  │        871 │ https://www.nytimes.com/... │ 2020-04-10T09:00:27+0000 │  1.019637 │
│    2898239 │ The Pandemic Feeds Tech Companies’ Power                             │ Business  │        784 │ https://www.nytimes.com/... │ 2020-04-08T16:43:13+0000 │ 1.0200014 │
│    2929224 │ In M.L.S., the Pandemic Changes the Playoff Math                     │ Sports    │        859 │ https://www.nytimes.com/... │ 2020-10-29T17:09:10+0000 │ 1.0238885 │
└────────────┴──────────────────────────────────────────────────────────────────────┴───────────┴────────────┴─────────────────────────────┴──────────────────────────┴───────────┘
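For a query like this to return rows, vec_articles first has to be populated. The following is a minimal sketch rather than part of the release. It assumes a hypothetical source table named `articles` with the columns shown. It also assumes `lembed()` is available with a default 384-dimension embedding model registered, as the query above implies. The year is taken from the first four characters of `pub_date`, which assumes timestamps shaped like the ones in the output above.

```sql
-- Hypothetical source table (an assumption for illustration,
-- not something shipped with sqlite-vec).
create table if not exists articles(
  id integer primary key,
  headline text,
  news_desk text,
  word_count integer,
  pub_date text,  -- e.g. '2020-07-07T18:12:40+0000'
  url text
);

-- Copy rows into the vec0 table. Auxiliary columns are named
-- without the leading `+` here, matching how the select above
-- refers to `headline` and `url`.
insert into vec_articles(
  article_id,
  headline_embedding,
  year,
  news_desk,
  word_count,
  pub_date,
  headline,
  url
)
select
  id,
  lembed(headline),                         -- assumes a default 384-dim model
  cast(substr(pub_date, 1, 4) as integer),  -- partition key: publication year
  news_desk,
  word_count,
  pub_date,
  headline,
  url
from articles;
```

In this layout, `year` selects the partition, `news_desk` and `word_count` narrow the KNN search in the WHERE clause, and `headline` and `url` are simply returned with each match.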

Consult the sqlite-vec vec0 documentation for additional info.