Release v0.1.0 · JuliaGenAI/ColBERT.jl

ColBERT v0.1.0

Merged pull requests:

Adding documentation for the indexing components. (#3) (@codetalker7)
Add the Indexer type and a corresponding index function to build the index. (#4) (@codetalker7)
Adding more information in the README. (#5) (@codetalker7)
The Searcher component. (#14) (@codetalker7)
Adding the QueryTokenizer. (#15) (@codetalker7)
Generating query embeddings. (#17) (@codetalker7)
Fixing indexing code to include QueryTokenizer in the Checkpoint. (#18) (@codetalker7)
Exporting the config. (#19) (@codetalker7)
Explicitly qualify JLD2 function calls. (#20) (@codetalker7)
Remove unnecessary imports. (#21) (@codetalker7)
Adding GPU support. (#22) (@codetalker7)
Enforce smaller types and add type checks throughout. (#24) (@codetalker7)
Formatting the codebase. (#25) (@codetalker7)
More formatting + some options. (#26) (@codetalker7)
Many design changes + optimizations. (#27) (@codetalker7)
README example. (#28) (@codetalker7)
Adding kmeans implementation with GPU acceleration + minor changes. (#29) (@codetalker7)
Local loading of HF checkpoints. (#30) (@codetalker7)
Unit tests + more design changes + function level optimizations. (#32) (@codetalker7)
Adding and fixing some tests + updating compat helper. (#34) (@codetalker7)
Adding index + query examples for Julia documentation, and some evals. (#35) (@codetalker7)
Fixing badges + repo url. (#36) (@codetalker7)

Closed issues:

Add unit testing. (#6)
Add GPU support to the indexer. (#8)
Let users specify number of chunks in the config. (#9)
Export config to disk. (#10)
Check uniqueness of PID and QID for documents and collections respectively. (#12)
Truncate sequences to doc_maxlen in DocTokenizer. (#16)
Specify smaller types for storage and inference efficiency. (#23)
Normalization code is incorrect. (#31)