ColBERT v0.1.0
Merged pull requests:
- Adding documentation for the indexing components. (#3) (@codetalker7)
- Add the Indexer type and a corresponding index function to build the index. (#4) (@codetalker7)
- Adding more information in the README. (#5) (@codetalker7)
- The Searcher component. (#14) (@codetalker7)
- Adding the QueryTokenizer. (#15) (@codetalker7)
- Generating query embeddings. (#17) (@codetalker7)
- Fixing indexing code to include QueryTokenizer in the Checkpoint. (#18) (@codetalker7)
- Exporting the config. (#19) (@codetalker7)
- Explicitly qualify JLD2 function calls. (#20) (@codetalker7)
- Remove unnecessary imports. (#21) (@codetalker7)
- Adding GPU support. (#22) (@codetalker7)
- Enforce smaller types and add type checks throughout. (#24) (@codetalker7)
- Formatting the codebase. (#25) (@codetalker7)
- More formatting + some options. (#26) (@codetalker7)
- Many design changes + optimizations. (#27) (@codetalker7)
- README example. (#28) (@codetalker7)
- Adding kmeans implementation with GPU acceleration + minor changes. (#29) (@codetalker7)
- Local loading of HF checkpoints. (#30) (@codetalker7)
- Unit tests + more design changes + function level optimizations. (#32) (@codetalker7)
- Adding and fixing some tests + updating compat helper. (#34) (@codetalker7)
- Adding index + query examples for Julia documentation, and some evals. (#35) (@codetalker7)
- Fixing badges + repo url. (#36) (@codetalker7)
Closed issues:
- Add unit testing. (#6)
- Add GPU support to the indexer. (#8)
- Let users specify number of chunks in the config. (#9)
- Export config to disk. (#10)
- Check uniqueness of PID and QID for documents and collections respectively. (#12)
- Truncate sequences to doc_maxlen in DocTokenizer. (#16)
- Specify smaller types for storage and inference efficiency. (#23)
- Normalization code is incorrect. (#31)