Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: In IndexUpdater, cast pids to Int before saving #180

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AshwinParanjape
Copy link

I get an error while using IndexUpdater. This PR fixes it.

Here's the minimal example script

import os
from pathlib import Path

from colbert import Indexer, Searcher, IndexUpdater
from colbert.infra import ColBERTConfig, Run, RunConfig

old_books = [
    "The Adventures of Sherlock Holmes",
    "The Adventures of Tom Sawyer",
    "The Adventures of Huckleberry Finn",
]

modern_books = [
    "The song of ice and fire"*10,
    "Harry Potter and the Sorcerer's Stone"*10,
    "The Hunger Games"*10,
    "The Fellowship of the Ring"*10,
    "The Two Towers",
    "The Return of the King",
    "The Hobbit",
    "The Lord of the Rings",
    "The Hitchhiker's Guide to the Galaxy",
    "Seventy-Seven Clocks",
    "I, Robot",
    "Life, the Universe and Everything",
]

new_index = "tiny_index"

# Download the colbert model to this path
colbert_checkpoint = "colbertv2.0/"
if __name__ == "__main__":
    with Run().context(
        RunConfig(
            nranks=1,
            root=new_index,
            experiment="",
        )
    ):
        config = ColBERTConfig(nbits=2, root=new_index, doc_maxlen=100)
        indexer = Indexer(checkpoint=colbert_checkpoint, config=config)

        indexer.index(
            name="msmarco.nbits=2",
            collection=modern_books
        )

        searcher = Searcher(index="msmarco.nbits=2", config=config)
        config = ColBERTConfig(nbits=2, root=new_index, doc_maxlen=100)
        index_updater = IndexUpdater(
            config=config, searcher=searcher, checkpoint=colbert_checkpoint
        )

        index_updater.add(old_books)
        index_updater.persist_to_disk()
        results = searcher.search("Sherlock", 3)
        print(results)

Leads to the following error

Traceback (most recent call last):
  File "test_incremental_build_minimal.py", line 79, in <module>
    results = searcher.search("Sherlock", 3)
  File ".../ColBERT/colbert/searcher.py", line 61, in search
    return self.dense_search(Q, k, filter_fn=filter_fn)
  File ".../ColBERT/colbert/searcher.py", line 108, in dense_search
    pids, scores = self.ranker.rank(self.config, Q, filter_fn=filter_fn)
  File ".../ColBERT/colbert/search/index_storage.py", line 84, in rank
    scores, pids = self.score_pids(config, Q, pids, centroid_scores)
  File ".../ColBERT/colbert/search/index_storage.py", line 142, in score_pids
    pids = IndexScorer.filter_pids(
RuntimeError: expected scalar type Int but found Long

This PR fixes this issue, resulting in the following output

([10, 6, 13], [1, 2, 3], [9.62167739868164, 8.728744506835938, 8.65077018737793])

@sarisel
Copy link

sarisel commented Mar 10, 2023

I think this error only happens when running on CPU.

@AshwinParanjape
Copy link
Author

That may be true, I was testing it on a CPU machine

stillmatic added a commit to stillmatic/ColBERT that referenced this pull request May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants