IndexError: index 4 is out of bounds for axis 0 with size 4 #643
-
Error
This error is becoming too common. A permanent solution is needed 👍
-
Yes, I experienced it too. I will be on the lookout for a permanent solution.
-
Please share a snippet of the code that was running when this appeared, so we all have something reproducible to discuss.
-
There are two reasons why this issue showed up:

Adding more docs to a vector store with new vocab

Potential solution:

    # Check if the model already has a vocabulary built
    if len(self._model.wv) == 0:
        # Build vocabulary if not already built
        self._model.build_vocab(tagged_data)
    else:
        # Update the vocabulary if it exists
        self._model.build_vocab(tagged_data, update=True)

Using OOV words during retrieval

Using out-of-vocabulary (OOV) words during retrieval also causes this error.

Potential solution: check whether each word is already in the model's vocabulary before retrieval.

    def infer_vector(self, data: str) -> Vector:
        words = data.split()
        # Check if words are known to the model's vocabulary
        known_words = [word for word in words if word in self._model.wv]
        if not known_words:
            # Return a zero-vector if all words are OOV
            vector = [0.0] * self._model.vector_size
        else:
            # Infer vector from known words
            vector = self._model.infer_vector(known_words)
        return Vector(value=vector)
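The OOV check can be tried without a trained model. This is a minimal sketch under stated assumptions: the vocabulary is a plain `set`, and `safe_infer` and `fake_infer` are hypothetical names standing in for the store's method and the model's inference call. Neither comes from the library.

```python
from typing import Callable, List, Sequence, Set


def safe_infer(
    text: str,
    vocab: Set[str],
    vector_size: int,
    infer: Callable[[Sequence[str]], List[float]],
) -> List[float]:
    """Infer from known words only; fall back to zeros if none are known."""
    known_words = [word for word in text.split() if word in vocab]
    if not known_words:
        # Every word is out of vocabulary: return a zero-vector instead of failing.
        return [0.0] * vector_size
    return infer(known_words)


# Hypothetical stand-in for a model: records the words it was asked to embed.
seen: List[List[str]] = []


def fake_infer(words: Sequence[str]) -> List[float]:
    seen.append(list(words))
    return [float(len(words))] * 3


vocab = {"vector", "store"}
# The OOV word "retrieval" is dropped before inference.
print(safe_infer("vector store retrieval", vocab, 3, fake_infer))
# All words are OOV, so the model is never called.
print(safe_infer("completely unseen", vocab, 3, fake_infer))
```

One thing to keep in mind with this approach: a zero-vector has no direction, so cosine similarity against it is undefined. Callers may want to treat an all-OOV query as "no match" rather than rank against it.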
There are two reasons why this issue showed up:
Adding more docs to a vector store with new vocab
With Doc2VecVectorStore, initializing the vector store and adding documents works without errors. However, if you later add more documents that contain new vocabulary, it raises the IndexError.
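The failure mode described here can be imitated with a toy model. This is only an illustrative sketch, not gensim's actual internals: `ToyModel`, `register`, `grow_table`, and the list-based vector table are all hypothetical. It assumes the error follows the pattern of a word index that grows while the vector table does not.

```python
from typing import Dict, List


class ToyModel:
    """Hypothetical stand-in: maps each word to a row of a vector table."""

    def __init__(self, vector_size: int = 2) -> None:
        self.vector_size = vector_size
        self.index: Dict[str, int] = {}
        self.vectors: List[List[float]] = []

    def register(self, docs: List[List[str]], grow_table: bool) -> None:
        # New words always receive the next free row number...
        for doc in docs:
            for word in doc:
                if word not in self.index:
                    self.index[word] = len(self.index)
        # ...but rows are only allocated when the table is allowed to grow.
        if grow_table:
            while len(self.vectors) < len(self.index):
                self.vectors.append([0.0] * self.vector_size)

    def lookup(self, word: str) -> List[float]:
        return self.vectors[self.index[word]]


model = ToyModel()
model.register([["a", "b", "c", "d"]], grow_table=True)  # initial build: 4 rows
model.register([["e"]], grow_table=False)                # new word, table not resized

try:
    model.lookup("e")  # row 4 requested from a table with 4 rows
except IndexError as exc:
    print("IndexError:", exc)

model.register([["e"]], grow_table=True)  # resizing the table resolves the lookup
print(model.lookup("e"))
```

In this toy, the word "e" is assigned row 4 while the table only has rows 0 to 3, which mirrors the "index 4 ... size 4" shape of the reported message.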
Potential solution
gensim's build_vocab method has the parameter update=, which defaults to False. Changing it to True once the vocabulary has already been built fixes the problem. Below is an example: