I am getting this error:
  File "abcd./anaconda3/lib/python3.7/site-packages/fonduer/utils/data_model_utils/structural.py", line 55, in _get_node
    return doc_etree.xpath(sentence.xpath)[0]
IndexError: list index out of range
I am following the Hardware tutorial on some email HTML messages and getting a mention count near 4000.
@AshutoshUpadhya doc_etree.xpath(sentence.xpath)[0] should return the HtmlElement that corresponds to sentence.xpath.
You mentioned that "no candidates being generated", but the function _get_node will not be visited if you have no candidates.
This could be a bug in Fonduer, but I'm not sure at the moment. Please help me figure that out.
Can you check the following?
train_cands = candidate_extractor.get_candidates(split=0)
print(len(train_cands))
for cands in train_cands:
    print(len(cands))
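For context on the traceback itself: an IndexError at that line means the path lookup returned an empty list, i.e. the stored path matched no element in the parsed document. Here is a minimal sketch of that failure mode, assuming the standard library's xml.etree.ElementTree as a stand-in for the doc_etree.xpath call in the traceback. The helper first_match and the sample markup are hypothetical and are not part of Fonduer.

```python
import xml.etree.ElementTree as ET


def first_match(root, path):
    """Return the first element matching path, or None when nothing matches.

    Hypothetical helper for illustration only; it is not a Fonduer function.
    """
    matches = root.findall(path)
    return matches[0] if matches else None


# Made-up sample document, loosely modelled on a date with a superscript.
doc = ET.fromstring("<html><body><p>15<sup>th</sup> May</p></body></html>")

# A path that exists resolves to an element.
print(first_match(doc, "./body/p").tag)

# A path that matches nothing gives None instead of raising.
print(first_match(doc, "./body/p/span"))

# Indexing the empty result directly raises the same kind of error as in the traceback.
try:
    doc.findall("./body/p/span")[0]
except IndexError as exc:
    print("IndexError:", exc)
```

If the failing path can be printed before the lookup, comparing it with the actual structure of the HTML may show which element goes missing.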
Thanks @HiromuHota for the response. I figured that out: candidates were being generated, but it is failing due to the type of data I have.
Can't Fonduer parse strings written like 15th (with a small superscript 'th', as we write in dates)? It is failing there.
Can you confirm and suggest a solution?
When I removed the th from the HTML, the step "featurizer.apply(split=0, train=True, parallelism=PARALLEL)"
ran through. But I need to pass "15th" as it is.
Image of sample file:
@senwu @HiromuHota, can you please tell me if my analysis is right?
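One possible workaround, while the root cause is unconfirmed, is to preprocess the HTML so the superscript markup is dropped but its text is kept, which leaves "15th" intact in the text. The sketch below assumes the superscript is written with sup tags (for example 15<sup>th</sup>); the thread does not show the actual markup, so that is a guess. The function name unwrap_sup is hypothetical, and whether this avoids the featurizer error has not been verified.

```python
import re

# Matches an opening or closing <sup> tag, with optional attributes.
# Assumption: the small "th" in the emails is marked up with <sup>.
_SUP_TAG = re.compile(r"</?sup\b[^>]*>", flags=re.IGNORECASE)


def unwrap_sup(html: str) -> str:
    """Drop <sup> tags but keep the text inside them."""
    return _SUP_TAG.sub("", html)


# Made-up sample line, not taken from the real emails.
sample = "<p>Meeting on the 15<sup>th</sup> of May</p>"
print(unwrap_sup(sample))
```

A regex is enough for a flat tag like this one; for anything more nested, rewriting the tree with an HTML parser would be the safer choice.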
The following steps returned output:
train_cands = candidate_extractor.get_candidates(split=0)
dev_cands = candidate_extractor.get_candidates(split=1)
test_cands = candidate_extractor.get_candidates(split=2)
But on applying the featurizer:
featurizer.apply(split=0, train=True, parallelism=PARALLEL)
I get the error mentioned at the top.
I looked on Stack Overflow, but the cause suggested there, an HTML syntax issue, does not seem to apply here, as the file renders fine in the browser.
So can you share your thoughts on this?
Thanks.