I'm trying to train a model that can build a knowledge base from the OPC UA Companion Specifications as part of my thesis.
My dataset consists of PDFs. I used a third-party program to convert them into HTML and tried my best to preserve the data structure information (I get the same result even if I parse the PDFs alone).
Then I followed the hardware_fonduer_model tutorial to extract the candidates accordingly.
The problem is that the parser splits the sentences incorrectly: it treats the end of a line as the end of a sentence.
I tried to debug using SimpleParser.split_sentences(text), and it turned out that Python needs a backslash to split a statement into multiple lines.
So I thought I could use the replacements=['[\n]', ' '] parameter so that the split would work better, but I'm getting ValueError: too many values to unpack (expected 2).
What is the default configuration for the sentence segmentation?
How can I get multiple sentences as one mention? (I tried MentionNgrams with n_max up to 100 and still get just one.)
I would really appreciate hearing back from you.
Many thanks in advance.
Example: Text to be parsed
Boolean indicating if a profile /signature should be generated by this move command
request.If the optional VariableSignatureRequestStatus is not provided on the Object, this
parameter is ignored by the Server.
Expected behavior
sentence 1 : Boolean indicating if a profile /signature should be generated by this move command request.
sentence 2 : If the optional VariableSignatureRequestStatus is not provided on the Object, this
parameter is ignored by the Server.
Actual behavior
sentence 1 : Boolean indicating if a profile /signature should be generated by this move command
sentence 2 : request.
sentence 3 : request.If the optional VariableSignatureRequestStatus is not provided on the Object, this
sentence 4 : parameter is ignored by the Server.
lukehsiao changed the title from "Parser is not spliting multiple lines sentences propably" to "Parser is not spliting multiple lines sentences properly" (Oct 19, 2020)
lukehsiao changed the title from "Parser is not spliting multiple lines sentences properly" to "Parser is not splitting multiple lines sentences properly" (Oct 19, 2020)
replacements accepts List[Tuple[str, str]], so your case should be replacements=[('\n', ' ')].
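To see why the flat list fails, here is a minimal sketch in plain Python. It assumes the parser unpacks each entry of replacements into a (pattern, replacement) pair and applies it as a regular expression substitution; compile_replacements and apply_replacements are hypothetical helper names for illustration, not Fonduer functions.

```python
import re


def compile_replacements(replacements):
    """Compile (pattern, replacement) pairs; each entry must unpack into two values."""
    compiled = []
    for pattern, replace in replacements:
        compiled.append((re.compile(pattern, flags=re.UNICODE), replace))
    return compiled


def apply_replacements(text, compiled):
    """Apply every compiled (pattern, replacement) pair in order."""
    for pattern, replace in compiled:
        text = pattern.sub(replace, text)
    return text


# A flat list of strings: the first entry '[\n]' is a 3-character string,
# so unpacking it into (pattern, replace) raises ValueError.
try:
    compile_replacements(['[\n]', ' '])
    flat_list_error = None
except ValueError as exc:
    flat_list_error = str(exc)

# A list of 2-tuples unpacks cleanly.
compiled = compile_replacements([('\n', ' ')])
cleaned = apply_replacements("generated by this move command\nrequest.", compiled)
print(flat_list_error)
print(cleaned)
```

The first entry of the flat list is itself iterated character by character, which is where the "expected 2" in the error message comes from.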
Put a whitespace in "request.If" to make it "request. If". Fonduer relies on spaCy for sentence segmentation if SpacyParser is used. Please directly use spaCy to check how your text is split into sentences.
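If editing the source by hand is not practical, the same cleanup can be sketched as a preprocessing step in plain Python. normalize_wrapped_text is a hypothetical helper, and the second regex assumes a sentence boundary wherever a lowercase letter, a period and an uppercase letter are glued together, so it would also split dotted identifiers such as obj.Method.

```python
import re


def normalize_wrapped_text(text):
    """Join hard-wrapped lines and separate sentences glued together by a period."""
    # Replace each line break (and surrounding whitespace) with a single space.
    joined = re.sub(r"\s*\n\s*", " ", text.strip())
    # Insert a space after a period that sits between a lowercase and an uppercase letter.
    return re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", joined)


raw = (
    "Boolean indicating if a profile /signature should be generated by this move command\n"
    "request.If the optional VariableSignatureRequestStatus is not provided on the Object, this\n"
    "parameter is ignored by the Server."
)
normalized = normalize_wrapped_text(raw)
print(normalized)
```

The normalized string can then be passed to spaCy directly to inspect how it is split into sentences before running the full pipeline.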
I tried both solutions, but I'm still getting the same result.
Can I configure the SpacyParser further? Could the problem come from the PDFs themselves?
I would be very grateful if you could help me with this.
I thought that if I could get the whole paragraph as a mention, there would be no need to split the sentences.
So I tried to use the ParagraphMention class, but I'm getting this error:
AttributeError: 'str' object has no attribute 'get_stable_id'
What does ParagraphMention take as input?
Thanks a lot.
Environment
MDISCompanionSpecification.pdf