Replies: 50 comments 7 replies
-
Great observation. Actually, they are the same file according to md5sum.
I think they probably got copied someplace without the -p flag (which preserves timestamps) when being transcribed.
On Sat, Jun 10, 2023 at 1:48 AM Shasetty wrote:
Hi Sir/Madam.
Shankar from Bangalore.
I found that the file englishPCFG.ser.gz in versions 4.5.3 & 4.5.4 was saved on 3 Nov 2020, whereas versions from 4.2.2 through 4.5.2 have englishPCFG.ser.gz saved on 14 May 2021.
1. One issue I found is a punctuation issue (text pasted below).
Text:
If an unforeseen event occurs or business conditions change, we may
use the proceeds of this offering differently than as described in this
prospectus. See “Risk Factors.”
------------------------------
Can you please let me know if there are any other issues?
-
There is a shift-reduce parser in CoreNLP which is more accurate, and
Stanza's constituency parser is significantly more accurate still. If you're
specifically trying to use the PCFG, we can hopefully provide you with an
upgraded one using a silver dataset we've been working on as part of the
Stanza project.
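A minimal sketch of what the Stanza route looks like in Python, assuming a standard Stanza install with English models (the example sentence is the one quoted above; this is illustrative, not an official recipe):

import stanza

# one-time download of the English models
stanza.download('en')

# pipeline with the neural constituency parser
nlp = stanza.Pipeline('en', processors='tokenize,pos,constituency')

doc = nlp('If an unforeseen event occurs or business conditions change, we may use the proceeds of this offering differently than as described in this prospectus.')
for sentence in doc.sentences:
    # sentence.constituency holds the parse tree for that sentence
    print(sentence.constituency)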
-
Well... the dev set performance goes from 85.45 to 85.74. I'll see if
fiddling around with the weight a little improves that result, but I'm not
too hopeful. In fact, one would wonder if this is a sign that the English
silver dataset I built is not that great. If it were any good, I would
have expected it to be a big help to an older-style model like the PCFG.
If you plan to continue using Java, you might consider using the shift-reduce
parser, as its accuracy is 90 or a little higher, and it's also a
lot faster. There's also the Stanza constituency parser, where we have a
model which gets 96, but I get the impression you want to stick with Java.
On Sun, Jun 11, 2023 at 12:38 AM Shasetty wrote:
As I am using "englishPCFG.ser.gz", I would be happy if you could provide me
with the upgraded PCFG.
My email ID is ***@***.***
-
Thank you for informing me of the genuine status. As you are the right person, with good skills in handling the empty space in the programme of the PCFG, I will be waiting for your inputs.
------------------------------
Stanza Constituency Parser
As I am working with the Stanford dependency parser: if I use the dependency parsing of Stanza, I will only get Universal Dependencies (https://universaldependencies.org/) output (https://stanfordnlp.github.io/stanza/depparse.html). Are there any possibilities to get basic Stanford dependency output using Stanza? If yes, please guide me.
------------------------------
Shift-Reduce Constituency Parser (https://nlp.stanford.edu/software/srparser.html)
It has an old version, dated stanford-srparser-2014-10-23-models. Is there any recently updated version? If yes, can you provide it to me?
-
I trained an updated SRParser a couple of years ago; it's available here:
https://search.maven.org/remotecontent?filepath=edu/stanford/nlp/stanford-corenlp/4.4.0/stanford-corenlp-4.4.0-models-english.jar
I don't think there's a way to get Stanford dependencies instead of
Universal Dependencies from Stanza, unfortunately.
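A hedged sketch of how that models jar might be used once downloaded. The jar and file names are illustrative, and the command mirrors the ones used elsewhere in this thread rather than being a verified recipe:

java -mx2048m -cp "stanford-corenlp-4.4.0.jar;stanford-corenlp-4.4.0-models-english.jar" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse -parse.model edu/stanford/nlp/models/srparser/englishSR.ser.gz -file input.txt -outputFormat text

The classpath separator is ";" on Windows and ":" on Linux/macOS. Since englishSR.ser.gz is loaded as a classpath resource from inside the models jar, the jar should not need to be unpacked.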
-
We tried to run the SRParser, but we were not able to get the output. We followed the method as per https://nlp.stanford.edu/software/srparser.html (Calling Parsing from Java).
As the file stanford-postagger-3.5.0.jar was not available, we downloaded it from http://www.java2s.com/Code/Jar/s/Downloadstanfordpostaggerjar.htm
We are getting the Dependency Parse (enhanced plus plus dependencies); we don't want this. We want basic dependencies (as per Stanford).
Please provide the command line to get basic dependencies (as per Stanford).
-
Things are going to be a bit hectic here for another week, but I'm happy to
help however I can. Would you remind me of the command line you are using to get
the SD in the first place? And, ideally, the command line you tried for
the SRParser.
-
--------------------------------
edu.stanford.nlp.parser.shiftreduce.ShiftReduceParser (Must specify a treebank to train from with -trainTreebank or a parser to load with -serializedPath)
java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.parser.shiftreduce.ShiftReduceParser -parse.model G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models-english\edu/stanford/nlp/models/srparser/englishSR.ser.gz -textFile G:\shiftreduce-corenlp-4.4.0\file.txt
--------------------------------
edu.stanford.nlp.parser.nndep.DependencyParser (not working)
java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.parser.nndep.DependencyParser -parse.model G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models-english\edu\stanford\nlp\models\srparser\englishSR.ser.gz -textFile G:\shiftreduce-corenlp-4.4.0\file.txt
--------------------------------
edu.stanford.nlp.parser.lexparser.LexicalizedParser (not working)
java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -parse.model G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models-english\edu\stanford\nlp\models\srparser\englishSR.ser.gz -textFile G:\shiftreduce-corenlp-4.4.0\file.txt
-----------------------------
edu.stanford.nlp.pipeline.StanfordCoreNLP (working), Dependency Parse (enhanced plus plus dependencies):
java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -parse.model G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models-english\edu\stanford\nlp\models\srparser\englishSR.ser.gz -textFile G:\shiftreduce-corenlp-4.4.0\file.txt
I want output of "basic dependency".
-
Working command lines of the other parsers:
--------------------
original dependency (englishPCFG.ser.gz)
java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -retainTmpSubcategories -originalDependencies -outputFormat "typedDependencies" -outputFormatOptions "basicDependencies" G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models\edu\stanford\nlp\models\lexparser\englishPCFG.ser.gz G:\shiftreduce-corenlp-4.4.0\file.txt
--------------------
UD dependency (englishPCFG.ser.gz)
java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -retainTmpSubcategories -outputFormat "typedDependencies" -outputFormatOptions "basicDependencies" G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models\edu\stanford\nlp\models\lexparser\englishPCFG.ser.gz G:\shiftreduce-corenlp-4.4.0\file.txt
--------------------
nndep (english_SD.gz)
java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.parser.nndep.DependencyParser -model G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models\edu\stanford\nlp\models\parser\nndep\english_SD.gz -textFile G:\shiftreduce-corenlp-4.4.0\file.txt
--------------------
sr (StanfordCoreNLP, working), Dependency Parse (enhanced plus plus dependencies):
java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -parse.model G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models-english\edu\stanford\nlp\models\srparser\englishSR.ser.gz -textFile G:\shiftreduce-corenlp-4.4.0\file.txt
-
Two things. First, for CoreNLP to use the SRParser, you'll need …
Currently it's loading the neural dependency parser in CoreNLP, which doesn't do Stanford dependencies AFAIK. To switch back to Stanford dependencies, you can do this …
So here's what I ran and what I got: …
It should be noted that, whatever improvements have been made to the dependency conversion from constituency trees over the years, there aren't really any recent improvements to the SD themselves, since everyone has moved on to UD. Still, you seem pretty determined to get SD, and it is possible with CoreNLP, so this is how, I believe. There are output options which will hopefully produce the output file in whatever format you need... let us know if you get stuck with that. Also, there is actually a converter which uses the Stanza constituency parser (again, it is much more accurate), but it ONLY does UD. If you want, we can connect it with the SD constituency -> dependency converter so that you can get SD, but that is the kind of change which will probably have to wait until after deadlines.
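A plausible reconstruction of the kind of command being described, as a sketch rather than the exact command that was run: restricting the annotators keeps the neural depparse annotator from being loaded, parse.model points at the SR parser, and parse.originalDependencies asks the parse annotator for Stanford Dependencies rather than UD. The paths reuse the ones from the commands above.

java -mx2048m -cp "G:\shiftreduce-corenlp-4.4.0/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse -parse.model G:\shiftreduce-corenlp-4.4.0\stanford-corenlp-4.4.0-models-english\edu\stanford\nlp\models\srparser\englishSR.ser.gz -parse.originalDependencies -file G:\shiftreduce-corenlp-4.4.0\file.txt -outputFormat text

-outputFormat can also be set to conll, json, or xml if a machine-readable file is easier to post-process. If one already has constituency trees in a file (from the SR parser or from Stanza), the standalone converter class edu.stanford.nlp.trees.EnglishGrammaticalStructure can also produce basic SD, roughly as java -cp "*" edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile trees.mrg -basic -conllx; again, this is a sketch based on the documented converter, not something verified here.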
-
==========================
(working SR parser with basic dependency)
=========================
(working SR parser with enhanced plus plus dependencies)
Using the 1st command line, the basic dependency output was received. Thank you sir, for the command line.
==============================================
-
After using the SRParser, I found that the "dep: dependent" outcomes in the SR parser are more numerous compared to the 3.9.2 basic SD or the NNDep of 4.5.2. Is there any possibility to reduce the "dep: dependent" outcomes for the text? Sample text attached below:
This offering is being conducted on a firm commitment basis. The underwriter is obligated to take and pay for all of the ordinary shares if any such shares are taken. We have granted to the underwriter an option for a period of 45 days from the date of this prospectus to purchase up to 192,000 additional ordinary shares from us at the We plan to use the net proceeds from this offering for (i) general working capital (30%); (ii) business and team expansion by recruiting more professional consultants across With respect to the Company’s industry focused acquisitions, the Company plans to (i) purchase at least a majority interest in businesses it targets and not the assets of We have not currently identified any targets for acquisition. Pending use of proceeds from this offering, we intend to invest the proceeds in bank accounts, short-term, interest-bearing, investment-grade instruments, or hold as cash. The foregoing represents our current intentions based upon our present plans and business conditions to use and allocate the net proceeds of this offering. Our management, however, will have significant flexibility and discretion to apply the net proceeds of this offering. If an unforeseen event occurs or business conditions change, we may use the proceeds of this offering differently than as described in this prospectus. See “Risk Factors.”
-
I'm not experiencing the same issue. For example, this is what I get for a couple of the sentences you gave. What are you getting?
-
Typically a …
-
sr parser 4.5.4: dep = 17
nndep 4.5.4: dep = 9 (but other modifiers are not identified properly by the parser)
lexparser 4.5.4: dep = 15
lexparser 3.9.2: dep = 9
Is there any possibility to fix the left-out dependencies in the converters and provide the best parser?
-
There is a section where it produces a dep: …
This looks like a bad parse. The second NML should be an NP, inside a larger NP, so something like …
Gonna skip that one. Presumably, with the parse tree the parser produced, one could make the dep there a conj, but it still wouldn't be right anyway.
-
This section looks like we should be able to come up with a dependency for it: …
Looking for similar annotations in EWT: …
There are some better comparables in Craft, but those don't have gold dependencies (do they?). Is there any reason this isn't just appos, like in EWT?
Also from that sentence: …
This is more clearly an … I will have to check with my PI whether either of these is fixable. For the first, we could decide …
-
Anyway, that's the summary of what I have so far. There's still going to be some …
It might also be worth taking a step back and asking why the use of SD instead of UD: the gold UD training data available makes it much easier to build a direct-to-dependencies parser, meaning the output dependencies will be much more accurate.
-
a) Considering the inputs of: …
b) What is available in SD and what is merged in UD (UD precision reduced):
vmod : acl
neg : advmod
prep : case
nn : compound
xcomp : xcomp
c) The dependency relationship completely changes between SD and UDPipe. (I will provide the text with examples later.)
-
We finished early today.
early: JJ / RB
-
https://nlp.stanford.edu/software/stanford-corenlp-4.5.4b.zip
-
I guess what I'm saying is the direct-to-UD parser in Stanza might be better at everything, but you know your requirements better than I do.
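For completeness, a minimal sketch of that direct-to-UD route in Python, assuming a standard Stanza install; the output relations are Universal Dependencies, not Stanford Dependencies, and the example sentence is the one mentioned above:

import stanza

stanza.download('en')          # one-time model download
nlp = stanza.Pipeline('en')    # default processors include tokenize, pos, lemma, depparse

doc = nlp('We finished early today.')
for sent in doc.sentences:
    for word in sent.words:
        # head == 0 marks the root of the sentence
        print(word.id, word.text, word.head, word.deprel)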
-
Many thanks for the release in such a short duration, Sir. Surely I will go through the UD parser in Stanza. If I find any issues, I will update you, sir; do the best that you can.
-
I need to have your opinion on one subject; can you share your personal mail ID, please?
-
It should be available on my profile, but if it's NLP related, you could
consider posting it here.
-
The content for the opinion is getting prepared; it will take some time, sir.
NLP related: "This" from the text below appears as 1) obj and 2) nsubj. Which one is correct? …
UD 4.5.4: case(point-4, From-1)
SD 4.5.4: prep(find-9, From-1)
-
Next query.
Text: In all fairness, she did try to phone the police.
Which one is correct among the two?
case(fairness-3, In-1) | case(fairness-3, In-1)
det(fairness-3, all-2) | det(fairness-3, all-2)
obl(try-7, fairness-3) | obl(did-6, fairness-3)
nsubj(try-7, she-5) | nsubj(did-6, she-5)
aux(try-7, did-6) | root(ROOT-0, did-6)
root(ROOT-0, try-7) | ccomp(did-6, try-7)
mark(phone-9, to-8) | case(phone-9, to-8)
xcomp(try-7, phone-9) | obl(try-7, phone-9)
det(police-11, the-10) | det(police-11, the-10)
obj(phone-9, police-11) | obj(try-7, police-11)
-
I will look into this, but being far from the expert on dependencies, I
will have to take it to my PI next Tuesday.
For reference, is there one produced by one version of the parser or the
other?
Also, when you compare SD to UD, they have different treatments of which
words should be the head in certain cases. Even the root of an entire
sentence can be different.
-
Please take the output for both sentences in UD 4.5.4 and check them. From a personal point of view, I find this: …
Further, for the text "In all fairness, she did try to phone the police.": as you and your team are considered one of the best, I would like to know the team's opinion.
-
Yes, that's exactly what I mean. I do not believe the nsubj in text 2 fits
the pattern as shown in the sentences I cited in EWT or the UD docs.
On Wed, Jul 12, 2023 at 4:33 PM Shasetty wrote:
dobj(find-9, this-10) : for text 1
nsubj(solution-13, this-10) : for text 2