A post on SO brings me back to my old idea of adding PMI to `textstat_collocation()`. It is not as good as lambda, but it is super fast to run. PMI would be computed as

PMI = log( P("a b c") / (P("a") * P("b") * P("c")) )

where P("x") is the probability of "x" in the corpus.
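A minimal Python sketch of this naive PMI score, for illustration only: it is not quanteda code, and `ngrams()` and `pmi()` are hypothetical helpers. It assumes whitespace-tokenised input, estimates every probability as a relative frequency, and takes the base-2 log of the ratio, as PMI is conventionally defined.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of `tokens` as tuples (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pmi(tokens, size=3):
    """Naive base-2 PMI for every distinct n-gram of length `size` (hypothetical helper).

    Unigram probabilities are counts divided by the number of tokens; n-gram
    probabilities are counts divided by the number of n-grams of that size.
    The denominator assumes full independence: P(a) * P(b) * P(c) * ...
    """
    unigram_counts = Counter(tokens)
    n_tokens = len(tokens)
    grams = ngrams(tokens, size)
    gram_counts = Counter(grams)
    n_grams = len(grams)

    scores = {}
    for gram, count in gram_counts.items():
        p_joint = count / n_grams
        # Independence baseline: product of the unigram probabilities.
        p_indep = math.prod(unigram_counts[w] / n_tokens for w in gram)
        scores[gram] = math.log2(p_joint / p_indep)
    return scores


if __name__ == "__main__":
    text = "the new york times is sold in new york city and new york state"
    scores = pmi(text.split(), size=3)
    for gram, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(" ".join(gram), round(score, 3))
```

Everything is a single pass of counting, which is why this is fast; the cost is that the independence baseline in the denominator ignores any association between pairs of words inside the n-gram.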
Yes, we have that in https://github.com/kbenoit/quanteda.collocationsdev, where the idea is to use it for comparison in our paper (still under development) on collocations. We had it in but took it out while we demonstrate that the log-linear approach is superior. Once we work that out (soon, I hope!), we should definitely consider bringing back some of the other measures.
This is all standard stuff, e.g. https://nlp.stanford.edu/fsnlp/promo/colloc.pdf. However, this does not work for sizes > 2, since in the PMI example the marginal probabilities in the denominator also need to account for P(a, b), P(b, c), and P(a, c). That's our angle with lambda.
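To make the size > 2 problem concrete: as a purely algebraic identity (valid whenever P(a, b) > 0), the naive three-word PMI splits into the association of "a b" plus the association of "c" with the pair "a b". A strongly associated pair therefore inflates the trigram score even when "c" is unrelated to it.

$$
\log\frac{P(a,b,c)}{P(a)\,P(b)\,P(c)}
= \underbrace{\log\frac{P(a,b)}{P(a)\,P(b)}}_{\mathrm{PMI}(a;\,b)}
+ \underbrace{\log\frac{P(a,b,c)}{P(a,b)\,P(c)}}_{\mathrm{PMI}(ab;\,c)}
$$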
Suggest we keep this separate as is and kick ourselves (myself) to flesh out the collocations paper, where we can sort all this out (with Jouni's input of course, as planned).