Is it possible to get how many texts summarized by the summarizer? #188
Comments
Hello, sorry but I can't see a pattern there. How do you determine which sentences you want to return for a given summarized sentence? Once it is the sentence before, the second time the one after. Also, what are the numbers? I thought it was the count of sentences in context, but it is always 2 even for one sentence.
The results would be
I am under the impression that the LSA algorithm would only show the most distinct sentences and hide those that are already represented by other sentences. Is this correct?
Thank you for the additional info.
Yes, you could say it like that, I think. LSA works with the concept of (very abstract) topics and tries to get representative sentences for them. I believe when you say "close to the 1st and 2nd sentence" you don't mean close in sentence position but in vector space, right? You would like to know, for every sentence in the result summary, the list of removed sentences it represents in the original text and how many of them there are. I am afraid there is no easy way to get this info from the LSA summarizer. Summarizers are a black box and one can only tweak them slightly sometimes. This would require creating a completely new one that also picks sentences as you described, but I don't really know how I would approach it. If this is just a one-time thing, maybe it is easier to use ChatGPT for this 😃
Ah, yes, from the vector space
Yes, I wasn't trying to tweak the LSA itself. I was thinking that maybe by looking at the powered_sigma or v_matrix, one could derive a relation from it (I believe this is the similarity matrix?). Something that looks like this.
Oh, yes. If you are willing to try some clustering algorithm and make your own modifications to LSA, it is definitely doable. You know all the vectors, so you can cluster them together. You even know the initial cluster leaders (the summarized sentences). It sounds like you know what you are doing, so it should be fine :)
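The clustering idea above could be sketched roughly like this (this is not sumy API; `cluster_counts` and the toy vectors are made up for illustration; in practice the vectors would come from the sentence columns of the SVD's V^T matrix):

```python
import numpy as np

def cluster_counts(sentence_vectors, summary_indices):
    """Assign every original sentence vector to its nearest summary
    sentence (the "cluster leader") by cosine similarity, and return
    how many sentences each summary sentence represents."""
    vecs = np.asarray(sentence_vectors, dtype=float)
    # Normalize rows so dot products become cosine similarities.
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.where(norms == 0, 1, norms)
    leaders = unit[summary_indices]       # summarized sentences as leaders
    sims = unit @ leaders.T               # (n_sentences, n_summary)
    nearest = sims.argmax(axis=1)         # nearest leader per sentence
    return {summary_indices[j]: int((nearest == j).sum())
            for j in range(len(summary_indices))}

# Toy example: 5 sentence vectors; sentences 0 and 3 were kept in the summary.
vectors = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0], [0.1, 1.0], [0.2, 0.9]]
print(cluster_counts(vectors, [0, 3]))  # → {0: 3, 3: 2}
```

Each count includes the summary sentence itself, which matches the "count of summarized texts" the question asks about.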
Yes, about this. I am not sure how to read this part. How can I get the vector matrices? |
It is a bit more complicated. LSA gives you 2 matrices and a vector. I use only one of the matrices, but their combination always has some meaning. You can check the documentation and the link to the original article from Steinberger and Jezek for more. Here is the relevant result: Line 45 in 7fd4970
Here is the computation of the sentence scores for the topics: Lines 119 to 120 in 7fd4970
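As a rough sketch of the Steinberger-Jezek ranking the comment above points at (the matrix values here are invented; the exact details are in the referenced lines of the repo):

```python
import numpy as np

# Term-sentence matrix: rows = terms, columns = sentences (toy values).
matrix = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
])

# SVD: matrix = U * diag(sigma) * V^T. The columns of V^T are the
# sentences expressed in "topic" space; sigma weights the topics.
u, sigma, v_t = np.linalg.svd(matrix, full_matrices=False)

# Sentence rank = length of its topic vector weighted by the singular
# values: sqrt(sum_j sigma_j^2 * v_{j,k}^2) for sentence k.
powered_sigma = sigma ** 2
ranks = np.sqrt((powered_sigma[:, None] * v_t ** 2).sum(axis=0))

# The summarizer then keeps the top-N sentences by rank.
top = np.argsort(ranks)[::-1][:2]
print(ranks, top)
```

A nice sanity check on this formula: the squared ranks of all sentences sum to the squared Frobenius norm of the original matrix, since the rows of V^T are orthonormal.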
Also keep in mind that I implemented the library years ago and am giving you advice from my poor memory and what I see in the code now. I can't dedicate more time to studying the LSA details and advising you further, unfortunately.
Suppose I have this kind of text
Running the LSA algorithm with a sentence count of 3 gives me
Is it possible to get the count of summarized texts?
I assume it would look like this
Everything checked out. -> [Check this out., Everything checked out.] -> 2
Not so much is checked. -> [Not so much is checked., I am not sure what is happening.] -> 2
The dog is burnt. -> [The dog is burnt.] -> 1
Can I get this info from the SVD matrix?
Edit: fixed wrong count for the dog sentence