Skip to content
This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

Fix sub-definitions not being listed #68

Closed
wants to merge 4 commits into from
Closed

Fix sub-definitions not being listed #68

wants to merge 4 commits into from

Conversation

imlutr
Copy link
Contributor

@imlutr imlutr commented Sep 9, 2020

Previously, a word that had a layered definition, such as:

Screenshot 2020-09-09 at 19 18 18

would return the following definition:

['grapple (countable and uncountable, plural grapples)',
 'A tool with claws or hooks which is used to catch or hold something.',
 'A close hand-to-hand struggle.',
 '(uncountable) The act of grappling.']

So the sub-definitions of A tool with claws or hooks which is used to catch or hold something. weren't listed.

This PR makes it so the returned definition becomes:

['grapple (countable and uncountable, plural grapples)',
 ['A tool with claws or hooks which is used to catch or hold something.',
  '(nautical) A device consisting of iron claws, attached to the end of a rope, used for grasping and holding an enemy ship prior to boarding; a grappling iron.',
  '(nautical) A grapnel (“type of anchor”).'],
 'A close hand-to-hand struggle.',
 '(uncountable) The act of grappling.']

thus fixing #57.

However, this structure (using arrays to depict a layered definition, where the first element represents the top-level definition) may not be the most ideal one. Not sure.

This causes two of the tests to fail (grapple and house), as they both have such sub-definitions that were previously ignored.

They were deleted in the parse_examples() function, which would remove quotations, sub-definitions accidentally being treated as such.
Sub-definitions were previously appended to list as:

["Top definition A", "Sub definition ASub definition B", "Top definition B"]

Now, they are appended to the "definitions" list as:

[["Top definition A", "Sub definition A", "Sub definition B"], "Top definition B"]

However, I am not sure if this is the best structure. You may change it if you find a better way.
@suyashb95
Copy link
Owner

Thanks for this! I'll take a look. Can you fix the failing tests in the meanwhile?

@suyashb95 suyashb95 self-requested a review September 9, 2020 17:41
Previously, if the top definition and its sub-definitions weren't separated by '\n', the code would fail.
@imlutr
Copy link
Contributor Author

imlutr commented Sep 11, 2020

Done. Everything should be fine now. However, I'll test some more words on my own, to make sure I didn't miss any edge cases.

@reeseovine
Copy link

reeseovine commented Sep 21, 2020

Thanks so much for this! I just tested it out with my script that uses WiktionaryParser and (with slight modification) it works great. Hope it gets merged soon! 😁

@suyashb95
Copy link
Owner

@Luca1152 @katacarbix while this change works, I'm not sure the definitions list should contain different data types. For your use cases, is it a problem if the sub-definitions are included in the list as top level definitions?

@katacarbix what are the modifications you've had to make? Was wondering if the parser could be enhanced to include them

Thanks so much for this! I just tested it out with my script that uses WiktionaryParser and (with slight modification) it works great. Hope it gets merged soon! 😁

@reeseovine
Copy link

reeseovine commented Sep 23, 2020

I added a section to loop through sub-definitions and output them indented. I also put back the colons that I was removing before for formatting.

Old:

word = parser.fetch(query)
entries = []
for section in word:
    for defn in section['definitions']:
        for entry in defn['text']:
            if entry[:len(query)] != query:
                entries.append('('+defn['partOfSpeech']+') ' + entry.rstrip(':'))

New:

word = parser.fetch(query)
entries = []
for section in word:
    for defn in section['definitions']:
        for entry in defn['text']:
            if type(entry) is list:
                entries.append('('+defn['partOfSpeech'])+') ' + entry[0])
                for subentry in entry[1:]:
                    entries.append('    ' + subentry)
            elif entry[:len(query)] != query:
                entries.append('('+defn['partOfSpeech'])+') '  + entry)

@pragma-
Copy link

pragma- commented Jul 11, 2021

@suyash458 This PR fixes a very serious bug. Is this ever going to get merged and is pip ever going to get a new release?

@imlutr imlutr closed this by deleting the head repository Dec 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants