Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor code #51

Merged
merged 6 commits into from
Jul 6, 2020
Merged

Refactor code #51

merged 6 commits into from
Jul 6, 2020

Conversation

satyamtg
Copy link
Contributor

@satyamtg satyamtg commented Jul 3, 2020

This basically does the following refactors -

  • Change some variable names - many variable names are changed to sound a bit clearer. For example, path has been changed to relative_path, where applicable, book_list_list has been changed to book_lists, head is changed to head_course_xblock, top is changed to course_tabs (as this basically holds links in the nav menu at the top). There are of course several other small variable name changes (mostly in scraper.py and the xblock_extractor objects)
  • Use ZimInfo to build ZIM - We use zimscraperlib function to build ZIMs in a similar way we do for TED
  • Use pathlib instead of os.path - I've tried to implement all paths using pathlib
  • Refactor xblocks_extractractors - The various objects shared a lot of code. So, I created a base class. There's a bit more we can do here. But I didn't want to combine everything into this PR as it's already quite a lot of changes. Will open tickets for this after the merge.
  • Use proper logging from zimscraperlib - We now use logger from zimscraperlib to have better logs
  • Removed unused imports - There were many unused imports. I've tried to remove all of them. I've also made the import ordering better.
  • Use a more sensible folder structure - The folders inside the course directories had a folder in them with the same name which held the index.html . I've removed this and this makes it a bit simpler to understand the folder structure.

Please note that utils.py is not fully refactored in this as adding other features/ fixing issues would involve changing that file largely. Also. currently the language is set to "eng" for the ZIM. I'll change that while fixing #49

This shall close the following - #45, #46, #43

The argument structure is now modified as follows -

usage: openedx2zim [-h] --course-url COURSE_URL --email EMAIL [--password PASSWORD] --name NAME
                   [--title TITLE] [--description DESCRIPTION] [--creator CREATOR]
                   [--publisher PUBLISHER] [--tags TAGS] [--convert-in-webm]
                   [--ignore-missing-xblocks] [--lang LANG] [--add-wiki] [--add-forum]
                   [--output OUTPUT_DIR] [--tmp-dir TMP_DIR] [--zim-file FNAME]
                   [--no-fulltext-index] [--no-zim] [--keep] [--debug] [--version]

Scraper to create ZIM files MOOCs on openedx instances

optional arguments:
  -h, --help            show this help message and exit
  --course-url COURSE_URL
                        URL of the course you wnat to scrape
  --email EMAIL         Your registered e-mail ID on the platform. Used for authentication
  --password PASSWORD   The password to your registered account on the platform. If you don't
                        provide one here, you'll be asked for it later
  --name NAME           ZIM name. Used as identifier and filename (date will be appended)
  --title TITLE         Custom title for your ZIM. Based on MOOC otherwise.
  --description DESCRIPTION
                        Custom description for your ZIM. Based on MOOC otherwise.
  --creator CREATOR     Name of content creator
  --publisher PUBLISHER
                        Custom publisher name (ZIM metadata)
  --tags TAGS           List of comma-separated Tags for the ZIM file. category:openedx, openedx,
                        and _videos:yes (if present) added automatically
  --convert-in-webm     Re-encode videos to WebM
  --ignore-missing-xblocks
                        Ignore unsupported content (xblock)
  --lang LANG           Default language of the interface and the ZIM content (ISO-639-1 codes)
  --add-wiki            Add wiki (if available) to the ZIM
  --add-forum           Add forum (if available) to the ZIM
  --output OUTPUT_DIR   Output folder for ZIM file
  --tmp-dir TMP_DIR     Path to create temp folder in. Used for building ZIM file. Receives all
                        data
  --zim-file FNAME      ZIM file name (based on --name if not provided)
  --no-fulltext-index   Don't index the scraped content in the ZIM
  --no-zim              Don't produce a ZIM file, create build folder only.
  --keep                Don't remove build folder on start (for debug/devel)
  --debug               Enable verbose output
  --version             Display scraper version and exit

Here's a test ZIM that I created -
test_2020-07.zip (Please ignore the random strings that I put for title and description)

I've matched this with what the original scraper without any changes did and they seem identical. There is a scope of improvements but we're not looking at the frontend part now. Maybe sometime later.

@satyamtg satyamtg self-assigned this Jul 3, 2020
satyamtg added 2 commits July 3, 2020 17:05
- Refactored annex.py, connection.py to use newer changes
- Refactored some templates to use newer variables
- Black formatting for everything
@satyamtg satyamtg marked this pull request as ready for review July 4, 2020 06:52
@satyamtg satyamtg requested a review from rgaudin July 4, 2020 08:32

def download(self, c):
if self.mooc.forum_thread != None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked this value? None evaluates to False but most False value are different from None

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@rgaudin rgaudin merged commit e5067f3 into master Jul 6, 2020
@rgaudin rgaudin deleted the refactor_arguments branch July 6, 2020 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better ZIM creation Better paths handling Use logger from zimscraperlib
2 participants