Use common outgoing connection Session creation code for scraping #817
Merged
Evan-Leon added a commit that referenced this pull request Oct 18, 2024
* Add scrape-source and scrape-collection commands to manage.py
  Both commands take an id number and an email to send results to. Both commands run "in process" for test/debug unless "--queue" is given, in which case the request is queued as a task. Make sure CTRL/C causes immediate termination, and gives a full backtrace.
* Feature add static collections (#810)
  * Add static column to collections, add front end work to reflect
  * Add chip to header and alert to modify collection
  * Add tooltip to static chip
* Use mc-providers v2.2.0 caching argument to allow queries to bypass cache for testing
  infrastructure already in place, uncomments a line of code!
* Implement multiple task queues (#813)
  * Implement multiple queues for background tasks
  * fixes
  * fixes
  * fix get_pending_tasks docstring paste-o
  * cleanup
    mcweb/backend/sources/tasks.py: restore docstring
    mcweb/backend/util/tasks.py: define SYSTEM_SLOW as 'system-slow'
  ---------
  Authored-by: Phil Budne <[email protected]>
* Use common outgoing connection Session creation code for scraping (#817)
  * Update scraping code to use SSL/headers used in other Media Cloud projects.
  * mcweb/backend/sources/models.py: add SCRAPE_HTTP_SECONDS, never pass newline to add_line!
  * backend/search/views.py: removed commented out requests.Session creation
  ---------
  Authored-by: Phil Budne <[email protected]>
* Change static to managed for collections
* update runtime.txt to python 3.10.15 due to security issues
* Update utils.py - remove prefix wildcards
  Prefix wildcards have a huge performance cost. Removing here as a precursor.
* Update utils.py - scheme-safe url-search-string
* Update version and release notes for new release
* Feature add contributor roles (#824)
  * Start roles management command
  * Make management command to make groups and assign users
  * Add front end role contributor
  * Add role permissions to directory
  * Fix permissions on upload sources, test contributor role
  * Fix save button on a managed collection
  * Edit create-groups command name, write docs for management command
  * Remove console log
---------
Co-authored-by: Phil Budne <[email protected]>
Co-authored-by: Phil Budne <[email protected]>
Co-authored-by: Paige Gulley <[email protected]>
mcmetadata now contains code from story-indexer to make HTTP connections to source sites. This PR uses that code, along with the updated sitemap tools, in site (re)scrape, and applies connect/read timeouts in scraping to (hopefully) avoid the hangs reported in issue #791.
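For readers unfamiliar with connect/read timeouts, here is a minimal sketch of the idea, not the code merged in this PR. It assumes a session object that behaves like `requests.Session`. The name `SCRAPE_HTTP_SECONDS` comes from the commit message, but its value here, and the helpers `scrape_timeouts` and `fetch`, are made up for illustration.

```python
from typing import Any, Tuple

# Assumed value for illustration only: the PR adds SCRAPE_HTTP_SECONDS to
# mcweb/backend/sources/models.py, but its real value is not shown on this page.
SCRAPE_HTTP_SECONDS = 30.0


def scrape_timeouts(seconds: float = SCRAPE_HTTP_SECONDS) -> Tuple[float, float]:
    """Return a (connect, read) timeout pair in the form requests accepts."""
    return (seconds, seconds)


def fetch(session: Any, url: str) -> Any:
    """GET url with explicit connect and read timeouts (hypothetical helper).

    `session` is expected to behave like requests.Session. Without a
    timeout argument, requests will wait indefinitely on a stalled server,
    which is the kind of hang the timeouts are meant to prevent.
    """
    return session.get(url, timeout=scrape_timeouts())
```

With the requests library installed, this could be called as `fetch(requests.Session(), "https://example.com/sitemap.xml")`. A slow or silent server would then raise `requests.exceptions.Timeout` instead of blocking the scrape.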