Flask API with Selenium Quickstart. Any contribution is welcome

Ismola/selenium-scraper-quickstarter

Selenium Scraper Starter

This repository provides a foundation for building robust and scalable web scrapers using Selenium and Flask. It emphasizes best practices including environment management, configuration with Docker, and a well-structured project layout.

Key Features

  • Selenium Automation: Efficiently interact with dynamic webpages using Selenium's browser automation capabilities.
  • Flask Backend: Create a RESTful API with Flask to manage scraper execution, authorization, and logging.
  • Bearer Authentication: Implement a secure mechanism for API access using bearer tokens.
  • Environment Management: Facilitate deployment across different environments (production, staging) using environment variables.
  • Docker Configuration: Streamline containerization for a consistent and portable development experience.
  • Logging System: Track scraper activities and errors for debugging and monitoring.
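
As an illustration of the bearer authentication feature, here is a minimal sketch of how a token check could look. It is not taken from the repository: the environment variable name `API_BEARER_TOKEN` and both function names are assumptions made for this example, and it uses only the standard library so it can be read independently of Flask.

```python
import hmac
import os
from typing import Optional

# Hypothetical variable name; the project's .env file may use a different key.
TOKEN_ENV_VAR = "API_BEARER_TOKEN"


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from a 'Bearer <token>' header value, or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(authorization_header: Optional[str],
                  expected_token: Optional[str] = None) -> bool:
    """Check the presented token against the expected one in constant time."""
    if expected_token is None:
        # Fall back to the environment when no token is passed explicitly.
        expected_token = os.environ.get(TOKEN_ENV_VAR, "")
    presented = extract_bearer_token(authorization_header)
    if not presented or not expected_token:
        return False
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

In a Flask route, the header value would typically come from `request.headers.get("Authorization")`. The comparison uses `hmac.compare_digest` so that it does not leak timing information about the token.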

Local Setup

Prerequisites

Before diving in, ensure you have the required tools installed. The commands below assume Python 3 and pip are available.

Create a Virtual Environment

We use Python's built-in venv module, a tool for creating isolated Python environments. It creates a folder that contains the executables and packages a Python project needs.

python3 -m venv <whatever_virtual_environment_name>

Activate the virtual environment

source <whatever_virtual_environment_name>/bin/activate   # for Unix/Linux
.\<whatever_virtual_environment_name>\Scripts\activate    # for Windows
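
To confirm that activation worked, one option is a quick check from Python itself. This is a small sketch rather than part of the project; the function name `in_virtual_environment` is made up for the example. It relies on the fact that inside a venv, `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```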

Install project libraries

pip install -r requirements.txt

Run main file

python3 main.py

The server is now accessible at http://localhost:3000

First Call

[Image: First Call]

[Image: Auth Call]
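
A call like the ones referenced above can also be made from Python. The sketch below builds a request that carries a bearer token, assuming the server listens on port 3000 of localhost as described in the setup steps. The path `/` and the helper names `build_request` and `send` are placeholders chosen for this example, not endpoints or functions confirmed by the project.

```python
import urllib.request

# Server address from the local setup section.
BASE_URL = "http://localhost:3000"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the bearer token in the Authorization header."""
    request = urllib.request.Request(BASE_URL + path, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Example usage with the server running (placeholder path and token):
#   print(send(build_request("/", "my-invented-token")))
```

The token sent here should match the one stored in your .env file. A mismatched or missing token should be rejected by the API, although the exact status code depends on how the project handles it.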

Make your first changes

  1. First, add your .env file. An invented bearer token is enough to get started.

  2. Then configure the base URL in the utils/config.py file.

  3. To work on your project, add an endpoint to main.py.

  4. Next, create a controller and add the web actions it needs. Keep each action to a few steps so the code stays modular and you avoid repeating logic later.
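
The controller and action split from step 4 might be sketched as follows. Everything here is illustrative: `BASE_URL`, `open_page`, `read_title`, and `get_page_title` are hypothetical names, and the project's real actions will differ. The `driver` argument is expected to behave like a Selenium WebDriver, of which only the `get(url)` method and the `title` property are used.

```python
from typing import Any, Dict

# Placeholder; in the project this value would come from utils/config.py.
BASE_URL = "https://example.com"


def open_page(driver: Any, path: str) -> None:
    """Action: navigate the browser to a path under the base URL."""
    driver.get(BASE_URL + path)


def read_title(driver: Any) -> str:
    """Action: return the title of the page currently loaded."""
    return driver.title


def get_page_title(driver: Any, path: str = "/") -> Dict[str, str]:
    """Controller: chain small actions and return a JSON-serializable result."""
    open_page(driver, path)
    return {"url": BASE_URL + path, "title": read_title(driver)}
```

Because each action does one thing, other controllers can reuse `open_page` or `read_title` without copying code. An endpoint in main.py would then call the controller and return its dictionary as the response.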

Project Structure

├─── main.py                   # Entry point for the Flask application
├─── .vscode                   # Configuration for Visual Studio Code (optional)
├─── actions                   # Contains scraper actions (logic for data extraction)
├─── controller                # Functions handling API requests
├─── temp_downloads            # Temporary files created during scraping
└─── utils                     # Reusable helper functions
