Bulldozer

Bulldozer is a script that automates downloading, organizing, and analyzing podcasts, and creating torrents for them. It's highly customizable: nearly every setting you might want to change is defined in the configuration file.

Features

Download podcast episodes using RSS feeds
Check for duplicate episodes using tracker API
Organize and analyze downloaded files
Generate reports based on the downloaded content
Fetch data from the Podchaser and Podcastindex APIs
Fetch data from Podnews
Automatic RSS censoring for matching premium sources
Optional local database with metadata for improved flexibility
Option to split active podcasts on current year (database required)
Partial download of feed using --match-titles
Torrent file creation with piece size calculation
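The piece size calculation mentioned in the feature list can be sketched roughly as below. This is only an illustration, not the code Bulldozer actually uses: the function name `choose_piece_size`, the target of about 1500 pieces, and the 32 KiB to 16 MiB bounds are all assumptions made for the example.

```python
def choose_piece_size(total_bytes: int, target_pieces: int = 1500,
                      min_exp: int = 15, max_exp: int = 24) -> int:
    """Return a power-of-two piece size in bytes for a torrent payload.

    Hypothetical helper: the target piece count and the exponent bounds
    are illustrative defaults, not values taken from Bulldozer.
    """
    exp = min_exp
    # Double the piece size until the piece count drops to the target
    # or the upper bound on the piece size is reached.
    while exp < max_exp and total_bytes / (1 << exp) > target_pieces:
        exp += 1
    return 1 << exp


size = choose_piece_size(10 * 1024**3)  # a 10 GiB podcast archive
exponent = size.bit_length() - 1        # log2 of the piece size
print(size, exponent)
```

mktorrent takes the piece length as a power-of-two exponent through its `-l` option, so a value like `exponent` above is what would be passed on in this sketch.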

Requirements

Python 3.12.0+
Required Python packages (listed in requirements.txt)
mktorrent
podcast-dl 10.3.1+

Installation

Clone the repository:

git clone git@github.com:lewler/bulldozer.git
cd bulldozer

Install the required Python packages:
```
pip install -r requirements.txt
```

Install additional dependencies:

sudo apt-get install libwebp-dev libavif-dev

Create your own config file, and add the things you need to override:
```
touch config.yaml
```
If you want to use the Podchaser API you will need a token, which is free up to 25k points per month.

Configuration

Edit the config.yaml file to set up your preferences and API keys. The configuration file covers nearly every setting needed to customize the behavior of the script, and the settings most users need to change are at the top. The file is commented, so it should be easy to understand what each setting does.

Note that you do not need to copy the entire config.default.yaml file; only add the values you want to change. This approach means less work when new settings are added to config.default.yaml.
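To illustrate how an override-only config can work, here is a minimal sketch of layering user values on top of defaults. It assumes both files have already been parsed into nested dictionaries (for example with PyYAML); the function `merge_config` and the keys shown are made up for the example and may not match Bulldozer's real settings or merge logic.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical settings, standing in for config.default.yaml and config.yaml.
defaults = {"api": {"podchaser": {"active": False, "token": None}}, "log_level": "INFO"}
user = {"api": {"podchaser": {"token": "my-token"}}}

config = merge_config(defaults, user)
print(config)
```

Only the `token` value is overridden here; everything else falls through from the defaults, which is why new default settings do not require touching the user file.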

Upgrading

Upgrading should be fairly simple, but it might get messy if you're skipping several versions. In that case, do a fresh install and copy your settings over. To upgrade, do the following:

Update the codebase:
```
git pull
```

Make sure requirements are up-to-date:

pip install -r requirements.txt --upgrade

Run the config checker to see if your config is outdated:
```
python bulldozer --check-config
```
The config checker will let you know if there are settings in your config that are outdated (i.e., they don't exist in the default config).
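The idea behind such a check can be sketched as a recursive comparison of keys. This is an assumption about how a checker like this might work, not Bulldozer's actual implementation; `find_outdated_keys` and the sample keys are hypothetical.

```python
def find_outdated_keys(user: dict, defaults: dict, prefix: str = "") -> list[str]:
    """List dotted paths of keys present in `user` but missing from `defaults`."""
    outdated = []
    for key, value in user.items():
        path = f"{prefix}{key}"
        if key not in defaults:
            outdated.append(path)
        elif isinstance(value, dict) and isinstance(defaults[key], dict):
            # Descend into nested sections that exist on both sides.
            outdated.extend(find_outdated_keys(value, defaults[key], prefix=f"{path}."))
    return outdated


# Hypothetical config contents for illustration only.
default_config = {"a": {"b": 1}, "c": 2}
user_config = {"a": {"b": 5, "old": 1}, "gone": True}

outdated = find_outdated_keys(user_config, default_config)
print(outdated)
```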

Usage

Command Line Interface

Run the script using the command line interface:

python bulldozer <input>

<input>: RSS feed URL, directory path, local RSS file path, or name to dupecheck.
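Since a single argument can mean four different things, some kind of dispatch on the input is needed. The sketch below shows one plausible way to tell the cases apart; the function name `classify_input`, the returned labels, and the order of the checks are assumptions for illustration and are not taken from the Bulldozer source.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_input(value: str) -> str:
    """Guess what kind of input was given on the command line."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "feed_url"
    path = Path(value)
    if path.is_dir():
        return "directory"
    if path.is_file():
        return "rss_file"
    # Anything that is neither a URL nor an existing path is treated as a name.
    return "name"


print(classify_input("https://example.com/feed.xml"))
print(classify_input("."))
```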

Note that if you're on Linux, you should be able to run the script in this way:

chmod +x bulldozer
./bulldozer <input>

Options

--censor-rss: Make sure the RSS feed is censored.
--report-only: Only generate the report.
--download-only: Only download the files.
--refresh: Don't use the data in the database.
--check-files: Only check the files.
--dupecheck: Search the tracker API for the given name.
--make-torrent: Only create a torrent file.
--check-config: Check if user config is valid.
--log-level: Set the logging level (e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL).
--search-term: Use the given value as the search term instead of the podcast name.
--name: Use the given value as the podcast name.
--match-titles: Only keep episodes in the feed whose titles match the given value.
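As an illustration of what --match-titles does conceptually, the sketch below trims an RSS feed down to the items whose title contains a given string. The matching rule (case-insensitive substring) and the function `keep_matching_items` are assumptions; the real option may match differently.

```python
import xml.etree.ElementTree as ET


def keep_matching_items(rss_xml: str, pattern: str) -> str:
    """Return the feed with only the <item> elements whose title contains `pattern`."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        return rss_xml
    needle = pattern.casefold()
    # Copy the item list first so removing elements does not disturb iteration.
    for item in list(channel.findall("item")):
        title = item.findtext("title", default="")
        if needle not in title.casefold():
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


feed = (
    "<rss><channel><title>Show</title>"
    "<item><title>Bonus: Q&amp;A</title></item>"
    "<item><title>Episode 1</title></item>"
    "</channel></rss>"
)
trimmed = keep_matching_items(feed, "bonus")
kept_titles = [item.findtext("title") for item in ET.fromstring(trimmed).iter("item")]
print(kept_titles)
```

Real podcast feeds usually carry namespaced tags (such as the iTunes extensions), which ElementTree rewrites with generic prefixes unless they are registered first, so treat this as a sketch of the filtering step only.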

Project Structure

bulldozer: Main script
classes/: Contains various classes used in the project.
- apis/: Contains classes to interact with various APIs.
  - podcastindex.py: Interacts with the Podcastindex API
  - podchaser.py: Interacts with the Podchaser API
- scrapers/: Contains classes to scrape websites.
  - podnews.py: Scrapes data from Podnews.
- cache.py: Handles the caching.
- data_formatter.py: Methods for transforming data.
- database.py: Handles the database logic.
- dupe_checker.py: Checks for duplicates.
- file_analyzer.py: Analyzes downloaded files.
- file_organizer.py: Organizes downloaded files.
- podcast_image.py: Handles podcast image processing.
- podcast_metadata.py: Manages podcast metadata.
- podcast.py: Represents a podcast and its metadata.
- report_template.py: Templates for generating reports.
- report.py: Generates reports based on downloaded content.
- rss.py: Handles RSS feed operations.
- torrent_creator.py: Creates torrent files.
- utils.py: Utility functions.
logs/: Contains log files.
config.example.yaml: Example configuration file.
requirements.txt: List of required Python packages.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any changes.

Acknowledgements

Jinja2 for templating.
PyYAML for YAML parsing.
Pillow for image processing.
yaspin for terminal spinners.
mutagen for audio metadata handling.
titlecase for title casing.
Podchaser API for additional metadata.
Podcastindex API for additional metadata.
Podnews for additional metadata.
TinyDB for database support.
