ormu5/apache-beam-python-runners

About

This repo demonstrates different Apache Beam runners with Python application code, with an emphasis on the PortableRunner with Spark.

See main.py for supported runners and code related to each.

Getting Started Locally

Dependencies

  • Docker
  • Python 3.9

Setup

  • Install Python 3.9 (preferably in a virtual environment)
  • Activate the environment (e.g., source venv/bin/activate)
  • Install dependencies: pip install -r requirements.txt

Supporting Services

From ./dev-utils: docker-compose up

Run the Pipeline

# You can run the script file directly.
python main.py

# To pass command-line arguments.
python main.py --input-text="someMultiPart camelCased Words"

# To run the tests.
python -m unittest -v
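
For readers wondering how a flag such as --input-text can coexist with Beam's own options, here is a minimal sketch of one common pattern: parse the application flag with argparse and forward everything else to Beam. The function name parse_cli and the default text are hypothetical; see main.py for what this repo actually does.

```python
import argparse


def parse_cli(argv=None):
    """Split application flags from flags meant for the Beam runner.

    Hypothetical helper for illustration; not taken from main.py.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input-text",
        default="Default input text",  # assumed default, for illustration only
        help="Text to feed into the pipeline.",
    )
    # parse_known_args leaves unrecognized flags (e.g. --runner) untouched,
    # so they can be handed to Beam's PipelineOptions afterwards.
    app_args, beam_args = parser.parse_known_args(argv)
    return app_args.input_text, beam_args


text, beam_args = parse_cli(
    ["--input-text=someMultiPart camelCased Words", "--runner=DirectRunner"]
)
print(text)
print(beam_args)
```

The second return value is the list that would typically be passed on to the pipeline's options object.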

Observe

Spark executors: http://localhost:8081/

Spark jobs: http://localhost:4040/jobs/ (this endpoint is hosted by the Beam job server and is only available while jobs are running)

App log output: stdout of worker nodes

Good to Know

  • The Beam job server and SDK harness (workers) must share a file volume for staging.
  • There is never direct communication between the job server and the SDK harness. Communication flows: Beam job server -> Spark master -> Spark workers -> Beam SDK harness (workers).
  • The artifact endpoint only communicates which artifacts are needed. Processes then retrieve those artifacts from the shared volume referenced above.
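
The notes above can be tied to the flags a PortableRunner pipeline usually receives. The sketch below only builds the argument list; the helper name portable_spark_args is hypothetical, and the host:port values are assumptions based on commonly used Beam defaults, so check them against the docker-compose file in ./dev-utils and against main.py.

```python
def portable_spark_args(
    job_endpoint="localhost:8099",       # assumed: Beam job server job port
    artifact_endpoint="localhost:8098",  # assumed: Beam job server artifact port
    sdk_worker="localhost:50000",        # assumed: external SDK harness worker pool
):
    """Build Beam pipeline flags for PortableRunner (illustrative sketch only)."""
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        f"--artifact_endpoint={artifact_endpoint}",
        # EXTERNAL means the SDK harness is already running as its own process
        # or container, and Beam is told where to reach it.
        "--environment_type=EXTERNAL",
        f"--environment_config={sdk_worker}",
    ]


for flag in portable_spark_args():
    print(flag)
```

With Apache Beam installed, a list like this would typically be passed to apache_beam.options.pipeline_options.PipelineOptions. Note that none of these flags mentions the shared staging volume: it is mounted at the container level rather than configured through pipeline options.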

License

This software is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE for details.

Attribution and Contributing

Beam application logic in this repo is based on https://github.com/apache/beam-starter-python.

Also gleaned bits from:

Feel free to open a PR against this repo.

Known Issues
