luisy is an extension for the Python framework luigi that helps you build reproducible and complex data pipelines for batch jobs. Visit our docs to learn more!
This is what an end-to-end luisy pipeline may look like:

    import luisy
    import pandas as pd


    @luisy.raw
    @luisy.csv_output(delimiter=',')
    class InputFile(luisy.ExternalTask):
        label = luisy.Parameter()

        def get_file_name(self):
            return f"file_{self.label}"


    @luisy.interim
    @luisy.requires(InputFile)
    class ProcessedFile(luisy.Task):
        def run(self):
            df = self.input().read()
            # Some more preprocessing
            # ...
            # Write to disk
            self.write(df)


    @luisy.final
    class MergedFile(luisy.ConcatenationTask):
        def requires(self):
            for label in ['a', 'b', 'c', 'd']:
                yield ProcessedFile(label=label)

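
To make the data flow of the example easier to follow, here is a minimal sketch of the same three stages (raw input, per-label processing, concatenation) written with the Python standard library only. It does not use luisy or luigi and is not how luisy is implemented. The names `raw_path`, `process`, `run_pipeline` and `demo`, the `raw` folder, the `.csv` suffix and the sample columns are all assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

LABELS = ["a", "b", "c", "d"]  # same labels as in MergedFile.requires()


def raw_path(root: Path, label: str) -> Path:
    # Mirrors InputFile.get_file_name(); folder and suffix are assumptions.
    return root / "raw" / f"file_{label}.csv"


def process(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Stand-in for the preprocessing in ProcessedFile.run():
    # here every row is only tagged as processed.
    return [dict(row, processed="yes") for row in rows]


def run_pipeline(root: Path) -> List[Dict[str, str]]:
    # Read each raw file, process it, and concatenate the results,
    # which is the role MergedFile plays in the example above.
    merged: List[Dict[str, str]] = []
    for label in LABELS:
        with raw_path(root, label).open(newline="") as handle:
            rows = list(csv.DictReader(handle, delimiter=","))
        merged.extend(process(rows))
    return merged


def demo() -> List[Dict[str, str]]:
    # Create one tiny raw CSV file per label in a temporary directory,
    # then run the three stages on them.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "raw").mkdir()
        for label in LABELS:
            with raw_path(root, label).open("w", newline="") as handle:
                writer = csv.DictWriter(handle, fieldnames=["label", "value"])
                writer.writeheader()
                writer.writerow({"label": label, "value": "1"})
        return run_pipeline(root)
```

Calling `demo()` returns one processed row per label. In the luisy version, the decorators and task classes take over the file handling that this sketch does by hand.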
Stable Branch: main
Minimum Python version: 3.8
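
If a script should fail early on an unsupported interpreter, a guard along the following lines could be used. This is a minimal sketch: the helper name `python_is_supported` is hypothetical and not part of luisy, and the only value taken from this README is the minimum version 3.8.

```python
import sys

MINIMUM_PYTHON = (3, 8)  # minimum version stated in this README


def python_is_supported(version_info=sys.version_info):
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= MINIMUM_PYTHON
```

A script could call `python_is_supported()` before importing luisy and exit with a clear message if it returns `False`.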
Install luisy with:

    pip install luisy

To run all unit tests in the tests directory, use the following command:

    pytest

Please have a look at our contribution guide.

Runtime dependencies:

Name | License | Type |
---|---|---|
numpy | BSD-3-Clause License | Dependency |
pandas | BSD 3-Clause License | Dependency |
networkx | BSD-3-Clause License | Dependency |
luigi | Apache License 2.0 | Dependency |
distlib | Python license | Dependency |
matplotlib | Other | Dependency |
azure-storage-blob | MIT License | Dependency |
tables | BSD license | Dependency |
pipdeptree | MIT License | Dependency |
requirements-parser | Apache License 2.0 | Dependency |
pyarrow | Apache License 2.0 | Dependency |
spark | Apache License 2.0 | Dependency |

Development and documentation dependencies:

Name | License | Type |
---|---|---|
sphinx | BSD-2-Clause | Dependency |
sphinx_rtd_theme | MIT License | Dependency |
flake8 | MIT License | Dependency |
pytest | MIT License | Dependency |
pytest-flake8 | BSD License | Dependency |
pytest-cov | MIT License | Dependency |
pip-tools | BSD 3-Clause License | Dependency |