This data pipeline was originally developed for Hong Kong investors.
Since then, open financial data in Hong Kong has remained scarce, its APIs are unstable, and data from Mainland China has not become any more available.
Hong Kong's status has also declined. For good reason, the U.S. has blocked Hong Kong IP addresses from the Bureau of Labor Statistics (BLS), and more Hong Kong retail investors are moving to the U.S. market.
We now look to OpenBB Terminal to liberalize data access in the global market.
Operation Pluto is a data pipeline that plumbs financial and economic data, focusing on the Hong Kong, U.S. and China markets.
The pipeline is written in Python and organized with the Luigi framework.
Currently connected data sources:
- Census and Statistics Department
- The Hong Kong Association of Banks
- Hong Kong Government Bond Programme
- Hong Kong Monetary Authority
- Hang Seng Indexes Company
- Rating and Valuation Department
- ?
- Crawl websites, back-fill historical data and build the file directories, all done as code.
- Each table in a data source corresponds to one target file.
- Pipeline tasks are stateful, so source files are overwritten as rarely as possible.
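The last two points can be illustrated without Luigi. The hypothetical helper `write_table_once` writes one table to one target file and leaves an existing file alone, which mirrors how Luigi treats a task whose output already exists as complete. Writing to a temporary file and renaming it into place is an assumption of this sketch (Luigi's `LocalTarget` also writes atomically), so an interrupted run never leaves a half-written file behind.

```python
import csv
import os
import tempfile


def write_table_once(path, rows):
    """Write rows (a list of lists) to ``path`` as CSV unless ``path`` already exists.

    Returns True if the file was written, False if an existing file was kept.
    """
    if os.path.exists(path):
        # Already produced by an earlier run: leave the existing file untouched.
        return False
    folder = os.path.dirname(path) or "."
    os.makedirs(folder, exist_ok=True)
    # Write to a temporary file in the same folder, then rename it into place,
    # so an interrupted run never leaves a half-written target behind.
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the partial temporary file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

A second call with different rows returns `False` and keeps the first file, which is the "overwrite as rarely as possible" behaviour in miniature.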
Install Python 3.5, then clone this repository:
# Clone this repository
$ git clone https://github.com/hydra-lab/operation-pluto
Install the Python dependencies:
# Installing with Conda may not work
$ pip install -r requirements.txt
Set up the Luigi configuration file:
# Rename luigi.cfg.sample to luigi.cfg
$ mv luigi.cfg.sample luigi.cfg
If you're behind a proxy, configure it in luigi.cfg:
[proxies]
https = https://username:password@hostname:port/
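How the pipeline consumes this section is not shown here. As a sketch, assuming the values are handed to an HTTP client such as `requests` (whose `proxies` argument takes a mapping from URL scheme to proxy URL), a hypothetical `load_proxies` helper could read the section with the standard library:

```python
import configparser


def load_proxies(cfg_path):
    """Return the [proxies] section of a Luigi config file as {scheme: proxy_url}.

    Returns an empty dict if the file or the section is missing.
    """
    # interpolation=None keeps a literal '%' in a password from being parsed.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # silently skips a missing file
    if not parser.has_section("proxies"):
        return {}
    return dict(parser.items("proxies"))
```

For example, `requests.get(url, proxies=load_proxies("luigi.cfg"))` would route HTTPS traffic through the configured proxy.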
Test the installation. New data should be extracted and parsed into the folder test/data:
$ python -m luigi --module main RunMock --local-scheduler
$ ls test/data
High-level job orchestration is done in main.py. For example, RunAll() is the wrapper class that initializes the whole data directory and triggers all processing tasks.
In production, tasks should be run on the Luigi server. Because luigid cannot run as a background daemon on Windows, simply start it in the foreground:
# Run Luigi server on http://localhost:8082
$ luigid
# Run task on Luigi server
$ python -m luigi --module main RunAll
Schedule the pipeline to run periodically with Task Scheduler or cron. On Windows, set up run.sh:
# Script on Windows
start luigid
python -m luigi --module main RunAll
cmd "/c taskkill /IM "luigid.exe" /T /F"
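The same start, run, stop sequence can also be written in Python with `subprocess`, which may be easier to schedule from Task Scheduler or cron on either platform. This is a hypothetical sketch: `run_with_daemon` and its `startup_delay` parameter are not part of this repository, and unlike `taskkill /T` it only stops the process it started, not that process's children.

```python
import subprocess
import time


def run_with_daemon(daemon_cmd, job_cmd, startup_delay=5.0):
    """Start a long-running daemon, run one job against it, then stop the daemon.

    Returns the job's exit code.
    """
    daemon = subprocess.Popen(daemon_cmd)
    try:
        # Give the daemon a moment to start listening before the job connects.
        time.sleep(startup_delay)
        return subprocess.run(job_cmd).returncode
    finally:
        # Ask the daemon to stop; force-kill it if it does not exit in time.
        daemon.terminate()
        try:
            daemon.wait(timeout=10)
        except subprocess.TimeoutExpired:
            daemon.kill()
            daemon.wait()
```

Mirroring run.sh, a scheduled script might call `run_with_daemon(["luigid"], [sys.executable, "-m", "luigi", "--module", "main", "RunAll"])` after `import sys`.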
This project is licensed under the GNU Affero General Public License, Version 3.0. See LICENSE for the full license text.