This is a small part of a Data Architecture student project carried out during the Jedha Lead program (Data Engineering / MLOps...).
A 3-day group project with 3 collaborators: Marie Agrapart, Pascal Toutain and David Gauthier.
The scenario: we were a team of data engineers at Lime. We had to use the City of Paris open data API, which provides the coordinates of Vélib' stations and the real-time availability of bikes.
The goal was to show Lime employees, on a map, where they could place Lime electric scooters by analyzing "real-time" public bike availability deficits.
We had to set up a proof of concept for a data pipeline with technologies like Kafka, Airflow, Streamlit...
The architecture diagrams are available in the docs folder, along with the slide presentation.
Here is our main architecture:
The producer.py script queries the open data API every minute and sends about 1,453 bike-availability records (one per station) through a Confluent Kafka connector to a Postgres database (AWS RDS), which stores them by overwriting the old records (we care about the planet!).
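To give an idea of what this looks like, here is a minimal sketch of producer.py: it polls the open data API once a minute and produces one Kafka message per station. The endpoint URL, topic name, broker address and field names are assumptions for illustration, not the project's exact configuration.

```python
import json
import time

import requests
from confluent_kafka import Producer

API_URL = "https://opendata.paris.fr/api/records/1.0/search/"   # assumed endpoint
PARAMS = {"dataset": "velib-disponibilite-en-temps-reel", "rows": 2000}
TOPIC = "velib_availability"                                     # hypothetical topic name

producer = Producer({"bootstrap.servers": "localhost:9092"})     # assumed broker address

def poll_and_send():
    """Fetch the current bike availability and push one message per station."""
    response = requests.get(API_URL, params=PARAMS, timeout=10)
    response.raise_for_status()
    for record in response.json().get("records", []):
        fields = record.get("fields", {})
        message = {
            "station_id": fields.get("stationcode"),
            "bikes_available": fields.get("numbikesavailable"),
            "timestamp": record.get("record_timestamp"),
        }
        producer.produce(TOPIC, key=str(message["station_id"]),
                         value=json.dumps(message))
    producer.flush()

if __name__ == "__main__":
    while True:
        poll_and_send()
        time.sleep(60)   # one API request per minute, as in the pipeline description
```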
A second query, parallelized and triggered every 24 hours by Airflow, lets us retrieve the GPS coordinates of the Vélib' stations.
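An illustrative sketch of that daily Airflow-triggered query is below; the dataset name, field names, connection string and table name are assumptions, not the project's actual code.

```python
from datetime import datetime

import pandas as pd
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from sqlalchemy import create_engine

def fetch_station_coordinates():
    """Pull the Vélib' station coordinates and overwrite the reference table."""
    response = requests.get(
        "https://opendata.paris.fr/api/records/1.0/search/",
        params={"dataset": "velib-emplacement-des-stations", "rows": 2000},
        timeout=10,
    )
    response.raise_for_status()
    rows = [
        {
            "station_id": r["fields"].get("stationcode"),
            "lat": r["fields"]["coordonnees_geo"][0],   # assumed [lat, lon] layout
            "lon": r["fields"]["coordonnees_geo"][1],
        }
        for r in response.json().get("records", [])
    ]
    engine = create_engine("postgresql://user:password@rds-host:5432/velib")  # placeholder DSN
    pd.DataFrame(rows).to_sql("stations", engine, if_exists="replace", index=False)

with DAG(
    dag_id="velib_station_coordinates",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",   # every 24 hours
    catchup=False,
) as dag:
    PythonOperator(
        task_id="fetch_station_coordinates",
        python_callable=fetch_station_coordinates,
    )
```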
The join between availability and station coordinates is done directly in the Streamlit application shown to the users.
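A minimal sketch of that join in the Streamlit app, assuming the table and column names used above (the DSN and the "deficit" rule are placeholders):

```python
import pandas as pd
import streamlit as st
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@rds-host:5432/velib")  # placeholder DSN

# Real-time availability (overwritten every minute) and static coordinates (daily).
availability = pd.read_sql("SELECT station_id, bikes_available FROM availability", engine)
stations = pd.read_sql("SELECT station_id, lat, lon FROM stations", engine)

# Join the two tables and keep stations with no bikes left:
# candidate spots for placing Lime scooters.
merged = availability.merge(stations, on="station_id")
deficit = merged[merged["bikes_available"] == 0]

st.title("Vélib' availability deficits")
st.map(deficit[["lat", "lon"]])
```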
The lag between the producer's send and our Streamlit app was about 1-2 seconds!...
A third query, run every hour and automated with Airflow, pulls the bike availability and, this time, does not delete it: the records are sent to an AWS S3 bucket for future analysis or Machine Learning modeling...
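An illustrative hourly Airflow task archiving an availability snapshot to S3; the bucket name, endpoint and key layout are assumptions:

```python
import json
from datetime import datetime

import boto3
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def archive_snapshot():
    """Fetch the current availability and store it, untouched, in S3."""
    response = requests.get(
        "https://opendata.paris.fr/api/records/1.0/search/",
        params={"dataset": "velib-disponibilite-en-temps-reel", "rows": 2000},
        timeout=10,
    )
    response.raise_for_status()
    key = f"velib/availability_{datetime.utcnow():%Y%m%dT%H%M%S}.json"
    boto3.client("s3").put_object(
        Bucket="velib-history",          # hypothetical bucket name
        Key=key,
        Body=json.dumps(response.json()),
    )

with DAG(
    dag_id="velib_hourly_archive",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    PythonOperator(task_id="archive_snapshot", python_callable=archive_snapshot)
```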
First, we did a proof of concept of this Data Pipeline locally...
Then, we sketched a more complex architecture with two Machine Learning models: a Deep Learning RNN (for time-series problems) to predict bike availability on the map, and a Reinforcement Learning model to reduce the travel time of small "refill" trucks in the city.
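This second architecture was not implemented; purely as an illustration, here is a minimal sketch of what such an RNN could look like (Keras LSTM, dummy data standing in for the hourly snapshots archived in S3, all shapes and hyperparameters assumed):

```python
import numpy as np
import tensorflow as tf

SEQ_LEN = 24   # hours of availability history per sample (assumed window)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),   # predicted number of available bikes next hour
])
model.compile(optimizer="adam", loss="mse")

# Dummy data in place of the real hourly snapshots stored in S3.
X = np.random.randint(0, 30, size=(1000, SEQ_LEN, 1)).astype("float32")
y = np.random.randint(0, 30, size=(1000, 1)).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```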