Use MRT hourly data to run K-means analysis based on the number of people entering and exiting each station
You can also read the article on Medium for reference.
This project continues MRT_Cleaning_Visualizing. Here the data is reshaped and divided into entrances and exits for each station, and K-means clustering is run to explore the relationships between stations in terms of passenger flow.
Below is the process flow:
Steps
- Refine the data using functions in MRT_cleaning_visualizing.ipynb. The result is used as raw data for K-means: the data for each station is converted from a 7*21 table to a list and normalized.
- Run K-means analysis with the hourly numbers of people as variables using MRT_K-means_Analysis.ipynb, which generates:
  - k_pack_IN/OUT csv file for raw data of entrance and exit separately
  - cluster_group_IN/OUT csv file showing which stations belong to which cluster group
  - line graph and heat map grouped by cluster
  - df_cluster.csv showing the entrance and exit group for each station
- MRT_K-means_Analysis.ipynb visualizes the result by cluster with heat maps and line graphs:
Entrance Cluster 0 | Entrance Cluster 1 | Entrance Cluster 2 | Entrance Cluster 3 |
---|---|---|---|
Exit Cluster 0 | Exit Cluster 1 | Exit Cluster 2 | Exit Cluster 3 |
---|---|---|---|
Line graph for entrance: Line graph for exit:
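The refine and clustering steps above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code. It assumes the 7*21 table means 7 days by 21 operating hours, that normalization means dividing by the station's weekly total, and it uses a plain NumPy implementation of Lloyd's algorithm in place of whatever K-means implementation the notebook calls. The helper names `station_vector` and `kmeans` and the demo data are hypothetical. The demo uses two synthetic patterns to stay small, whereas the analysis here uses four clusters.

```python
import numpy as np

def station_vector(table):
    """Flatten one station's 7x21 table into a normalized 147-value vector."""
    flat = np.asarray(table, dtype=float).reshape(-1)
    total = flat.sum()
    # Assumed normalization: share of the weekly total, so stations are
    # compared by the shape of their flow rather than by volume.
    return flat / total if total > 0 else flat

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct stations picked at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every station to every center
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic demo data (not real MRT counts): 3 stations with a morning
# peak followed by 3 stations with an evening peak.
rng = np.random.default_rng(42)
hours = np.arange(21)
morning = np.exp(-0.5 * ((hours - 3) / 1.5) ** 2)
evening = np.exp(-0.5 * ((hours - 13) / 1.5) ** 2)
tables = []
for profile in (morning, evening):
    for _ in range(3):
        noise = rng.uniform(0, 20, size=(7, 21))
        tables.append(np.tile(profile, (7, 1)) * 1000 + noise)

X = np.vstack([station_vector(t) for t in tables])
labels, centers = kmeans(X, k=2)
print(X.shape)
print(labels)
```

Running the same routine once on entrance vectors and once on exit vectors would give the two separate cluster assignments described in the steps.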
- Visualize the result for a single station via MRT_K-means_StationDashboard.ipynb, which generates a dashboard for every station under the folder created by MRT_K-means_Analysis.ipynb:
- Geo-visualize the result using QGIS and define the clusters by their patterns:
- Peak in the morning (cluster 0)
- Peak in both morning and afternoon, and weekends (cluster 1)
- Peak in the afternoon (cluster 2)
- Peak in both morning and afternoon (cluster 3)
Entrance Cluster 0 | Entrance Cluster 1 |
---|---|
Peak in the morning | Peak in both morning and afternoon, and weekends |
Entrance Cluster 2 | Entrance Cluster 3 |
---|---|
Peak in the afternoon | Peak in both morning and afternoon |
- For each station, merge its entrance and exit clusters into a final category. There are 16 possible combinations (4 entrance clusters × 4 exit clusters), but only 9 occur in practice:
The distribution of stations in groups A, B and D is shown below:
It is worth noting that the stations regarded as the residential group are distributed around Taipei City, while the workplace and leisure types are located at the center of the city.
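The merge of entrance and exit clusters into a final category can be illustrated with a small sketch. The station names and cluster ids below are made up for illustration, and the mapping from cluster pairs to the lettered groups is not reproduced because it is not spelled out here. The sketch only shows how pairing the two labels and counting the distinct pairs reveals how many of the 16 possible combinations actually occur.

```python
from collections import Counter

# Hypothetical stations and cluster ids (0-3), for illustration only
entrance_cluster = {"S1": 0, "S2": 0, "S3": 2, "S4": 3, "S5": 2}
exit_cluster = {"S1": 2, "S2": 2, "S3": 0, "S4": 3, "S5": 0}

# Pair each station's entrance cluster with its exit cluster
combined = {
    name: (entrance_cluster[name], exit_cluster[name])
    for name in entrance_cluster
}

# Count how many stations fall into each (entrance, exit) pair
pair_counts = Counter(combined.values())

possible = 4 * 4  # 4 entrance clusters x 4 exit clusters
observed = len(pair_counts)
print(f"{observed} of {possible} possible combinations occur")
for (ent, ext), n in sorted(pair_counts.items()):
    print(f"entrance {ent} / exit {ext}: {n} station(s)")
```

With the real df_cluster.csv in place of the toy dictionaries, the same count would show which entrance/exit pairs are present in the data.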