Skip to content

Google Data Analytics Professional Certificate, Google Data Analytics Capstone: Complete a Case Study, November 2024. This project contains my solution for the Google Data Analytics Capstone: Complete a Case Study - TrackA - Case Study 1: How does a bike-share navigate speedy success?

Notifications You must be signed in to change notification settings

carlacotas/google-data-analytics-capstone-case-study-trackA-cyclistic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

78 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Google Data Analytics Course - Capstone Project (November 2024)

Case study: How does a bike-share navigate speedy success? πŸš΄πŸš²πŸ‘©β€πŸ’»πŸ“ˆπŸ“Š

This project contains a solution for the Capstone of Google Data Analytics Professional Certificate Program from Google

Google Data Analytics Course πŸ“ˆπŸ“Š

  • Professional training designed by Google πŸŸ’πŸ”΅πŸ”΄πŸŸ‘
  • Prepare for a career in Data Analytics
  • 8 Course series, including Google Data Analytics Capstone: Complete a Case Study

This project is my proposed solution on the Cyclistic bike-share analysis case study!


Introduction

In this case study, I will perform real-world tasks as a junior data analyst for a fictional company, Cyclistic, working in the marketing analytics team. In order to answer the business questions, I will follow the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.


Background and Business Question

Cyclistic

Cyclistic is a bike-share company in Chicago. In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

This bike-share program features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use the bikes to commute to work each day.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, the director of marketing believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, she believes there is a solid opportunity to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.


Characters and teams

  • Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.

  • Cyclistic executive team: The notoriously detail-oriented executive team.

  • Lily Moreno: The director of marketing and the manager of my team. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels. She believes the company’s future success depends on maximizing the number of annual memberships.

  • Carla Cotas: A junior data analyst working in the marketing analytics team. My team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve the recommendations, so they must be backed up with compelling data insights and professional data visualizations.


Scenario

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members.

In order to do that, however, my team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Therefore, Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.


Ask

Analysis Questions

Three questions will guide the future marketing program:

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

The first question to answer was assigned to me: How do annual members and casual riders use Cyclistic bikes differently?


Prepare

Folder Structure

The local directory, folders, sub-folders and file-naming conventions follow the structure below:

|--20241113_CyclisticBikeShare
  |-- 1.OriginalData
  |-- 2.PreparedData
  |-- 3.UploadedData
    |-- 20141202
  |-- 4.Analysis
    |-- DataErrors
      |-- AumaticallyExcludedResults
      |-- ManuallyExcludedResults
    |-- RStudio Code
    |-- Figures
  |-- 5.Insights
    |-- MyInsights
    |-- SharedInsights
  |-- 6.Final
    |-- Report
    |-- Presentation

Data Source

Cyclistic’s historical trip data to analyze and identify trends is available here.

This study analyzes and identifies trends from the previous 12 months: November 2023 to October 2024 (β€˜202211-divvy-tripdata.csv’ β†’ β€˜202410-divvy-tripdata.csv’).

The data has been made available by Motivate International Inc. under this license. This is public data that can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.

Data Preparation

⚠️ Note: I tried to complete this project using the data from previous 12 months but I've been limited to manage renaming the original data files in Posit’s RStudio and I could only renaiming 6 files - when renaiming 7 files, Posit’s RStudio crashes. So I decided to complete this project focusing the first previous 6 months: November 2023 to April 2024 (β€˜202311-divvy-tripdata.csv’ β†’ β€˜202404-divvy-tripdata.csv’) because Posit’s RStudio does not crashes due to Memory Usage. However, when merging data in the Process, Posit’s RStudio crashes! So, following the tip from the Case Study Roadmap for the Data Process, I use the β€˜Divvy_Trips_2019_Q4.csv’ and β€˜Divvy_Trips_2020_Q1.csv’ datasets (October 2019 to March 2020).

The Cyclistic’s historical trip data is available to download in zip files (β€˜Divvy_Trips_2019_Q2.csv’ and β€˜Divvy_Trips_2020_Q1.csv’). After downloaded and stored in a folder (1.OriginalData), the zip files were unzipped locally giving the original .CSV data files (β€˜Divvy_Trips_2019_Q2.csv’ and β€˜Divvy_Trips_2020_Q1.csv’). This study uses Posit’s RStudio - free version for the analysis because it is a powerful tool for performing statistical analyses and data visualizations.

The original data files, naming convention of Divvy_Trips_YYYY_(Q1/Q4), were loaded into Posit’s RStudio and renamed for simplicity and improve readability.

CyclisticTripData_2019_Q4 <- read.csv("Divvy_Trips_2019_Q4.csv")
CyclisticTripData_2020_Q1 <- read.csv("Divvy_Trips_2020_Q1.csv")

The variables Divvy_Trips_YYYY_(Q1/Q4) were assigned and were added automatically to the environment:

image

Key findings:

  • the data is organized in CSV (comma-separated values) format
  • there is an inconsistency in the column numbers: 2020, Q1 has 13 columns - 2019, Q4 only has 12 columns
  • the number of observations/rows is different every Quarter

A check summary data was performed for exploring the inconsistency in the column numbers that the original data files have and to ensure the same number of columns and the same column names before moving to the next step and merge them in only one data set.

colnames(CyclisticTripData_2019_Q4)
colnames(CyclisticTripData_2020_Q1)

CyclisticTripData_dfs = list(CyclisticTripData_2019_Q4, CyclisticTripData_2020_Q1)

for (CyclisticTripData_df in CyclisticTripData_dfs) {
  glimpse(CyclisticTripData_df)
}

for (CyclisticTripData_df in CyclisticTripData_dfs) {
  str(CyclisticTripData_df)
}

image

  • CyclisticTripData_2019_Q4 The summary data includes trip id, started and ended time, bike id, trip duration, start and end station - name and id, user type, gender and birthyear.

image

  • CyclisticTripData_2020_Q1 The summary data includes ride id, rideable type, started and ended time, start and end station - name, id, latitude and longitude, and member type.

image

And, in a very exploratory analysis, there are some datatype inconsistencies.

  • CyclisticTripData_2019_Q4

image

  • CyclisticTripData_2020_Q1

image

The data can be trusted, it is published in a trustworthiness source and it is aligned with the question that was assigned to me. However, all ride ids are unique and due to data privacy prohibiting using rider's personally identifiable information, it will not be possible to determine if riders have purchased multiple single passes. Moreover, the problems with data identified lead to the need to process the data and clean it for analysis.


Process

In this step, the two datasets are combined into one dataset, clean it and manipulate it.

Data cleaning

Before combining the tow datasets (β€˜CyclisticTripData_2019_Q4’ and β€˜CyclisticTripData_2020_Q1’) into one dataset (β€˜CyclisticTripData’), the inconsistency in column names in each dataframe needs to be addressed by renaming the columns to ensure uniformity. The column names are renaming considering the most recent naming convention - CyclisticTripData_2020_Q1.

(CyclisticTripData_2019_Q4 <- rename(CyclisticTripData_2019_Q4
                   ,ride_id = trip_id
                   ,started_at = start_time  
                   ,ended_at = end_time
                   ,rideable_type = bikeid
                   ,start_station_id = from_station_id
                   ,start_station_name = from_station_name
                   ,end_station_id = to_station_id
                   ,end_station_name = to_station_name
                   ,member_casual = usertype))

Now that the column names are the same, the datatype for the columns ride_id and rideable_type needs to be uniform for combining the two dataframes.

CyclisticTripData_2019_Q4 <- mutate(CyclisticTripData_2019_Q4, ride_id = as.character(ride_id)
                                    ,rideable_type = as.character(rideable_type))

The data attributes now are uniform! The two dataframes can be combined into only one dataframe for analyzing 6 months of data.

CyclisticTripData <- bind_rows(CyclisticTripData_2019_Q4, CyclisticTripData_2020_Q1)

image

Then the large dataframe can be verified to check very quickly uniformities and to get a better data sensitivity.

colnames(CyclisticTripData)
glimpse(CyclisticTripData)
head(CyclisticTripData)
tail(CyclisticTripData)

From looking at the output information, the columns start_lat, start_lng, end_lat and end_lng can be removed because the data were collected only at β€˜CyclisticTripData_2020_Q1’. Also, the columns birthyear and gender can be removed because the data were collected only at β€˜CyclisticTripData_2019_Q4’.

CyclisticTripData <- CyclisticTripData[,!names(CyclisticTripData) %in% c("birthyear"
                                                                         ,"gender"
                                                                         ,"start_lat"
                                                                         ,"start_lng"
                                                                         ,"end_lat"
                                                                         ,"end_lng")]

Typing View(CyclisticTripData) on the console returns how the CyclisticTripData dataframe looks like.

image

Now, the large dataframe is ready to check that data is clean, free of errors and check if modifying data is needed.

Note: The large dataframe is exported to continue working in another R script due to Posit’s RStudio Memory Usage limit on the free version!

write.csv(CyclisticTripData,"20241119_CyclisticTripData.csv", row.names = FALSE)

Data manipulation

Note: The large dataframe was loaded into Posit’s RStudio (new project and R script) to avoid that RStudio crashes due to Memory Usage.

Next, a closer look at data is taken to check for duplicates, null values, and inconsistency on values that needs to be cleaned.

# find 'ride_id' DUPLICATES 
CyclisticTripData_duplicates <- CyclisticTripData[duplicated(CyclisticTripData$ride_id), ]
duplicate_counts <- table(CyclisticTripData[duplicated(CyclisticTripData$ride_id), ])

# find and count MISSING values
which(is.na(CyclisticTripData))     
sum(is.na(CyclisticTripData))       
colSums(is.na(CyclisticTripData))   
CyclisticTripData <- CyclisticTripData[,!names(CyclisticTripData) %in% c("tripduration")]
CyclisticTripData_missing <- CyclisticTripData[rowSums(is.na(CyclisticTripData)) > 0,]
write.csv(CyclisticTripData_missing,"20241120_CyclisticTripData_MissingValues.csv", row.names = FALSE)
CyclisticTripData <- CyclisticTripData[rowSums(is.na(CyclisticTripData)) == 0, ]

# ensuring consistency on data 'member_casual' column
unique(CyclisticTripData$member_casual)
CyclisticTripData$member_casual_copy = CyclisticTripData$member_casual
CyclisticTripData <- CyclisticTripData %>%
  mutate(member_casual = recode(member_casual, 'Subscriber' = 'member', 'Customer' = 'casual' ))
CyclisticTripData <- subset( CyclisticTripData, select = -member_casual_copy )

# check for consistency on number of characters across columns
summarise_if(CyclisticTripData, is.character, funs(max(nchar(.), na.rm=T)))
summarise_if(CyclisticTripData, is.character, funs(min(nchar(.), na.rm=T)))

Key findings:

  • There are no ride_id duplicates
  • There are 426888 missing values (426887 in 'tripduration', 1 in 'end_station_id')
  • The 426887 missing values in 'tripduration' are due to this column data collected only at β€˜CyclisticTripData_2020_Q1’
  • The column 'tripduration' can be removed. Now the data has 1 missing value image
  • There are "Subscriber","Customer","member" and "casual" at member_casual column. This fact is due to the different member type values between β€˜CyclisticTripData_2019_Q4’ and β€˜CyclisticTripData_2020_Q1’. The information that we have is that: Customers are identified as casual riders ("causal") and Cyclistic members ("members"). So, to ensure consistency on data: "Subscriber" is considered a Cyclistic member and "Customer" is the same as casual rider.
  • There are no consistency on values of column rideable_type due to the same fact presented in the previous point, values change between β€˜CyclisticTripData_2019_Q4’ and β€˜CyclisticTripData_2020_Q1’. However, with the information available and without further information we can't address this data inconsistency. So, for the purposes of this study, it was decided to remove this column.
    CyclisticTripData <- CyclisticTripData[,!names(CyclisticTripData) %in% c("rideable_type")]
    
  • There are consistency on characters length on columns started_at, ended_at and member_casual image image

Now, the data is ready to the transform and calculation steps to know the ride length and day of week.

New columns ride_length and day_of_week are created to calculate (1) the length of each ride by subtracting the column started_at from the column ended_at and (2) the day of the week that each ride started, respectivelly. Both columns started_at and ended_at have the start and end time in the format YYYY-MM-DD hh:mm:ss.

CyclisticTripData$ride_length <- as_hms(difftime(CyclisticTripData$ended_at, CyclisticTripData$started_at))
CyclisticTripData$day_of_week <- weekdays(as.Date(CyclisticTripData$started_at))

Now, a closer look at data in the new columns is taken to check for consistency and meaning on values.

# check the values on column 'day_of_week' ensuring consistency
unique(CyclisticTripData$day_of_week)

# check for 'ride_length' negative or zero and remove them
as_hms(min(CyclisticTripData$ride_length, na.rm = TRUE))
as_hms(max(CyclisticTripData$ride_length, na.rm = TRUE))
CyclisticTripData_ridelength_zeronegative <- CyclisticTripData[CyclisticTripData$ride_length <= 00:00:00,]
write.csv(CyclisticTripData_ridelength_zeronegative,"20241126_CyclisticTripData_ridelength_InconsistencyValues.csv", row.names = FALSE)
CyclisticTripData <- CyclisticTripData[CyclisticTripData$ride_length > 00:00:00, ]

Key findings:

  • There are consistency on the the day of the week that each ride started, with unique values corresponding to the names of the days of the week image
  • There are at least one negative value on ride length. And, at least one value that is higher than one day (circa 109 days) and less than the total number of days corresponding to this analsysis image
  • There are 222 observations with negative ride length or equal to 00:00:00
  • At the end, the data ready for analysis has 1130718 observations

Note: For the purposes of this study and to learn objectives, the data is already ready for the Analyze step; however, the original data β€˜CyclisticTripData_2020_Q1’ includes columns with latitude and longitude for the start and end stations. An additional step on data manipulation is to calculate the ride length if possible with the start and end stations information (latitude and longitude) from the first quarter of 2020 to complete the 'CyclisticTripData' dataframe.

The large dataframe and the original data from the first quarter of 2020 were loaded into Posit’s RStudio (new project and R script). And, a closer look at data is taken to check for consistency on the stations start and end names between the two dataframes.

## renaming files for simplicity and improve readability
CyclisticTripData <- read.csv("20241126_CyclisticTripData.csv")
CyclisticTripData_2020_Q1 <- read.csv("Divvy_Trips_2020_Q1.csv")

## get the start and end - latitude and longitude - for each station 
CyclisticTripData_2020_Q1 <- CyclisticTripData_2020_Q1[,!names(CyclisticTripData_2020_Q1) 
                                                       %in% c("rideable_type"
                                                              ,"member_casual"
                                                              ,"started_at"
                                                              ,"ended_at")]

CyclisticTripData <- CyclisticTripData[,!names(CyclisticTripData) 
                                                       %in% c("started_at"
                                                              ,"ended_at"
                                                              ,"member_casual"
                                                              ,"ride_length"
                                                              ,"day_of_week")]


unique(CyclisticTripData_2020_Q1$start_station_name)
unique(CyclisticTripData_2020_Q1$end_station_name)
CyclisticTripData_2020_Q1_start <- CyclisticTripData_2020_Q1 %>% distinct(start_station_name, .keep_all=TRUE)
CyclisticTripData_2020_Q1_end <- CyclisticTripData_2020_Q1 %>% distinct(end_station_name, .keep_all=TRUE)

unique(CyclisticTripData$start_station_name)
unique(CyclisticTripData$end_station_name)

Key findings:

  • There are no consistency on the start and end station names on both dataframes: fisrt quarter of 2020 - unique start station names: 607, unique end station names: 603 large dataframe - unique start station names: 613, unique end station names: 611
  • Without further information about all the stations latitude and longitude we cannot guarantee consistency on ride length calculations for all observations
  • At the end, the data ready for analysis has 1130718 observations and 10 features

Analyze

After the data is prepared, processed, and stored appropriately, it's ready for analyze it. The main objective in this step is to find trends or relationships in the data. Also, we can discover some surprises in the data. And, find how will these insights help answer the question assigned to me. This step includes conducting descriptive analysis and identifying trends and relationships.

Following the Case Study Roadmap for the Data Analyze, I started this step by looking the following questions:

  • How should you organize your data to perform analysis on it?
  • Has your data been properly formatted?
  • What surprises did you discover in the data?
glimpse (CyclisticTripData)
str (CyclisticTripData)
summary(CyclisticTripData)

The data includes ride id, started and ended time, start and end station - name and id, member type, ride length and day of week.

image

image

image

To run the analysis we must have ride length on the proper format for descriptive analysis. Also, an additional format transform will be done, to guarantee that start and end station - id has the correct format.

CyclisticTripData <- mutate(CyclisticTripData, start_station_id = as.character(start_station_id)
                                    ,end_station_id = as.character(end_station_id)
                                    ,ride_length = as.numeric(hms(ride_length)))

image

image

image

The data is now properly formatted!

Descriptive analysis

Important metrics such as mean and maximum for the ride length are calculated for each member type.

CyclisticTripData_RideLengthMetrics <- CyclisticTripData %>% 
  group_by(member_casual) %>%
  summarise(mean_ride_length = round(mean(ride_length), 0)
            , median_ride_length = median(ride_length)
            , max_ride_length = max(ride_length))

CyclisticTripData_RideLengthMetrics <- mutate(CyclisticTripData_RideLengthMetrics
                                           , mean_ride_length = hms(mean_ride_length)
                                           , median_ride_length = hms(median_ride_length)
                                           , max_ride_length = hms(max_ride_length))

image image

Next, the mode of day of week is also calculated.

CyclisticTripData_ModeDayOfWeek <- CyclisticTripData %>% 
  group_by(member_casual) %>%
  summarise(mode_day_of_week = Mode(day_of_week))

image

The mean ride length by day of week can be also calculate.

CyclisticTripData_Mean_UserDayOfWeek <- CyclisticTripData %>% 
  group_by(member_casual, day_of_week) %>%
  summarise(mean_ride_length = round(mean(ride_length), 0), .groups = 'drop') %>%
  mutate(mean_ride_length = hms(mean_ride_length))

image image

And, the total number of rides for each member type by day of week can be calculated.

CyclisticTripData_Rides_UserDayOfWeek <- CyclisticTripData %>% 
  group_by(member_casual, day_of_week) %>%
  summarise(number_rides = n(), .groups = 'drop') %>%
  mutate(percentage_number_rides = round(number_rides/sum(number_rides) * 100,2))

image

Also, the total number of rides for each member type can be calculated.

CyclisticTripData_TotalRides <- CyclisticTripData %>% 
  group_by(member_casual) %>% 
  summarise(ride_count = n()) %>%
  mutate(percentage_rides = round(ride_count/sum(ride_count)*100,2))

image

Key findings:

  • Only circa of 14% total rides are casual members
  • There are more rides of casual members on weekends than during week days. At the opposite, there are more rides of members during week days than weekends
  • Rides of members during week days range between circa 12 to 16% of the total number of rides. While for the casual members is always less than circa 2% of the total number of rides
  • Casual members ride more frequently on Sunday. And, members ride more frequently on Tuesday
  • During weekends, members ride for longer lengths (higher mean ride length). Monday is the day with lower mean ride length for casual members

Important data insights were already obtained and can be translated visually to present data more clearly.


Share

In this step, after the analysis is performed and some insights are obtained, relevant visualizations are created to share findings.

  • % number of rides by member type
slices <- CyclisticTripData_TotalRides$percentage_rides 
lbls <- c("Casual", "Member")
pct <- CyclisticTripData_TotalRides$percentage_rides
lbls <- paste(lbls, pct)
lbls <- paste(lbls,"%",sep="")
pie(slices,labels = lbls, col=rainbow(length(lbls)),
    main="Percentage of Total Rides")

20241205_PercentageTotalRides

  • number of rides for member type by day_of_week
ggplot(CyclisticTripData_Rides_UserDayOfWeek,aes(x = factor(day_of_week,
                                                 levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
                                                 , y = number_rides, fill = member_casual)) +
  labs(title = "Total number of rides by day of week",
       x = "Day of week",
       y = "Total number of rides") +
  theme(plot.title = element_text(face="bold")) +
  theme(axis.title.x = element_text(face="bold")) +
  theme(axis.title.y = element_text(face="bold")) +
  labs(fill = "Member type") +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values=c("#00ffff", 
                             "red"))

20241205_TotalNumberOfRidesPerdayOfWeek

  • mean ride length for member type by day of week
ggplot(CyclisticTripData_Mean_UserDayOfWeek,aes(x = factor(day_of_week,
                                                            levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
                                                 , y = mean_ride_length, fill = member_casual)) +
  labs(title = "Mean ride length by day of week",
       x = "Day of week",
       y = "Mean ride length [seconds]") +
  theme(plot.title = element_text(face="bold")) +
  theme(axis.title.x = element_text(face="bold")) +
  theme(axis.title.y = element_text(face="bold")) +
  labs(fill = "Member type") +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values=c("#00ffff", 
                             "red"))

20241205_MeanRideLengthPerDayOfWeek

Now that visualizations are created, deliverables can be prepared.


Act

In this final step, based on my analysis and findings, the deliverables Morena asked me to create, are prepared and presented.

The question assigned to me: How do annual members and casual riders use Cyclistic bikes differently?

From this study:

  • Casual members use Cyclistic bikes for longer ride length than annual members
  • Casual members ride more during weekends and more frequently on Sunday

Based on my analysis, I would recommend the following to increase converting casual riders into annual members:

  • It can be created a new offer with benefits during weekends and long rides
  • Design marketing campaigns and fun activities to create engagement, strategic activities near bike stations can reach out more potential members from Saturday to Tuesday
  • It can be created a special discount for annual members who refer Cyclistic to friends (that are casual members) that become annual members

Also, further analysis can be done:

  • additional data about the Cyclistic stations location can be used to calculate the ride distances
  • the data already have features that allow to obtain the day and month, and further analysis can be conducted to explore findings per month, day and hour
  • additional data about the bike type can be used to get relevant information about members preference

A document, PowerPoint presentation, was prepared to communicate the data insights to Lily Moreno (Director of Marketing).


What's in this project? πŸ‘©β€πŸ’»

README.md: That's this file, where you can describe your project and how you built it.

20241113_DataPreparation_CyclisticBikeShareCapstone.R: That's R script with the code for the steps Data Preparation and Process - Data Cleaning.

20241120_DataManipulation_CyclisticBikeShareCapstone: That's R script with the code for the steps Process - Data Manipulation.

20241128_ExtraDataManipulation_CyclisticBikeShareCapstone.R: That's R script with the code for the steps Process - Data Manipulation with additional step that intended to calculate ride distance.

20241202_DataAnalyze_CyclisticBikeShareCapstone.R: That's R script with the code for the steps Analyze.

20241205_DataShare_CyclisticBikeShareCapstone.R: That's R script with the code for the steps Share.

20241206_CyclisticCaseStudy_Question1_CarlaCotas.pdf: That's powerpoint presentation in pdf format.


Resources

Background and Business Question was inspired by the business case description provided on Google Data Analytics Capstone: Complete a Case Study

About

Google Data Analytics Professional Certificate, Google Data Analytics Capstone: Complete a Case Study, November 2024. This project contains my solution for the Google Data Analytics Capstone: Complete a Case Study - TrackA - Case Study 1: How does a bike-share navigate speedy success?

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages