This project involves profiling, understanding, and analyzing the Yelp dataset. The dataset contains information about businesses, users, reviews, check-ins, and more. The analysis is divided into two parts:
Part 1: Yelp Dataset Profiling and Understanding
In this part, various aspects of the dataset are explored and analyzed using SQL queries. The analysis includes:
- Total number of records for each table
- Total distinct records by primary key or foreign key for each table
- Checking for null values in the Users table
- Finding the minimum, maximum, and average values for specific columns
- Listing cities with the most reviews
- Distribution of star ratings for businesses in specified cities
- Finding the top users based on their total number of reviews
- Exploring the correlation between the number of reviews and the number of fans
- Identifying whether more reviews contain the word "love" or the word "hate"
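A few of these profiling checks can be sketched as follows. This is a minimal illustration, not the project's actual queries: the `review` table, its columns, and the sample rows are hypothetical stand-ins, and the real Yelp schema may use different names. It runs against an in-memory SQLite database so it can be tried without the dataset.

```python
import sqlite3

# Hypothetical miniature table; the real Yelp tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE review (
        id TEXT PRIMARY KEY,
        business_id TEXT,
        user_id TEXT,
        stars INTEGER,
        text TEXT
    );
    INSERT INTO review VALUES
        ('r1', 'b1', 'u1', 5, 'I love this place'),
        ('r2', 'b1', 'u2', 1, 'I hate the wait'),
        ('r3', 'b2', 'u1', 4, 'We LOVE the tacos'),
        ('r4', 'b2', 'u3', 3, 'It was fine');
""")

# Total records and distinct values of a foreign key.
total, distinct_users = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT user_id) FROM review"
).fetchone()

# Minimum, maximum, and average of a numeric column.
min_stars, max_stars, avg_stars = conn.execute(
    "SELECT MIN(stars), MAX(stars), AVG(stars) FROM review"
).fetchone()

# Reviews mentioning "love" versus "hate". SQLite's LIKE is case-insensitive
# for ASCII by default; note that '%love%' also matches words such as "lovely".
love_count, hate_count = conn.execute(
    "SELECT SUM(text LIKE '%love%'), SUM(text LIKE '%hate%') FROM review"
).fetchone()

print(total, distinct_users, min_stars, max_stars, avg_stars)
print(love_count, hate_count)
```

The same `COUNT`, `MIN`/`MAX`/`AVG`, and `LIKE` patterns should carry over to the other tables once the real table and column names are substituted.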
Part 2: Inferences and Analysis
This part involves drawing inferences and performing analysis based on specific questions and scenarios. For example:
- Comparing businesses with 2-3 stars to those with 4-5 stars in a particular city and category
- Grouping businesses based on whether they are open or closed and identifying differences between them
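The two comparisons above could be approached roughly like this. It is a sketch under stated assumptions: the `business` and `category` tables, the `is_open` flag, and the placeholder values 'City A' and 'Category X' are all hypothetical and chosen only to make the example runnable.

```python
import sqlite3

# Hypothetical schema and sample rows; the real Yelp tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (
        id TEXT PRIMARY KEY,
        city TEXT,
        stars REAL,
        review_count INTEGER,
        is_open INTEGER
    );
    CREATE TABLE category (business_id TEXT, category TEXT);
    INSERT INTO business VALUES
        ('b1', 'City A', 4.5, 120, 1),
        ('b2', 'City A', 2.5, 30, 0),
        ('b3', 'City A', 4.0, 80, 1),
        ('b4', 'City A', 3.0, 10, 1),
        ('b5', 'City B', 5.0, 200, 1);
    INSERT INTO category VALUES
        ('b1', 'Category X'), ('b2', 'Category X'), ('b3', 'Category X'),
        ('b4', 'Category X'), ('b5', 'Category X');
""")

# Compare 2-3 star and 4-5 star businesses within one city and category.
# Ratings outside both bands (for example 3.5) are excluded by the WHERE clause.
star_groups = conn.execute("""
    SELECT CASE WHEN b.stars BETWEEN 2 AND 3 THEN '2-3 stars'
                ELSE '4-5 stars' END AS star_group,
           COUNT(*) AS businesses,
           AVG(b.review_count) AS avg_reviews
    FROM business AS b
    JOIN category AS c ON c.business_id = b.id
    WHERE b.city = ? AND c.category = ?
      AND (b.stars BETWEEN 2 AND 3 OR b.stars BETWEEN 4 AND 5)
    GROUP BY star_group
    ORDER BY star_group
""", ("City A", "Category X")).fetchall()

# Compare open and closed businesses across the whole table.
open_closed = conn.execute("""
    SELECT is_open, COUNT(*), AVG(stars), AVG(review_count)
    FROM business
    GROUP BY is_open
    ORDER BY is_open
""").fetchall()

print(star_groups)
print(open_closed)
```

Grouping with a `CASE` expression keeps both star bands in a single result set, which makes side-by-side comparison easier than running one query per band.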