Inspired by personal curiosity and a 2023 hackathon challenge, I built a project that won the ‘Most Polished’ category. It investigates the impact of large language models such as ChatGPT on entry-level tech roles. Along the way I practiced data cleaning, data wrangling, data analysis, and modeling using Python, APIs, Polars, and hvPlot. The project has three parts: an exploratory analysis of the dataset, a study of whether ChatGPT has replaced juniors and interns, and an attempt to model estimated compensation with several regression models.
Planned future work includes tuning hyperparameters, particularly for the best-scoring model (SVR with the “rbf” kernel), analyzing feature importances, discussing and explaining the modeling results, and documenting the code further to improve its readability and maintainability.
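As a rough illustration of what the planned tuning step could look like, here is a minimal sketch of a grid search over an RBF-kernel SVR. It assumes scikit-learn, which the summary above does not name explicitly, and it uses synthetic data from `make_regression` as a stand-in for the real compensation dataset. The grid values are illustrative guesses, not the project's settings.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption: the real features and target differ).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Scale features first, since RBF kernels are sensitive to feature scale.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Hypothetical search grid; make_pipeline names the SVR step "svr".
param_grid = {
    "svr__C": [0.1, 1.0, 10.0],
    "svr__gamma": ["scale", 0.01, 0.1],
    "svr__epsilon": [0.01, 0.1],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=5,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

Wrapping the scaler and the model in one pipeline keeps the scaling inside each cross-validation fold, so the validation folds do not leak into the fit.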
This project gave me valuable experience in data analysis and modeling, with a special focus on learning data analysis in Polars and using hvPlot for visualization. The baseline model achieved an MSE of 0.661 and an R² score of 0.38, while experimenting with different regression models improved the mean MSE to 0.6 and the mean R² score to 0.44. The project highlighted the importance of model experimentation and the potential for further improvement through hyperparameter tuning and feature analysis. It also helped me reduce repetition in my code (the DRY principle), reinforcing the importance of writing efficient, reusable code in data science projects.
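For readers less familiar with the two metrics quoted above, the following sketch computes MSE and R² from their standard definitions in plain Python. The function names and the toy values are made up for illustration and are not taken from the project's data.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation around the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mse_value = mean_squared_error(y_true, y_pred)
r2_value = r2_score(y_true, y_pred)
print(f"MSE: {mse_value:.3f}, R2: {r2_value:.3f}")
```

A lower MSE and a higher R² both indicate a better fit, which is why the move from 0.661 to 0.6 and from 0.38 to 0.44 counts as an improvement on both measures.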