This project contains a solution for the LLM Classification Finetuning competition on Kaggle, implemented as a Kaggle notebook.
Main objective: Predict which responses users will prefer in a head-to-head battle between chatbots powered by large language models (LLMs).
Steps of the data science process:
- Understand the original data
- Data Cleaning
- Data Exploration
- Feature Engineering
- Model Selection and Training
- Model Evaluation
- Submission File
Data
- train.csv
  - id - A unique identifier for the row.
  - model_a/b - The identity of model_a/b. Included in train.csv but not test.csv.
  - prompt - The prompt that was given as an input (to both models).
  - response_a/b - The response from model_a/b to the given prompt.
  - winner_model_a/b/tie - Binary columns marking the judge's selection; the ground-truth target columns.
- test.csv
  - id
  - prompt
  - response_a/b
- sample_submission.csv - A submission file in the correct format.
  - id
  - winner_model_a/b/tie - The predicted probability of each outcome for the test set.
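The sketch below shows one way to produce a submission file in this format, assuming the competition CSVs are in the working directory. The uniform probabilities are placeholders to be replaced with real model predictions.

```python
# Minimal submission sketch; column names come from sample_submission.csv.
import pandas as pd

test = pd.read_csv("test.csv")
submission = pd.read_csv("sample_submission.csv")

# Placeholder: assign equal probability to each outcome for every test row.
submission["winner_model_a"] = 1 / 3
submission["winner_model_b"] = 1 / 3
submission["winner_tie"] = 1 / 3

submission.to_csv("submission.csv", index=False)
```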
README.md
: This file, which describes the project and how it was built.
llm-classification-finetuning.ipynb
: The Python notebook with the code submitted to the competition.