Merging recomm_sys #72

Open
wants to merge 23 commits into
base: master

@akkadhim akkadhim commented Dec 25, 2024

Adding the recommendation system experiments. Please ignore any changes outside the examples/recomm_system directory.

BooBSD commented Dec 26, 2024

@akkadhim Could you please export your noisy datasets to a CSV file for testing in other languages?

@akkadhim (Author)

> @akkadhim Could you please export your noisy datasets to a CSV file for testing in other languages?

Sure. Below are the datasets, one per noise ratio.

noisy_dataset_0.05.csv
noisy_dataset_0.005.csv
noisy_dataset_0.02.csv
noisy_dataset_0.2.csv
noisy_dataset_0.01.csv
noisy_dataset_0.1.csv

BooBSD commented Dec 27, 2024

@akkadhim Thank you!

BooBSD commented Dec 27, 2024

@akkadhim Is it correct that, after one-hot booleanization, your input data consists of 10709 bits? This includes 1350 unique product_ids + 317 categories + 9042 user_ids.
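
As a quick check of the arithmetic in this question: the width of a concatenated one-hot vector is the sum of the per-column vocabulary sizes. The sketch below uses only the counts quoted above; the dictionary and variable names are illustrative and do not come from the PR.

```python
# Per-column unique-value counts as quoted in the comment above.
vocab_sizes = {"product_id": 1350, "category": 317, "user_id": 9042}

# Each column occupies one contiguous block of bits; record where each block starts.
offsets = {}
total_bits = 0
for column, size in vocab_sizes.items():
    offsets[column] = total_bits
    total_bits += size

print(total_bits)  # 10709
print(offsets)     # {'product_id': 0, 'category': 1350, 'user_id': 1667}
```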

@akkadhim (Author)

> @akkadhim Is it correct that, after one-hot booleanization, your input data consists of 10709 bits? This includes 1350 unique product_ids + 317 categories + 9042 user_ids.

After expanding the original dataset and adding the noise, the numbers of unique feature values are:

Users: 1193
Items: 1350
Categories: 211

I used one_hot_encoding for the TM classifier, and the dataset is split into training and test portions at that step.
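
With the counts in this reply, a concatenated one-hot vector would be 1193 + 1350 + 211 = 2754 bits wide. The following is a minimal sketch of that kind of encoding, assuming each cell is treated as one categorical value. It is not the PR's actual one_hot_encoding code; the helper names and the toy rows are made up for illustration.

```python
def build_vocab(values):
    """Map each distinct value to a bit position, in first-seen order."""
    vocab = {}
    for value in values:
        vocab.setdefault(value, len(vocab))
    return vocab


def one_hot_rows(rows, columns):
    """Encode each row as the concatenation of one one-hot block per column."""
    vocabs = {c: build_vocab(r[c] for r in rows) for c in columns}
    offsets, width = {}, 0
    for c in columns:
        offsets[c] = width
        width += len(vocabs[c])
    encoded = []
    for r in rows:
        bits = [0] * width
        for c in columns:
            bits[offsets[c] + vocabs[c][r[c]]] = 1
        encoded.append(bits)
    return encoded, width


# Toy rows (hypothetical values, not taken from the attached CSV files).
rows = [
    {"user_id": "u1", "product_id": "p1", "category": "Cables"},
    {"user_id": "u2", "product_id": "p1", "category": "Cables"},
    {"user_id": "u1", "product_id": "p2", "category": "Audio"},
]
encoded, width = one_hot_rows(rows, ["user_id", "product_id", "category"])
print(width)       # 6
print(encoded[2])  # [1, 0, 0, 1, 0, 1]
```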

BooBSD commented Dec 27, 2024

@akkadhim
Got it. However, the columns category and user_id contain lists of categories and users, joined by the "|" and "," characters. Why weren’t they split into individual unique categories and user IDs? Could you confirm if your method of booleanization is correct?
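
To make the alternative concrete, here is a rough sketch of splitting multi-valued cells into individual tokens before booleanization, so that each unique category or user ID gets its own bit. It assumes both "|" and "," act as separators; the function names and sample cells are hypothetical, and this is not necessarily how either participant implemented it.

```python
import re


def split_cell(cell):
    """Split a cell on '|' or ',' and drop whitespace and empty tokens."""
    return [tok.strip() for tok in re.split(r"[|,]", cell) if tok.strip()]


def multi_hot(cells):
    """Return one bit vector per cell, with one bit per unique token."""
    vocab = {}
    for cell in cells:
        for tok in split_cell(cell):
            vocab.setdefault(tok, len(vocab))
    rows = []
    for cell in cells:
        bits = [0] * len(vocab)
        for tok in split_cell(cell):
            bits[vocab[tok]] = 1
        rows.append(bits)
    return rows, vocab


# Hypothetical cell contents, for illustration only.
cells = ["Computers|Cables", "Cables", "Audio,Cables"]
rows, vocab = multi_hot(cells)
print(vocab)  # {'Computers': 0, 'Cables': 1, 'Audio': 2}
print(rows)   # [[1, 1, 0], [0, 1, 0], [0, 1, 1]]
```

Unlike plain one-hot encoding, a row can have several bits set in the same block, which is why the vocabulary sizes differ between the two approaches.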

BooBSD commented Dec 27, 2024

@akkadhim
I tested both booleanization methods (yours and mine) and obtained approximately the same validation accuracy.
I split your dataset such that the first 80% is used for training, and the last 20% for validation.
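
The split described here is positional, with no shuffling. A minimal sketch, assuming the rows are already loaded into a list in file order (the function name and default fraction are illustrative):

```python
def ordered_split(rows, train_fraction=0.8):
    """Return (train, validation): the first part and the remainder, in order."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


rows = list(range(10))  # stand-in for dataset rows
train, validation = ordered_split(rows)
print(train)       # [0, 1, 2, 3, 4, 5, 6, 7]
print(validation)  # [8, 9]
```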

My best validation accuracy:

  • noisy_dataset_0.005.csv: 99.69%
  • noisy_dataset_0.2.csv: 84.77%

Here is the proof:

#1  Accuracy: 77.66%  Best: 77.66%  Training: 0.789s  Testing: 0.093s
#2  Accuracy: 84.90%  Best: 84.90%  Training: 0.133s  Testing: 0.001s
#3  Accuracy: 88.97%  Best: 88.97%  Training: 0.116s  Testing: 0.001s
#4  Accuracy: 90.06%  Best: 90.06%  Training: 0.109s  Testing: 0.001s
#5  Accuracy: 92.38%  Best: 92.38%  Training: 0.099s  Testing: 0.001s
#6  Accuracy: 96.99%  Best: 96.99%  Training: 0.099s  Testing: 0.001s
#7  Accuracy: 93.75%  Best: 96.99%  Training: 0.096s  Testing: 0.001s
#8  Accuracy: 97.61%  Best: 97.61%  Training: 0.093s  Testing: 0.001s
#9  Accuracy: 98.33%  Best: 98.33%  Training: 0.090s  Testing: 0.001s
#10  Accuracy: 98.70%  Best: 98.70%  Training: 0.088s  Testing: 0.001s
#11  Accuracy: 98.53%  Best: 98.70%  Training: 0.089s  Testing: 0.001s
#12  Accuracy: 98.29%  Best: 98.70%  Training: 0.086s  Testing: 0.001s
#13  Accuracy: 77.15%  Best: 98.70%  Training: 0.090s  Testing: 0.001s
#14  Accuracy: 98.57%  Best: 98.70%  Training: 0.086s  Testing: 0.001s
#15  Accuracy: 98.22%  Best: 98.70%  Training: 0.087s  Testing: 0.001s
#16  Accuracy: 99.15%  Best: 99.15%  Training: 0.087s  Testing: 0.001s
#17  Accuracy: 98.87%  Best: 99.15%  Training: 0.079s  Testing: 0.001s
#18  Accuracy: 98.84%  Best: 99.15%  Training: 0.082s  Testing: 0.001s
#19  Accuracy: 99.04%  Best: 99.15%  Training: 0.081s  Testing: 0.001s
#20  Accuracy: 98.80%  Best: 99.15%  Training: 0.080s  Testing: 0.001s
#21  Accuracy: 99.28%  Best: 99.28%  Training: 0.081s  Testing: 0.001s
#22  Accuracy: 99.42%  Best: 99.42%  Training: 0.079s  Testing: 0.001s
#23  Accuracy: 98.94%  Best: 99.42%  Training: 0.079s  Testing: 0.001s
#24  Accuracy: 99.18%  Best: 99.42%  Training: 0.075s  Testing: 0.001s
#25  Accuracy: 98.84%  Best: 99.42%  Training: 0.080s  Testing: 0.001s
#26  Accuracy: 99.08%  Best: 99.42%  Training: 0.077s  Testing: 0.001s
#27  Accuracy: 98.87%  Best: 99.42%  Training: 0.078s  Testing: 0.001s
#28  Accuracy: 99.69%  Best: 99.69%  Training: 0.077s  Testing: 0.001s
#29  Accuracy: 99.11%  Best: 99.69%  Training: 0.081s  Testing: 0.001s
#30  Accuracy: 99.35%  Best: 99.69%  Training: 0.073s  Testing: 0.001s
#31  Accuracy: 99.56%  Best: 99.69%  Training: 0.074s  Testing: 0.001s
#32  Accuracy: 99.62%  Best: 99.69%  Training: 0.074s  Testing: 0.001s
#33  Accuracy: 99.49%  Best: 99.69%  Training: 0.075s  Testing: 0.001s
#34  Accuracy: 99.69%  Best: 99.69%  Training: 0.075s  Testing: 0.001s
#35  Accuracy: 99.62%  Best: 99.69%  Training: 0.071s  Testing: 0.001s
#36  Accuracy: 99.69%  Best: 99.69%  Training: 0.071s  Testing: 0.001s
#37  Accuracy: 99.62%  Best: 99.69%  Training: 0.071s  Testing: 0.001s
#38  Accuracy: 99.42%  Best: 99.69%  Training: 0.070s  Testing: 0.001s
#39  Accuracy: 99.52%  Best: 99.69%  Training: 0.071s  Testing: 0.001s
#40  Accuracy: 99.32%  Best: 99.69%  Training: 0.072s  Testing: 0.001s
#41  Accuracy: 99.62%  Best: 99.69%  Training: 0.067s  Testing: 0.001s
#42  Accuracy: 99.69%  Best: 99.69%  Training: 0.070s  Testing: 0.001s
#43  Accuracy: 99.69%  Best: 99.69%  Training: 0.072s  Testing: 0.001s
#44  Accuracy: 99.49%  Best: 99.69%  Training: 0.066s  Testing: 0.001s
#45  Accuracy: 99.62%  Best: 99.69%  Training: 0.072s  Testing: 0.001s
#46  Accuracy: 99.62%  Best: 99.69%  Training: 0.070s  Testing: 0.001s
#47  Accuracy: 99.59%  Best: 99.69%  Training: 0.069s  Testing: 0.001s
#48  Accuracy: 99.69%  Best: 99.69%  Training: 0.069s  Testing: 0.001s
#49  Accuracy: 99.42%  Best: 99.69%  Training: 0.067s  Testing: 0.001s
#50  Accuracy: 99.69%  Best: 99.69%  Training: 0.070s  Testing: 0.001s
#51  Accuracy: 99.42%  Best: 99.69%  Training: 0.067s  Testing: 0.001s
#52  Accuracy: 99.69%  Best: 99.69%  Training: 0.068s  Testing: 0.001s
#53  Accuracy: 99.69%  Best: 99.69%  Training: 0.074s  Testing: 0.001s
#54  Accuracy: 99.69%  Best: 99.69%  Training: 0.070s  Testing: 0.001s
#55  Accuracy: 99.69%  Best: 99.69%  Training: 0.068s  Testing: 0.001s
#56  Accuracy: 99.69%  Best: 99.69%  Training: 0.069s  Testing: 0.001s
#57  Accuracy: 99.69%  Best: 99.69%  Training: 0.075s  Testing: 0.001s
#58  Accuracy: 99.69%  Best: 99.69%  Training: 0.067s  Testing: 0.001s
#59  Accuracy: 99.69%  Best: 99.69%  Training: 0.065s  Testing: 0.001s
#60  Accuracy: 99.69%  Best: 99.69%  Training: 0.073s  Testing: 0.001s

These results were obtained on a CPU, and training is fast: after the first iteration, each one takes roughly 0.07 to 0.13 s.
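
For anyone reproducing this, the log format above is easy to check mechanically. The sketch below parses a short excerpt copied from the log (lines #5 to #8) and confirms that the Best column is the running maximum of Accuracy. The regular expression is an assumption based on the printed format, not on the program that produced the log.

```python
import re

# Matches the iteration number, accuracy and best-so-far fields of one log line.
LINE = re.compile(r"#(\d+)\s+Accuracy:\s+([\d.]+)%\s+Best:\s+([\d.]+)%")

log = """\
#5  Accuracy: 92.38%  Best: 92.38%  Training: 0.099s  Testing: 0.001s
#6  Accuracy: 96.99%  Best: 96.99%  Training: 0.099s  Testing: 0.001s
#7  Accuracy: 93.75%  Best: 96.99%  Training: 0.096s  Testing: 0.001s
#8  Accuracy: 97.61%  Best: 97.61%  Training: 0.093s  Testing: 0.001s
"""

running_best = 0.0
consistent = True
for line in log.splitlines():
    m = LINE.match(line)
    if not m:
        continue  # skip anything that is not a result line
    accuracy, reported_best = float(m.group(2)), float(m.group(3))
    running_best = max(running_best, accuracy)
    if running_best != reported_best:
        consistent = False

print(running_best, consistent)  # 97.61 True
```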
