In this work we address the task of link prediction in a citation network. The work is also part of an in-class Kaggle competition for the Network Analytics course offered at Ecole CentraleSupelec, Paris, in Fall 2018-2019.
Our final F-score is 0.973 on the public test set, and we are currently ranked 2nd out of 46.
Put the glove folder in the dataset path (the default set in the config).
Then run main.py.
We use the following features:
* overlap_title
* temp_diff
* comm_auth
* num_inc_edges
* Distance_abstract
* Distance_title
* shortest_path_dijkstra
* shortest_path_dijkstra_und
* comm_neighbors
* no_edge
* tfidf_distance_corpus
* tfidf_distance_titles
* jaccard_und
* Resource_allocation
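
To make three of the graph-based features above concrete, here is a minimal sketch of how comm_neighbors, jaccard_und and Resource_allocation could be computed on an undirected view of the citation graph. It uses the standard textbook definitions of these measures; the helper names `build_adjacency` and `pair_features` and the toy edge list are hypothetical and are not taken from main.py, which may compute the features differently.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(adj, u, v):
    """Neighbourhood-based features for a candidate link (u, v).

    Assumption: standard definitions, not necessarily the project's own.
    """
    nu = adj.get(u, set())
    nv = adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        # Number of neighbours shared by u and v.
        "comm_neighbors": len(common),
        # Jaccard coefficient: shared neighbours over all neighbours.
        "jaccard_und": len(common) / len(union) if union else 0.0,
        # Resource allocation index: shared neighbours weighted by 1/degree.
        "Resource_allocation": sum(1.0 / len(adj[w]) for w in common),
    }


# Toy citation graph (hypothetical paper ids).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_adjacency(edges)
features = pair_features(adj, "a", "d")
print(features)
```

In this toy graph, papers a and d share only neighbour c, which has degree 3, so the resource allocation index for the pair is 1/3.
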
Report is available here
Model | Train | Validation |
---|---|---|
Gradient Boosting | 0.979 | 0.976 |
Random Forest | 1 | 0.975 |
SVM | 0.964 | 0.964 |
Linear | 0.966 | 0.966 |
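
For reference, the sketch below shows how an F-score can be computed from binary link predictions. It assumes the scores in the table are F-scores, like the leaderboard metric, which the table itself does not state. The function name `f_score` and the toy labels are hypothetical and are not taken from the project code.

```python
def f_score(y_true, y_pred):
    """F1 score for binary labels (1 = link exists, 0 = no link)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive prediction: precision and recall are both 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
score = f_score(y_true, y_pred)
print(round(score, 3))
```

Computing the same metric on both the training and the validation split makes a gap such as the Random Forest row (1 versus 0.975) easy to spot.
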
The notebooks Model_tunning.ipynb and Features.ipynb analyse our results.