Skip to content

AdrienBenamira/Link_prediction_graph

Repository files navigation

General

In this work we address the task of link prediction in a citation network. This work is also a part of an in-class Kaggle Competition for Network Course Analytics Course offered at Ecole CentraleSupelec, Paris in Fall 2018-2019.

Our final F-score is 0.973 on the public test set and we are currently ranked 2nd / 46.

Set up

Put glove folder in the dataset path Config default

run main.py

Features

We have :

* overlap_title,
* temp_diff,
* comm_auth,
* num_inc_edges,
* Distance_abstract,
* Distance_title,
* shortest_path_dijkstra
* shortest_path_dijkstra_und
* comm_neighbors,
* no_edge,
* tfidf_distance_corpus,
* tfidf_distance_titles,
* jaccard_und
* Resource_allocation

Results and report

Report is available here

Model Train Validation
Gradient Boosting 0.979 0.976
Random Forest 1 0.975
SVM 0.964 0.964
Linear 0.966 0.966

Extra :

Model_tunning.ipynb and Features.ipynb analyse our results

Alt text

Alt text

Alt text

Alt text

About

Kaggle Competition: Predicting Missing Citations

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published