A PyTorch implementation of the CIKM 2020 paper "Learning Better Representations for Neural Information Retrieval with Graph Information", which proposes the Embedding-based neural ranker (EmbRanker) and the Aggregation-based neural ranker (AggRanker).
- Python 2.7
- PyTorch 0.4.1
- tqdm
- networkx 2.1
We run experiments on the publicly available Tiangong-ST dataset, a Chinese search log collected from Sogou.com.
- Preprocessed data should be placed in `./sessionST/dataset/`, following the settings in `config.py`.
- Sampled files are given in the `valid/test` folders. Each line consists of `qid docid query title TACM PSCM THCM UBM DBN POM HUMAN` (the `HUMAN` label is only available in the test set), separated by `TAB`. In particular, `TACM PSCM THCM UBM DBN POM` are the click labels given in the dataset; see the parsing sketch below.
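A minimal sketch (not repository code) of how such a file can be parsed, assuming exactly the TAB-separated layout above; the function name and dictionary keys are illustrative:

```python
# Minimal sketch: parse a sampled valid/test file, assuming the layout
# qid docid query title TACM PSCM THCM UBM DBN POM [HUMAN], TAB-separated.
CLICK_MODELS = ["TACM", "PSCM", "THCM", "UBM", "DBN", "POM"]

def read_samples(path, has_human_label=False):
    samples = []
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            sample = {
                "qid": fields[0],
                "docid": fields[1],
                "query": fields[2],
                "title": fields[3],
                "clicks": dict(zip(CLICK_MODELS, map(float, fields[4:10]))),
            }
            if has_human_label:  # the HUMAN label is only given in the test set
                sample["human"] = float(fields[10])
            samples.append(sample)
    return samples
```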
- Building the graph data from the session data requires `networkx` and `cPickle`. The graph data is stored as a `pkl` file. Demo processing code is shown in `build_graph.py`; a rough build-and-load sketch is given after this list.
- (Update) Run `./EmbRanker/data/convert2textdict.py` to create the `vocab_dict_file` and the embedding dict `emb` file. The embeddings can be downloaded from here.
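To illustrate the graph data format, here is a minimal sketch that builds a toy query–document click graph with `networkx` and stores it as a `pkl` file. The node/edge scheme and the weighting are assumptions for illustration, not the exact format produced by `build_graph.py`:

```python
import pickle          # cPickle on Python 2.7
import networkx as nx

# Toy session data: (query, clicked document title) pairs -- placeholder only.
session = [("sogou search", "Sogou Search Homepage"),
           ("sogou search", "Sogou - Wikipedia")]

g = nx.Graph()
for query, title in session:
    g.add_node(query, type="query")
    g.add_node(title, type="doc")
    # Connect a query with its clicked documents; repeated clicks add weight.
    if g.has_edge(query, title):
        g[query][title]["weight"] += 1
    else:
        g.add_edge(query, title, weight=1)

with open("demo_graph.pkl", "wb") as f:
    pickle.dump(g, f)

# The pkl graph can later be reloaded for path / neighbor generation:
with open("demo_graph.pkl", "rb") as f:
    g = pickle.load(f)
```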
In addition, the two models construct their training-specific graph data differently (a rough sketch of both steps follows this list):
- EmbRanker: run `path_generator.py` on the `pkl` graph data to get positive and negative samples.
- AggRanker: run `./data/neighbor_generator.py` on the `pkl` graph data to get the neighbors of each center node.
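Both generators are provided in the repository; the sketch below only illustrates, under assumed sampling strategies (random-walk positives with random negatives for EmbRanker, a one-hop neighbor table for AggRanker), what the generated training-specific data might look like:

```python
import pickle
import random

with open("demo_graph.pkl", "rb") as f:   # graph built as sketched above
    g = pickle.load(f)

# EmbRanker-style sampling (illustrative): nodes co-occurring on a short random
# walk are treated as positive pairs, randomly drawn nodes as negatives.
def sample_pairs(graph, walk_length=3, num_negatives=2):
    pairs = []
    nodes = list(graph.nodes())
    for start in nodes:
        walk, node = [start], start
        for _ in range(walk_length):
            neighbors = list(graph.neighbors(node))
            if not neighbors:
                break
            node = random.choice(neighbors)
            walk.append(node)
        for positive in walk[1:]:
            negatives = random.sample(nodes, num_negatives)
            pairs.append((start, positive, negatives))
    return pairs

# AggRanker-style neighbor table (illustrative): the neighbors of each center node.
neighbor_table = {node: list(g.neighbors(node)) for node in g.nodes()}
```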
We also release our PyTorch implementations of the baselines:
- `VPCG`: "Learning Query and Document Relevance from a Web-scale Click Graph" (SIGIR 2016)
- `GEPS`: "Neural IR Meets Graph Embedding: A Ranking Model for Product Search" (WWW 2019)
- All the settings are in `config.py` (a hypothetical sketch of a config prototype is given below).
- Run `python main.py --prototype train_config -e ACRI --gpu 0`
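A `--prototype` flag of this kind typically selects a named configuration defined in `config.py`. The sketch below is only an assumption about its shape, with hypothetical hyperparameter names and values, not the actual contents of the repository's `config.py`:

```python
# Hypothetical sketch of config.py prototypes: `--prototype train_config`
# would select the function below; all names and values are placeholders.
def basic_config():
    state = {}
    state["data_dir"] = "./sessionST/dataset/"
    state["batch_size"] = 80
    state["embedding_size"] = 300
    state["learning_rate"] = 1e-3
    state["gpu"] = 0
    return state

def train_config():
    state = basic_config()
    state["num_epochs"] = 10
    return state
```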
If you have any problems, please contact me via [email protected].