A PyTorch implementation of the CIKM 2020 paper "Learning Better Representations for Neural Information Retrieval with Graph Information", which introduces two models: the Embedding-based neural ranker (EmbRanker) and the Aggregation-based neural ranker (AggRanker).

Requirements

  • Python 2.7
  • PyTorch 0.4.1
  • tqdm
  • networkx 2.1

Dataset

We run experiments on the publicly available Tiangong-ST dataset, a Chinese search log collected from Sogou.com.

  • Preprocessed data should be placed in ./sessionST/dataset/, following the settings in config.py.

  • Sampled files are given in the valid/test folders. Each line consists of qid docid query title TACM PSCM THCM UBM DBN POM HUMAN (HUMAN is only available in the test set), separated by tabs. In particular, TACM, PSCM, THCM, UBM, DBN, and POM are click labels given in the dataset.

  • Building the graph data from the session data requires networkx and cPickle. The graph data is stored as a .pkl file. Demo processing code is given in build_graph.py.

  • (Update) Run ./EmbRanker/data/convert2textdict.py to create the vocab_dict_file and the embedding dict emb file. Embeddings can be downloaded from here .
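As a rough illustration of the steps above, the snippet below parses a few sample lines in the field layout described for the valid/test files, builds a query-document click graph with networkx, and pickles it. The sample lines, the choice of PSCM as the edge weight, and the node/edge scheme are all assumptions for illustration, not the repository's exact build_graph.py logic.

```python
import pickle  # cPickle on Python 2, as in the original requirements
import networkx as nx

# Hypothetical sample lines in the documented layout:
# qid \t docid \t query \t title \t TACM \t PSCM \t THCM \t UBM \t DBN \t POM
lines = [
    "q1\td1\tweather today\tWeather Report\t1\t1\t0\t1\t1\t0",
    "q1\td2\tweather today\tClimate News\t0\t1\t0\t0\t1\t0",
    "q2\td1\tforecast\tWeather Report\t1\t0\t1\t1\t0\t1",
]

def build_click_graph(lines):
    """Build a bipartite query-document graph (scheme is an assumption)."""
    g = nx.Graph()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        qid, docid = fields[0], fields[1]
        pscm = float(fields[5])  # use the PSCM click label as the edge weight
        g.add_node(qid, kind="query", text=fields[2])
        g.add_node(docid, kind="doc", text=fields[3])
        if g.has_edge(qid, docid):
            g[qid][docid]["weight"] += pscm
        else:
            g.add_edge(qid, docid, weight=pscm)
    return g

graph = build_click_graph(lines)

# Store the graph as a .pkl file, as the README describes.
with open("graph.pkl", "wb") as f:
    pickle.dump(graph, f)
```

Any of the other click labels (TACM, UBM, DBN, ...) could be used as the edge weight in the same way.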

Besides, the two models construct their training-specific graph data differently:

  • EmbRanker: run path_generator.py based on the pkl graph data to get positive and negative samples.
  • AggRanker: run ./data/neighbor_generator.py based on the pkl graph data to get the neighbors of each center node.
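The two generators above can be sketched on a toy networkx graph as follows. Both function names and the sampling details (uniform random walks, uniform neighbor sampling) are assumptions for illustration, not the repository's exact path_generator.py or neighbor_generator.py code.

```python
import random
import networkx as nx

# Toy query-document graph standing in for the pickled session graph.
g = nx.Graph()
g.add_edges_from([("q1", "d1"), ("q1", "d2"), ("q2", "d1"), ("q2", "d3")])

def random_walk(graph, start, length, rng):
    """EmbRanker-style: sample a path over the graph starting from a node."""
    path = [start]
    for _ in range(length - 1):
        nbrs = list(graph.neighbors(path[-1]))
        if not nbrs:
            break
        path.append(rng.choice(nbrs))
    return path

def sample_neighbors(graph, center, k, rng):
    """AggRanker-style: sample up to k neighbors of a center node."""
    nbrs = list(graph.neighbors(center))
    rng.shuffle(nbrs)
    return nbrs[:k]

rng = random.Random(0)
path = random_walk(g, "q1", 4, rng)          # positive context for "q1"
neighbors = sample_neighbors(g, "d1", 2, rng)  # neighborhood of "d1"
```

In EmbRanker, nodes outside such walks would serve as negative samples; in AggRanker, the sampled neighbors are what the model aggregates around each center node.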

Baselines

We also release our PyTorch implementations of the baselines.

  1. VPCG is "Learning Query and Document Relevance from a Web-scale Click Graph" (SIGIR 2016)
  2. GEPS is "Neural IR Meets Graph Embedding: A Ranking Model for Product Search" (WWW 2019)

Procedure

  1. All the settings are in config.py.
  2. Run python main.py --prototype train_config -e ACRI --gpu 0
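The --prototype flag suggests the common pattern where config.py exposes functions that each return a settings dictionary, and main.py looks up the function named on the command line. A minimal sketch of that pattern follows; all setting names and values other than train_config and the dataset path are illustrative assumptions, not the repository's actual configuration.

```python
import argparse

# Illustrative settings only; the real values live in config.py.
def basic_config():
    return {"gpu": 0, "dataset_dir": "./sessionST/dataset/"}

def train_config():
    cfg = basic_config()
    cfg.update({"model": "ACRI", "batch_size": 80})  # hypothetical overrides
    return cfg

parser = argparse.ArgumentParser()
parser.add_argument("--prototype", default="train_config")
args = parser.parse_args([])  # empty argv so the sketch runs stand-alone

# Resolve the prototype name to a config function and build the settings.
cfg = globals()[args.prototype]()
```

This keeps experiment variants as small functions layered on a shared base config, selectable purely from the command line.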

If you have any problems, please contact me via [email protected].