A PyTorch implementation of the CIKM 2020 paper "Learning Better Representations for Neural Information Retrieval with Graph Information", which proposes the Embedding-based neural ranker (EmbRanker) and the Aggregation-based neural ranker (AggRanker).
- Python 2.7
- PyTorch 0.4.1
- tqdm
- networkx 2.1
We run experiments on the publicly available Tiangong-ST dataset, a Chinese search log collected from Sogou.com.
- Preprocessed data should be placed in `./sessionST/dataset/`, following the settings in `config.py`.
- Sampled files are given in the `valid/test` folders. Each line consists of `qid docid query title TACM PSCM THCM UBM DBN POM HUMAN` (the `HUMAN` label is only available in the test set), separated by `TAB`. In particular, `TACM PSCM THCM UBM DBN POM` are the click labels given in the dataset; see the parsing sketch below.
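A minimal sketch (not repository code) of how such a file can be parsed, assuming exactly the TAB-separated layout above; the function name and dictionary keys are illustrative:

```python
# Minimal sketch: parse a sampled valid/test file, assuming the layout
# qid docid query title TACM PSCM THCM UBM DBN POM [HUMAN], TAB-separated.
CLICK_MODELS = ["TACM", "PSCM", "THCM", "UBM", "DBN", "POM"]

def read_samples(path, has_human_label=False):
    samples = []
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            sample = {
                "qid": fields[0],
                "docid": fields[1],
                "query": fields[2],
                "title": fields[3],
                "clicks": dict(zip(CLICK_MODELS, map(float, fields[4:10]))),
            }
            if has_human_label:  # the HUMAN label is only given in the test set
                sample["human"] = float(fields[10])
            samples.append(sample)
    return samples
```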
- Building the graph data from the session data requires `networkx` and `cPickle`. The graph data is stored as a `pkl` file. Demo processing code is shown in `build_graph.py`; a rough build-and-load sketch is given after this list.
- (Update) Run `./EmbRanker/data/convert2textdict.py` to create the `vocab_dict_file` and the embedding dict `emb` file. The embeddings can be downloaded from here.
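To illustrate the graph data format, here is a minimal sketch that builds a toy query–document click graph with `networkx` and stores it as a `pkl` file. The node/edge scheme and the weighting are assumptions for illustration, not the exact format produced by `build_graph.py`:

```python
import pickle          # cPickle on Python 2.7
import networkx as nx

# Toy session data: (query, clicked document title) pairs -- placeholder only.
session = [("sogou search", "Sogou Search Homepage"),
           ("sogou search", "Sogou - Wikipedia")]

g = nx.Graph()
for query, title in session:
    g.add_node(query, type="query")
    g.add_node(title, type="doc")
    # Connect a query with its clicked documents; repeated clicks add weight.
    if g.has_edge(query, title):
        g[query][title]["weight"] += 1
    else:
        g.add_edge(query, title, weight=1)

with open("demo_graph.pkl", "wb") as f:
    pickle.dump(g, f)

# The pkl graph can later be reloaded for path / neighbor generation:
with open("demo_graph.pkl", "rb") as f:
    g = pickle.load(f)
```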
In addition, the two models construct their training-specific graph data differently (a rough sketch of both steps follows this list):
- EmbRanker: run `path_generator.py` on the `pkl` graph data to get positive and negative samples.
- AggRanker: run `./data/neighbor_generator.py` on the `pkl` graph data to get the neighbors of each center node.
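Both generators are provided in the repository; the sketch below only illustrates, under assumed sampling strategies (random-walk positives with random negatives for EmbRanker, a one-hop neighbor table for AggRanker), what the generated training-specific data might look like:

```python
import pickle
import random

with open("demo_graph.pkl", "rb") as f:   # graph built as sketched above
    g = pickle.load(f)

# EmbRanker-style sampling (illustrative): nodes co-occurring on a short random
# walk are treated as positive pairs, randomly drawn nodes as negatives.
def sample_pairs(graph, walk_length=3, num_negatives=2):
    pairs = []
    nodes = list(graph.nodes())
    for start in nodes:
        walk, node = [start], start
        for _ in range(walk_length):
            neighbors = list(graph.neighbors(node))
            if not neighbors:
                break
            node = random.choice(neighbors)
            walk.append(node)
        for positive in walk[1:]:
            negatives = random.sample(nodes, num_negatives)
            pairs.append((start, positive, negatives))
    return pairs

# AggRanker-style neighbor table (illustrative): the neighbors of each center node.
neighbor_table = {node: list(g.neighbors(node)) for node in g.nodes()}
```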
We also release our PyTorch implementations of the baselines:
- `VPCG`: "Learning Query and Document Relevance from a Web-scale Click Graph" (SIGIR 2016)
- `GEPS`: "Neural IR Meets Graph Embedding: A Ranking Model for Product Search" (WWW 2019)
- All the settings are in `config.py` (a hypothetical sketch of a config prototype is given below).
- Run `python main.py --prototype train_config -e ACRI --gpu 0`
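A `--prototype` flag of this kind typically selects a named configuration defined in `config.py`. The sketch below is only an assumption about its shape, with hypothetical hyperparameter names and values, not the actual contents of the repository's `config.py`:

```python
# Hypothetical sketch of config.py prototypes: `--prototype train_config`
# would select the function below; all names and values are placeholders.
def basic_config():
    state = {}
    state["data_dir"] = "./sessionST/dataset/"
    state["batch_size"] = 80
    state["embedding_size"] = 300
    state["learning_rate"] = 1e-3
    state["gpu"] = 0
    return state

def train_config():
    state = basic_config()
    state["num_epochs"] = 10
    return state
```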
If you have any problems, please contact me via [email protected].