A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs

This is the benchmark, code, and configuration accompanying the EMNLP-Findings 2023 paper A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs. The main branch holds code and information about the benchmark itself. The other branches hold code and configuration for the separate models evaluated in the study.

Download Benchmark

mkdir data
cd data
curl -O https://madata.bib.uni-mannheim.de/424/2/wikidata5m-si.tar.gz
tar -zxvf wikidata5m-si.tar.gz

Benchmark Content

All files are tab separated.

entity_ids.del
- maps ids used in all files to Wikidata IDs
- first column entity id, second column Wikidata entity id
entity_mentions.del
- maps entity ids to entity mentions
entity_desc.del
- maps entity ids to entity descriptions
relation_ids.del
- maps relation ids to Wikidata relation ids
- first column relation id, second column Wikidata relation id
relation_mentions.del
- maps relation ids to relation mentions
train.del
- contains training triples in the form of subject, relation, object
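
Since all files are tab separated, they can be read with Python's standard csv module. The following is a minimal sketch, not part of the repository: the helper names read_del and read_triples are hypothetical, and converting ids to integers assumes that the ids in the triple files are numeric.

```python
import csv
from pathlib import Path


def read_del(path):
    """Read a tab-separated .del file into a list of rows (lists of strings)."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE leaves quote characters in mentions and descriptions untouched
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def read_triples(path):
    """Read subject, relation, object rows; assumes all three ids are integers."""
    return [tuple(int(value) for value in row[:3]) for row in read_del(path)]
```

For two-column files such as entity_ids.del, dict(read_del(path)) would give a mapping from the first column to the second.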

Transductive

valid.del
- contains transductive validation triples in the form of subject, relation, object
test.del
- contains transductive test triples in the form of subject, relation, object

Semi-Inductive

all_entity_ids.del
- contains ids from entity_ids.del and additionally all ids of unseen entities
all_entity_mentions.del
- contains mentions from entity_mentions.del and additionally all mentions of unseen entities
all_entity_desc.del
- contains descriptions from entity_desc.del and additionally all descriptions of unseen entities
valid_pool.del
- contains all triples used for semi-inductive validation
- columns
  - 1: unseen entity id
  - 2: slot of unseen entity (0: unseen entity is in subject slot, 1: unseen entity in object slot)
  - 3-5: validation triple
    - 3: subject
    - 4: relation
    - 5: object
- use prepare_few_shot.py to create all semi-inductive tasks from this file
test_pool.del
- contains all triples used for semi-inductive testing
- columns
  - 1: unseen entity id
  - 2: slot of unseen entity (0: unseen entity is in subject slot, 1: unseen entity in object slot)
  - 3-5: test triple
    - 3: subject
    - 4: relation
    - 5: object
- tab separated
- use prepare_few_shot.py to create all semi-inductive tasks from this file
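
The five-column layout of valid_pool.del and test_pool.del can be made explicit with a small record type. This is only an illustrative sketch: PoolRow, parse_pool_line, and unseen_in_named_slot are hypothetical names that do not come from the repository, and the integer conversion assumes numeric ids.

```python
from typing import NamedTuple


class PoolRow(NamedTuple):
    unseen_entity: int  # column 1
    unseen_slot: int    # column 2 (0: subject slot, 1: object slot)
    subj: int           # column 3
    rel: int            # column 4
    obj: int            # column 5


def parse_pool_line(line):
    """Parse one tab-separated line of a pool file into a PoolRow."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    return PoolRow(*(int(field) for field in fields))


def unseen_in_named_slot(row):
    """Check that the unseen entity sits in the slot named by column 2."""
    in_slot = row.subj if row.unseen_slot == 0 else row.obj
    return in_slot == row.unseen_entity
```

The consistency check follows from the column description above and can serve as a quick sanity test after loading a pool file.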

Generate Few Shot Tasks

Use the file prepare_few_shot.py and create a few_shot_set_creator object with the following arguments:
- dataset_name: (str) name of the dataset
  - default: wikidata5m_v3_semi_inductive
- use_inverse: (bool) whether to use inverse relations
  - default: False
  - if True: for all triples where the unseen entity is in the object slot, the relation id is increased by the number of relations and the triple is inverted
- split: (str) which split to use
  - default: valid
- context_selection: (str) which context selection technique to use
  - default: most_common
  - options: most_common, least_common, random
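# assumes prepare_few_shot.py is importable from the working directory
from prepare_few_shot import FewShotSetCreator

few_shot_set_creator = FewShotSetCreator(
	dataset_name="wikidata5m_v3_semi_inductive",
	use_inverse=True,
	split="test"
)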
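
The effect of the inverse-relation option on a single triple can be sketched as follows. This is an illustration of the rule described for that option, not the code in prepare_few_shot.py: the function name to_inverse is hypothetical, and treating every non-zero slot value as the object slot is an assumption.

```python
def to_inverse(triple, unseen_slot, num_relations):
    """Rewrite a triple so that the unseen entity ends up in the subject slot."""
    s, p, o = triple
    if unseen_slot == 0:
        # unseen entity is already the subject: leave the triple unchanged
        return (s, p, o)
    # unseen entity is the object: swap subject and object and
    # shift the relation id into the inverse-relation range
    return (o, p + num_relations, s)
```

With 10 relations, to_inverse((3, 2, 7), 1, 10) yields (7, 12, 3), so relation 12 acts as the inverse of relation 2.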

Generate the data using the few_shot_set_creator:
- num_shots: (int) the number of shots to use (between 0 and 10)

data = few_shot_set_creator.create_few_shot_dataset(num_shots=5)

Evaluation is performed in the direction unseen to seen.
The output format looks like this:

[
{
	"unseen_entity": <id of unseen entity>,
	"unseen_slot": <slot of unseen entity: 0 for head/subject, 2 for tail/object>,
	"triple": <[s, p, o]>,
	"context": <[unseen_entity_id, unseen_entity_slot, s, p, o]>
},
...
]
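
One way to read the unseen-to-seen direction is that the unseen entity and the relation form the query and the seen entity is the answer. The sketch below splits one task of the format above accordingly. The helper split_query is hypothetical, and it assumes that "triple" is stored in the orientation that "unseen_slot" refers to (0 for subject, 2 for object).

```python
def split_query(task):
    """Return (unseen_entity, relation, seen_answer) for one task dict."""
    s, p, o = task["triple"]
    if task["unseen_slot"] == 0:
        # unseen subject: the query is (unseen, p, ?), the answer is the seen object
        return s, p, o
    # unseen object: the query is (?, p, unseen), the answer is the seen subject
    return o, p, s
```

A model would then rank all seen entities for the query and be scored on the position of the returned answer.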

Create Benchmarks Based on Other Graphs

To create a similar benchmark based on another graph, use the file create_semi_inductive_dataset.py.
This file was used to create wikidata5m-si from wikidata5m.

How to Cite

If you use the proposed benchmark, the provided code, or insights presented in the paper, please cite:

@inproceedings{kochsiek2023benchmark,
title={A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs},
author={Kochsiek, Adrian and Gemulla, Rainer},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
year={2023}
}
