This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

How to train on a new knowledge base? #84

Open
amirj opened this issue Jun 7, 2021 · 8 comments

Comments

amirj commented Jun 7, 2021

It seems that a lot of people have asked how to train BLINK on a new knowledge base (i.e. a set of entities + descriptions), but unfortunately I couldn't find any relevant information.

Could you please add a quick guide here?

Giovani-Merlin commented Jan 2, 2022

I've created a new repository for training bi-encoder models. Following this tutorial, you can train the model on a newer Wikipedia dump (or one in another language) using the BLINK code, or you can follow this tutorial.

@driscoll42

@Giovani-Merlin It seems the tutorial links you posted are no longer working. Could you repost them?

@abhinavkulkarni

@amirj: You can look at this tutorial: #116

kongmoumou commented Nov 20, 2022

> I've created a new repository for training bi-encoder models, following this tutorial you can train the model in a newer (or in another language) Wikipedia dump using the BLINK code or following this tutorial

The link seems to return a 404. Could you please update it to the right link, @Giovani-Merlin? Thanks a lot!

@viraj-lakshitha

> I've created a new repository for training bi-encoder models, following this tutorial you can train the model in a newer (or in another language) Wikipedia dump using the BLINK code or following this tutorial

@Giovani-Merlin: Can you provide access to the repository you mentioned?

gromajus commented Feb 3, 2023

@Giovani-Merlin I would also be very grateful for access to your tutorial :)

@Giovani-Merlin

@viraj-lakshitha @gromajus @kongmoumou @driscoll42
Hello! Sorry for the late reply; I needed to make considerable changes to the tutorials/repo because I was unsatisfied with the final results.
I've split the repo into two parts:

WBDSM for creating the dataset (from any Wikipedia dump in any language):
https://github.com/Giovani-Merlin/wbdsm

Bet for training bi-encoder models:
https://github.com/Giovani-Merlin/bet

The results are fantastic. You can follow the process illustrated at https://github.com/Giovani-Merlin/bet/blob/main/docs/results.md to train a custom model or to benchmark on the Zeshel dataset.

If you have any questions or issues, please use the issue tracker of the respective repo :)
Later on I will improve the tutorials/documentation.

@driscoll42

I won't have time for a few weeks, but I will definitely give this a shot. Thanks for updating it!
