This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Referring Expressions with COCO, COCO+, and COCOg #5023

Open
dirkgr opened this issue Feb 26, 2021 · 0 comments
Labels: Contributions welcome · hard (Difficult tasks) · Models (Issues related to the allennlp-models repo)

dirkgr commented Feb 26, 2021

In the referring expressions task, the model is given an image and an expression, and has to find a bounding box in the image for the thing that the expression refers to.

Here is an example of some images with expressions: [image not included]

To do this, we need the following components:

  1. A DatasetReader that reads the referring expression data, matches it up with the images, and pre-processes it to produce candidate bounding boxes. The best source for the referring expression annotations is https://github.com/lichengunc/refer, but the code there is out of date, so we'll have to write our own code to read that data. Other than that, the dataset reader should follow the example of VQAv2Reader. Each resulting Instance should consist of the embedded regions of interest from the RegionDetector, the text of one referring expression in a TextField, and a label field that gives the IoU between the gold annotated region and each predicted region.
  2. A Model that uses VilBERT as a back-end to combine the vision and text data and gives each region a score. The model computes a loss by taking the softmax of the region scores and computing the dot product of that with the label field. You might want to look at VqaVilbert to steal some ideas.
  3. A model config that trains this whole thing end-to-end. We're hoping to get somewhere near the scores in the VilBERT 12-in-1 paper, though we won't beat the high score, since this issue does not cover the extensive multi-task training work described in the paper.
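
To make the label field and the loss from components 1 and 2 concrete, here is a minimal pure-Python sketch. It is not AllenNLP code: the function names (`iou`, `iou_labels`, `softmax`, `expected_iou_loss`) are hypothetical, and boxes are assumed to be in `(x1, y1, x2, y2)` corner format (COCO annotations store `[x, y, width, height]`, so a conversion would be needed first). The issue only says the loss is built from the dot product of the softmaxed scores with the IoU labels; negating that dot product so that lower is better is an assumption of this sketch. A real model would do the same arithmetic on batched tensors.

```python
import math
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) corners, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_labels(gold: Box, candidates: Sequence[Box]) -> List[float]:
    """One IoU value per candidate region: the contents of the label field."""
    return [iou(gold, c) for c in candidates]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the region scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_iou_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Dot product of softmax(scores) with the IoU labels, negated (assumption)
    so that putting more probability on high-IoU regions lowers the loss."""
    probs = softmax(scores)
    return -sum(p * l for p, l in zip(probs, labels))
```

For example, with a gold box `(0, 0, 2, 2)` and candidates `[(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]`, `iou_labels` gives one label per candidate, and `expected_iou_loss` is lower when the highest score goes to the first candidate than when it goes to the last.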

As always, we recommend you use the AllenNLP Repository Template as a starting point.

@dirkgr dirkgr added Contributions welcome Models Issues related to the allennlp-models repo GSoC hard Difficult tasks labels Feb 26, 2021
@dirkgr dirkgr removed the GSoC label Mar 9, 2021