-
Notifications
You must be signed in to change notification settings - Fork 7
Home
# import the dataset class
from d_cube import D3
# init a dataset instance
d3 = D3(IMG_ROOT, PKL_ANNO_PATH)
all_img_ids = d3.get_img_ids() # get the image ids in the dataset
all_img_info = d3.load_imgs(all_img_ids) # load images by passing a list containing some image ids
img_path = all_img_info[0]["file_name"] # obtain one image path so you can load it and inference
# then you can load the image as input for your model
group_ids = d3.get_group_ids(img_ids=[img_id]) # get the group ids by passing anno ids, image ids, etc.
sent_ids = d3.get_sent_ids(group_ids=group_ids) # get the sentence ids by passing image ids, group ids, etc.
sent_list = d3.load_sents(sent_ids=sent_ids)
ref_list = [sent['raw_sent'] for sent in sent_list]
# use these language references in `ref_list` as the references to your REC/OVD/DOD model
# save the result to a JSON file
Concepts and structures of anno
, image
, sent
and group
are explained in this part.
In this directory we provide the inference (and evaluation) script on some existing SOTA OVD/REC methods.
When the inference is done, you need to save a JSON file in the format below (COCO standard output JSON form):
[
{
"category_id": "int, the value of sent_id, range [1, 422]",
"bbox": "[x1, y1, w, h], predicted by your model, same as COCO result format",
"image_id": "int, img_id, can be 0, 1, 2, ....",
"score": "float, predicted by your model, no restriction on its absolute value range"
}
]
This JSON file should contain a list, where each item in the list is a dictionary of one detection result.
With this JSON saved, you can evaluate the JSON in the next step. See the evaluation step.
The default evaluation protocol is the intra-group setting, where only a certain references are considerred for each image. Inter-group setting, where all references in the dataset are considerred for each image, can be easily achieved by changing sent_ids = d3.get_sent_ids(group_ids=group_ids)
to sent_ids = d3.get_sent_ids()
. This will use all the sentences in the dataset, rather than a few sentences in the group that this image belongs to.
A Python dictionary where the keys are integers and the values are dictionaries with the following key-value pairs:
-
id
: an integer representing the ID of the annotation. -
sent_id
: a list of integers representing the IDs of sentences associated with this annotation. -
segmentation
: a Run Length Encoding (RLE) representation of the annotation. -
area
: an integer representing the area of the annotation. -
iscrowd
: an integer indicating whether this annotation represents a crowd or not. -
image_id
: an integer representing the ID of the image associated with this annotation. -
bbox
: a list of four integers representing the bounding box coordinates of the annotation in the format [x, y, width, height]. -
group_id
: a value that can be any object and represents the ID of the group associated with this annotation.
{
1 : {
"id": int,
"sent_id": list,
"segmentation": RLE,
"area": int,
"iscrowd": int,
"image_id": int,
"bbox": list, # [x, y, width, height]
"group_id": int
}
}
A Python dictionary where the keys are integers and the values are dictionaries with the following key-value pairs:
-
id
: an integer representing the ID of the image. -
file_name
: a string representing the file name of the image. -
height
: an integer representing the height of the image. -
width
: an integer representing the width of the image. -
flickr_url
: a string representing the Flickr URL of the image. -
anno_id
: a list of integers representing the IDs of annotations associated with this image. -
group_id
: an integer representing the ID of the group associated with this image. -
license
: a string representing the license of the image.
{
int : {
"id": int,
"file_name": str,
"height": int,
"width": int,
"flickr_url": str,
"anno_id": list,
"group_id": int,
"license": str,
}
}
A Python dictionary where the keys are integers and the values are dictionaries with the following key-value pairs:
-
id
: an integer representing the ID of the sentence. -
anno_id
: a list of integers representing the IDs of annotations associated with this sentence. -
group_id
: a list of integers representing the IDs of groups associated with this sentence. -
is_negative
: a boolean indicating whether this sentence is anti-expression or not. -
raw_sent
: a string representing the raw text of the sentence in English. -
raw_sent_zh
: a string representing the raw text of the sentence in Chinese.
{
int : {
"id": int,
"anno_id": list,
"group_id": list,
"is_negative": bool,
"raw_sent": str,
"raw_sent_zh": str
}
}
A Python dictionary where the keys are integers and the values are dictionaries with the following key-value pairs:
-
id
: an integer representing the ID of the group. -
pos_sent_id
: a list of integers representing the IDs of sentences that has referred obejct in the group. -
inner_sent_id
: a list of integers representing the IDs of sentences belonging to this group. -
outer_sent_id
: a list of integers representing the IDs of outer-group sentences that has referred obejct in the group. -
img_id
: a list of integers representing the IDs of images of this group. -
scene
: a list of strings representing the scenes of this group. -
group_name
: a string representing the name of this group in English. -
group_name_zh
: a string representing the name of this group in Chinese.
{
int : {
"id": int,
"pos_sent_id": list,
"inner_sent_id": list,
"outer_sent_id": list,
"img_id": list,
"scene": list,
"group_name": str,
"group_name_zh": str
}
}
In this part, we introduce how to evaluate the performance and get the metric values given the prediction result of a JSON file.
This is based on cocoapi (pycocotools), and is quite simple:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
# Eval results
coco = COCO(gt_path) # `gt_path` is the ground-truth JSON path (different JSON for FULL, PRES or ABS settings in our paper)
d3_model = coco.loadRes(pred_path) # `pred_path` is the prediction JSON file
cocoEval = COCOeval(coco, d3_model, "bbox")
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
We also provide a script that can produce the evaluation results (and some additional analysis) in our paper, given a prediction JSON. You can use it by:
python eval_and_analysis_json.py YOUR_PREDICTION_JSON_PATH
A few options are provided for format conversion or more analysis:
python eval_and_analysis_json.py --help
usage: An example script for $D^3$ evaluation with prediction file (JSON) [-h] [--partition-by-nbox] [--partition-by-lens] [--xyxy2xywh] pred_path
positional arguments:
pred_path path to prediction json
optional arguments:
-h, --help show this help message and exit
--partition-by-nbox divide the images by num of boxes for each ref
--partition-by-lens divide the references by their lengths
--xyxy2xywh transform box coords from xyxy to xywh
See this directory for details. More scripts for evaluating popular SOTA OVD/REC/other methods on
A python script is provided for calculating the statistics of