Add conversion code to the BOP format #12

Closed · wants to merge 28 commits
dcd358f
add script to convert hot3d to the BOP format
AnasIbrahim Jul 29, 2024
65b5477
BOP format: add object masks
AnasIbrahim Jul 30, 2024
ae2cb79
BOP format: add scene_gt_info.json
AnasIbrahim Jul 30, 2024
07bd775
BOP format: ignore annotations not in frame - probably also not in sc…
AnasIbrahim Jul 30, 2024
0671c6c
BOP format: add whole calibration in cam_model, flatten rotation matr…
AnasIbrahim Jul 31, 2024
dcec01b
BOP format: generate annotations for all camera streams
AnasIbrahim Jul 31, 2024
c0d3f3e
BOP format: Translation in mm. Add device and image_timestamp_ns to s…
AnasIbrahim Aug 1, 2024
3a0d9b8
BOP format: add script to convert models from glb to ply with texture
AnasIbrahim Aug 1, 2024
889b91c
BOP format: model conversion save average color as the texture map fo…
AnasIbrahim Aug 1, 2024
0d900d8
BOP format: add script to convert eval models to bop format. Another …
AnasIbrahim Aug 1, 2024
43528b5
BOP format: change key obj_id in scene_gt to int. Fix model conversio…
AnasIbrahim Aug 7, 2024
45d2d34
BOP format: Add threading. Clean up code.
AnasIbrahim Aug 8, 2024
f6d7328
BOP format: fix object 6d annotations in scene_gt.json
AnasIbrahim Aug 8, 2024
ffaabbf
BOP format: fix object 6d annotations
AnasIbrahim Aug 13, 2024
717aab6
Merge branch 'main' of github.com:AnasIbrahim/hot3d into main
AnasIbrahim Aug 13, 2024
fd9529f
BOP format: read modal and amodal masks for latest dataset
AnasIbrahim Aug 27, 2024
0eb71e1
BOP format: minor changes
AnasIbrahim Aug 27, 2024
afc4b6f
BOP format: calc bbox with bop toolkit
AnasIbrahim Aug 27, 2024
adb82a3
BOP format: add dummy transformations (-1 for all) for objects that a…
AnasIbrahim Aug 27, 2024
de42867
BOP format: fix bbox
AnasIbrahim Aug 28, 2024
00a098c
BOP format: generate empty mask and mask_visib when object is not there
AnasIbrahim Aug 28, 2024
c5b9f44
Update hot3d_to_bop.py
nv-nguyen Aug 28, 2024
149006d
BOP format: restructure folders
AnasIbrahim Sep 10, 2024
9c1c39e
Merge branch 'main' of github.com:AnasIbrahim/hot3d into main
AnasIbrahim Sep 10, 2024
c3771d9
BOP format: add README.md, clean argparse params
AnasIbrahim Sep 10, 2024
a909f68
BOP format: generate masks even if RLE annotation is empty. Add license
AnasIbrahim Sep 12, 2024
6f92f59
BOP format: change argparse params to easily automate the conversion …
AnasIbrahim Sep 16, 2024
de7666d
BOP format: minor change
AnasIbrahim Sep 16, 2024
49 changes: 49 additions & 0 deletions hot3d/clips/bop_format_converters/README.md
@@ -0,0 +1,49 @@

# Scripts to convert the HOT3D dataset from its native format to the BOP format

### Set Environment Variables

Before running the scripts, set the following environment variables:

- `HOT3D_DIR`: Path to the HOT3D dataset directory. Converted data will be saved to the same folder.

```bash
export HOT3D_DIR=<PATH_TO_HOT3D_DATASET>
```

### Convert the object models from HOT3D format to BOP format

To convert (full) models:

```bash
python hot3d_models_to_bop.py --input-gltf-dir $HOT3D_DIR/object_models --output-bop-dir $HOT3D_DIR/models
```

To convert eval models:

```bash
python hot3d_models_eval_to_bop.py --input-gltf-dir $HOT3D_DIR/object_models_eval --output-bop-dir $HOT3D_DIR/models_eval
```

Copy the models info from the source directories into the corresponding converted-model directories:

```bash
cp $HOT3D_DIR/object_models/models_info.json $HOT3D_DIR/models/models_info.json
cp $HOT3D_DIR/object_models_eval/models_info.json $HOT3D_DIR/models_eval/models_info.json
```

### Convert HOT3D clips to BOP format

To convert HOT3D clips to the BOP format, run the command below.

Parameters:
- `--split`: one of `train_aria`, `train_quest3`, `test_aria`, or `test_quest3`
- `--num-threads`: optional, default 4; 4 or 8 threads usually give good performance

```bash
# converted data to be saved to $HOT3D_DIR/<SPLIT_NAME>_scenewise
python hot3d_clips_to_bop_scenewise.py \
--hot3d-dataset-path $HOT3D_DIR \
--split <SPLIT_NAME> \
--num-threads <NUM_THREADS>
```
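To convert every split in one go, a hypothetical automation sketch (the loop, thread count, and dry-run `print` are illustrative assumptions, not part of the repository):

```python
import os
import subprocess

# All four Hot3D splits accepted by the converter.
splits = ["train_aria", "train_quest3", "test_aria", "test_quest3"]
hot3d_dir = os.environ.get("HOT3D_DIR", "/path/to/hot3d")  # placeholder default

for split in splits:
    cmd = [
        "python", "hot3d_clips_to_bop_scenewise.py",
        "--hot3d-dataset-path", hot3d_dir,
        "--split", split,
        "--num-threads", "8",
    ]
    print(" ".join(cmd))  # dry run: show the command that would be executed
    # subprocess.run(cmd, check=True)  # uncomment to actually run the conversion
```

Each split writes to its own `<SPLIT_NAME>_scenewise` folder, so the runs are independent and can also be launched in parallel.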
349 changes: 349 additions & 0 deletions hot3d/clips/bop_format_converters/hot3d_clips_to_bop_scenewise.py
@@ -0,0 +1,349 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
This script converts the Hot3D-Clips dataset used for the BOP challenge to the BOP format.
NOTE: the BOP format was updated from its classical format to a new format.
The classical format had one main modality (rgb or gray) and depth.
The new format can have multiple modalities (rgb, gray1, gray2) and no depth.
"""

import argparse
import json
import multiprocessing
import os
import sys
import tarfile

import cv2
import numpy as np
from PIL import Image
from tqdm import tqdm

from bop_toolkit_lib import misc

# make the parent directory importable so clip_util can be found
parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
sys.path.insert(0, parent_dir)
import clip_util


def main():
# setup args
parser = argparse.ArgumentParser()
parser.add_argument("--hot3d-dataset-path", required=True, type=str)
# BOP dataset split name
parser.add_argument("--split", required=True, type=str)
# number of threads
parser.add_argument("--num-threads", type=int, default=4)

args = parser.parse_args()

# if split contains "quest3"
if "quest3" in args.split:
args.camera_streams_id = ["1201-1", "1201-2"]
args.camera_streams_names = ["gray1", "gray2"]
elif "aria" in args.split:
args.camera_streams_id = ["214-1", "1201-1", "1201-2"]
args.camera_streams_names = ["rgb", "gray1", "gray2"]
else:
print("Split is neither quest3 nor aria.\n"
"There are only 4 split types in Hot3D: train_quest3, test_quest3, train_aria, test_aria.")
sys.exit(1)

# paths
clips_input_dir = os.path.join(args.hot3d_dataset_path, args.split)
scenes_output_dir = os.path.join(args.hot3d_dataset_path, args.split+"_scenewise")

# list all clips names in the dataset
split_clips = sorted([p for p in os.listdir(clips_input_dir) if p.endswith(".tar")])

# create output directory (fail if it already exists, to avoid overwriting converted data)
os.makedirs(scenes_output_dir, exist_ok=False)

# Progress bar setup
with tqdm(total=len(split_clips), desc="Processing clips") as pbar:
# Use a pool of num-threads worker processes
with multiprocessing.Pool(processes=args.num_threads) as pool:
# Use imap_unordered to get results as soon as they're ready
for _ in pool.imap_unordered(worker, ((clip, clips_input_dir, scenes_output_dir, args) for clip in split_clips)):
pbar.update(1)


def worker(packed_args):
clip, clips_input_dir, scenes_output_dir, args = packed_args
process_clip(clip, clips_input_dir, scenes_output_dir, args)


def process_clip(clip, clips_input_dir, scenes_output_dir, args):
# get clip id
clip_name = clip.split(".")[0].split("-")[1]

# extract clip
tar = tarfile.open(os.path.join(clips_input_dir, clip), "r")

# make scene folder and files for the scene
scene_output_dir = os.path.join(scenes_output_dir, clip_name)
os.makedirs(scene_output_dir, exist_ok=True)

# For each camera stream, create the image/mask directories
# (STREAM_NAME, mask_STREAM_NAME, mask_visib_STREAM_NAME) and the JSON file paths
# (scene_camera_STREAM_NAME.json, scene_gt_STREAM_NAME.json, scene_gt_info_STREAM_NAME.json),
# all stored in one dictionary keyed by stream name.
clip_stream_paths = {}
for stream_name in args.camera_streams_names:
# directories
stream_image_dir = os.path.join(scene_output_dir, stream_name)
os.makedirs(stream_image_dir, exist_ok=True)
clip_stream_paths[stream_name] = stream_image_dir
stream_mask_dir = os.path.join(scene_output_dir, f"mask_{stream_name}")
os.makedirs(stream_mask_dir, exist_ok=True)
clip_stream_paths[f"mask_{stream_name}"] = stream_mask_dir
stream_mask_visib_dir = os.path.join(scene_output_dir, f"mask_visib_{stream_name}")
os.makedirs(stream_mask_visib_dir, exist_ok=True)
clip_stream_paths[f"mask_visib_{stream_name}"] = stream_mask_visib_dir
# json files
stream_scene_camera_json_path = os.path.join(scene_output_dir, f"scene_camera_{stream_name}.json")
clip_stream_paths[f"scene_camera_{stream_name}"] = stream_scene_camera_json_path
stream_scene_gt_json_path = os.path.join(scene_output_dir, f"scene_gt_{stream_name}.json")
clip_stream_paths[f"scene_gt_{stream_name}"] = stream_scene_gt_json_path
stream_scene_gt_info_json_path = os.path.join(scene_output_dir, f"scene_gt_info_{stream_name}.json")
clip_stream_paths[f"scene_gt_info_{stream_name}"] = stream_scene_gt_info_json_path

# make a dict of dicts with stream name as keys
scene_camera_data = {}
scene_gt_data = {}
scene_gt_info_data = {}
for stream_name in args.camera_streams_names:
# add an empty dict indicating the stream name
scene_camera_data[stream_name] = {}
scene_gt_data[stream_name] = {}
scene_gt_info_data[stream_name] = {}

# loop over all frames
for frame_id in range(clip_util.get_number_of_frames(tar)):
frame_key = f"{frame_id:06d}"

# Load camera parameters from FRAME_ID.cameras.json
frame_camera = clip_util.load_cameras(tar, frame_key)
# Load object annotations from FRAME_ID.objects.json
frame_objects = clip_util.load_object_annotations(tar, frame_key)

# read calibration json as it is
camera_json_file_name = f"{frame_id:06d}.cameras.json"
camera_json_file = tar.extractfile(camera_json_file_name)
frame_camera_data = json.load(camera_json_file)

# read FRAME_ID.info.json
frame_info_file_name = f"{frame_id:06d}.info.json"
frame_info_file = tar.extractfile(frame_info_file_name)
frame_info_data = json.load(frame_info_file)

# loop over all camera streams
for stream_index, stream_name in enumerate(args.camera_streams_names):
stream_id = args.camera_streams_id[stream_index]

# load the image corresponding to the stream and frame
image = clip_util.load_image(tar, frame_key, stream_id)
# if image is rgb (3 channels), convert to BGR
if image.ndim == 3:
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
# save the image
image_path = os.path.join(clip_stream_paths[stream_name], frame_key+".jpg")
cv2.imwrite(image_path, image)

# filling scene_camera.json

# get T_world_from_camera
T_world_from_camera = frame_camera[stream_id].T_world_from_eye

T_world_to_camera = np.linalg.inv(T_world_from_camera)

# get camera parameters
calibration = frame_camera_data[stream_id]["calibration"]

# add frame scene_camera data
scene_camera_data[stream_name][int(frame_id)] = {
"cam_model": calibration,
"device": frame_info_data["device"],
"image_timestamps_ns": frame_info_data["image_timestamps_ns"][stream_id],
# "cam_K" and "depth_scale" are omitted: cam_model carries the full calibration
"cam_R_w2c": T_world_to_camera[:3, :3].flatten().tolist(),
# translation converted from meters to millimeters
"cam_t_w2c": (T_world_to_camera[:3, 3] * 1000).tolist(),
}


frame_scene_gt_data = []
frame_scene_gt_info_data = []
# loop with enumerate over all objects in the frame
for anno_id, obj_key in enumerate(frame_objects):
obj_data = frame_objects[obj_key][0]

# Objects not in the current frame's scope get dummy values of -1
# (they are probably visible in other frames). Two checks:
# 1) the object is not visible in the current stream (stream id missing from visibilities_modeled)
# 2) the RLE mask (a list) is empty, which happens for objects with very low visibility (< 0.001)
if stream_id not in obj_data["visibilities_modeled"] \
or not obj_data["masks_amodal"][stream_id]["rle"]:
# make dummy translation and rotation of -1 for all values
object_frame_scene_gt_anno = {
"obj_id": int(obj_key),
"cam_R_m2c": [-1, -1, -1, -1, -1, -1, -1, -1, -1],
"cam_t_m2c": [-1, -1, -1],
}
object_frame_scene_gt_info_anno = {
"bbox_obj": [-1, -1, -1, -1],
"bbox_visib": [-1, -1, -1, -1],
"px_count_all": 0,
#"px_count_valid": px_count_all, # excluded as Hot3D is RGB only - TODO check
"px_count_visib": 0,
"visib_fract": 0,
}
# make an empty mask and mask_visib
width = frame_camera_data[stream_id]["calibration"]["image_width"]
height = frame_camera_data[stream_id]["calibration"]["image_height"]
mask = Image.new("L", (width, height), 0)
mask_visib = Image.new("L", (width, height), 0)
else:
#bop_id = int(obj_data["object_bop_id"]) # same as obj_key

# Transformation from the model to the world space.
T_world_from_model = clip_util.se3_from_dict(obj_data["T_world_from_object"])

# get object pose in camera frame
T_camera_from_model = np.linalg.inv(T_world_from_camera) @ T_world_from_model

object_frame_scene_gt_anno = {
"obj_id": int(obj_key),
"cam_R_m2c": T_camera_from_model[:3, :3].flatten().tolist(),
"cam_t_m2c": (T_camera_from_model[:3, 3] * 1000).tolist(),
}

# read amodal masks
rle_dict = obj_data['masks_amodal'][stream_id]
if not rle_dict['rle']:
# an empty 'rle' list should not occur for a visible object
print("RLE mask is empty!",
"For scene_id: {}, frame_id: {}, obj_id: {}.".format(clip_name, frame_id, obj_key),
"This case should not happen; it may be an edge case that is not covered here.",
"The process will exit.")
sys.exit(1)
else:
mask = custom_rle_to_mask(rle_dict['height'], rle_dict['width'], rle_dict['rle'])
mask = Image.fromarray(mask * 255)
mask = mask.convert("L")

# read modal mask
rle_dict = obj_data['masks_modal'][stream_id]
# if 'rle' is an empty list, make an empty mask
if not rle_dict['rle']:
mask_visib = Image.new("L", (rle_dict['width'], rle_dict['height']), 0)
else:
mask_visib = custom_rle_to_mask(rle_dict['height'], rle_dict['width'], rle_dict['rle'])
mask_visib = Image.fromarray(mask_visib * 255)
mask_visib = mask_visib.convert("L")

px_count_all = cv2.countNonZero(np.array(mask))
px_count_visib = cv2.countNonZero(np.array(mask_visib))
# visible fraction: the minimum of modeled and predicted visibility
visibilities_modeled = obj_data['visibilities_modeled'][stream_id]
visibilities_predicted = obj_data['visibilities_predicted'][stream_id]
visib_fract = min(visibilities_modeled, visibilities_predicted)

bbox_obj = obj_data['boxes_amodal'][stream_id]
# convert bbox from xyxy to xywh
bbox_obj = [bbox_obj[0], bbox_obj[1], bbox_obj[2]-bbox_obj[0], bbox_obj[3]-bbox_obj[1]]
bbox_obj = [int(val) for val in bbox_obj]
# bbox_visib
if px_count_visib > 0:
ys, xs = np.asarray(mask_visib).nonzero()
im_size = mask_visib.size
bbox_visib = misc.calc_2d_bbox(xs, ys, im_size)
bbox_visib = [int(x) for x in bbox_visib]
else:
bbox_visib = [-1, -1, -1, -1]
# add scene_gt_info data
object_frame_scene_gt_info_anno = {
"bbox_obj": bbox_obj,
"bbox_visib": bbox_visib,
"px_count_all": px_count_all,
#"px_count_valid": px_count_all, # excluded as Hot3D is RGB only - TODO check
"px_count_visib": px_count_visib,
"visib_fract": visib_fract,
}

anno_key = f"{anno_id:06d}"
# save mask as FRAME-ID_ANNO-ID.png
mask_path = os.path.join(clip_stream_paths[f"mask_{stream_name}"], frame_key+"_"+anno_key+".png")
mask.save(mask_path)
# save mask_visib as FRAME-ID_ANNO-ID.png
mask_visib_path = os.path.join(clip_stream_paths[f"mask_visib_{stream_name}"], frame_key+"_"+anno_key+".png")
mask_visib.save(mask_visib_path)

frame_scene_gt_data.append(object_frame_scene_gt_anno)
frame_scene_gt_info_data.append(object_frame_scene_gt_info_anno)

scene_gt_data[stream_name][int(frame_id)] = frame_scene_gt_data
scene_gt_info_data[stream_name][int(frame_id)] = frame_scene_gt_info_data

# save scene_gt.json, scene_gt_info.json, scene_camera.json for each camera stream
for stream_name in args.camera_streams_names:
with open(clip_stream_paths[f"scene_camera_{stream_name}"], "w") as f:
json.dump(scene_camera_data[stream_name], f, indent=4)
with open(clip_stream_paths[f"scene_gt_{stream_name}"], "w") as f:
json.dump(scene_gt_data[stream_name], f, indent=4)
with open(clip_stream_paths[f"scene_gt_info_{stream_name}"], "w") as f:
json.dump(scene_gt_info_data[stream_name], f, indent=4)


def custom_rle_to_mask(height, width, rle):
"""
Convert custom RLE (Run-Length Encoding) to a binary mask using vectorized operations.

Parameters:
- height (int): The height of the mask.
- width (int): The width of the mask.
- rle (list): The custom RLE list [start, length, start, length, ...].

Returns:
- np.ndarray: The binary mask.
"""
# Create an empty mask
mask = np.zeros(height * width, dtype=np.uint8)

# Convert RLE pairs into start and end indices
starts = np.array(rle[0::2])
lengths = np.array(rle[1::2])
ends = starts + lengths

# Create an array of indices corresponding to the runs
run_lengths = np.concatenate([np.arange(start, end) for start, end in zip(starts, ends)])

# Set those indices in the mask to 1
mask[run_lengths] = 1

# Reshape the flat array into a 2D mask
return mask.reshape((height, width))
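A quick standalone usage sketch of this decoder (the helper is copied here so the snippet runs on its own; the toy RLE values are illustrative):

```python
import numpy as np

def custom_rle_to_mask(height, width, rle):
    # Decode [start, length, start, length, ...] runs over the flattened image.
    mask = np.zeros(height * width, dtype=np.uint8)
    starts = np.array(rle[0::2])
    lengths = np.array(rle[1::2])
    runs = np.concatenate([np.arange(s, s + l) for s, l in zip(starts, lengths)])
    mask[runs] = 1
    return mask.reshape((height, width))

# Runs: 2 pixels starting at flat index 1, then 1 pixel at flat index 5 (3x3 image).
mask = custom_rle_to_mask(3, 3, [1, 2, 5, 1])
print(mask.tolist())  # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```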


if __name__ == "__main__":
main()
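The world-to-camera conversion that fills `cam_R_w2c` and `cam_t_w2c` in `process_clip` can be sketched in isolation. The toy pose below is an illustrative assumption, not dataset data:

```python
import numpy as np

# Toy world-from-camera pose: identity rotation, camera at (0.1, 0.0, 0.5) meters.
T_world_from_camera = np.eye(4)
T_world_from_camera[:3, 3] = [0.1, 0.0, 0.5]

# Invert to get the world-to-camera transform, as done for scene_camera.json.
T_w2c = np.linalg.inv(T_world_from_camera)
cam_R_w2c = T_w2c[:3, :3].flatten().tolist()   # row-major flattened 3x3 rotation
cam_t_w2c = (T_w2c[:3, 3] * 1000).tolist()     # meters -> millimeters (BOP convention)

print(cam_R_w2c)
print(cam_t_w2c)
```

For a pure translation the inverse simply negates the offset, so the camera translation comes out as roughly (-100, 0, -500) mm.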