物体の三次元姿勢推定 CenterSnap -AE学習用中間ファイルの作成- 【Python】

はじめに
前提条件
中間ファイル作成の準備
中間ファイルの作成
おわりに

はじめに

前回は ShapeNetCore 形式の HDF ファイルを作成する方法について説明しました。

今回は、AutoEncoder 学習用の中間ファイルを作成する方法について説明していきます。

前提条件

前提条件は以下の通りです。

Windows11 (三次元モデルの準備にのみ使用)
Ubuntu22 (モデル準備以降に使用)
Python3.10.x
CloudCompare
open3d == 0.16.0
こちらの記事を参考に三次元モデルを作成していること
シーンの作成が完了していること
こちらの記事を参考に bop_toolkit_lib のインストールとプログラムの修正が完了していること
マスクデータの作成が完了していること
アノテーションデータの作成が完了していること
オブジェクトのモデル情報の作成が完了していること
ShapeNetCore の HDF5 ファイルの作成が完了していること

中間ファイル作成の準備

まず、中間ファイルを出力するためのフォルダを作成していきます。

mkdir -p makeNOCS/output_data/bop_data/lm/CAMERA/train
mv makeNOCS/output_data/bop_data/lm/train_pbr/000000 makeNOCS/output_data/bop_data/lm/CAMERA/train/

CAMERA フォルダに train フォルダを作成し、今までの出力結果をまとめてある 000000 フォルダを train フォルダに移動しました。

mkdir -p makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/coord
mv makeNOCS/output_data/output_nocs/*.hdf5 makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/coord/

output_nocs の結果(hdf5)を coord フォルダに移動します。

また、アノテーションデータを中間ファイルに変換する create_nocs_results.py を作成しておきます。

touch makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/create_nocs_results.py
mkdir -p makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/gts
mkdir -p makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/meta

今作成した create_nocs_results.py を以下のようにしてください。


import json
import numpy as np
import _pickle as cPickle
import os

filepath = "./scene_gt.json"
coco_path = "./scene_gt_coco.json"
gt_info_path = "./scene_gt_info.json"

with open(filepath, "r") as f:
    data = json.load(f)
with open(coco_path, "r") as f:
    coco_data = json.load(f)
with open(gt_info_path, "r") as f:
    info_data = json.load(f)

os.makedirs("gts", exist_ok=True)
os.makedirs("meta", exist_ok=True)

coco_anno = coco_data["annotations"]
RT_dict = {}
k = 0

for index in data:
    RT_temp = np.empty((4,4))
    i = 0
    j = 0
    class_ids_list = []
    bbox_list = []
    for d in data[str(index)]:
        
        bbox_info = info_data[str(index)][i]
        if bbox_info["visib_fract"] != 0:
        
            R_ = d["cam_R_m2c"]
            T_ = d["cam_t_m2c"]
            R = np.resize(np.array(R_), (3, 3))
            T = np.array(T_) / 1000.0
            RT = np.concatenate((R, T[:, None]), axis=1)
            RT = np.append(RT, [[0.0, 0.0, 0.0, 1.0]], axis=0)
            RT_temp = np.append(RT_temp, RT, axis=0)

            class_id = d["obj_id"]
            class_ids_list.append(class_id)
            
            bbox_list.append(coco_anno[j]["bbox"])

            j += 1
        
        i += 1
        
    gt_class_ids = np.array(class_ids_list, dtype="int64")
    gt_bboxes = np.array(bbox_list)
    
    RT_dict["image_id"] = int(index)
    RT_dict["gt_class_ids"] = gt_class_ids
    RT_dict["gt_handle_visibility"] = np.ones_like(gt_class_ids)
    RT_dict["gt_bboxes"] = gt_bboxes
    RT_dict["gt_RTs"] = RT_temp.reshape(j+1, 4, 4)

    save_path = "./gts/results_{:06}_{:04}.pkl".format(0, int(index))
    with open(save_path, 'wb') as f:
        cPickle.dump(RT_dict, f)

    save_path = "./meta/{:04}_meta.txt".format(int(index))
    with open(save_path, 'w') as f:
        for n in range(j):
            f.write(str(n)+" "+str(class_ids_list[n])+" "+"{:04}".format(class_ids_list[n]))
            f.write("\n")

cd makeNOCS/output_data/bop_data/lm/CAMERA/train/000000
python3  create_nocs_results.py

上記を実行すると、gts, meta フォルダにそれぞれ以下のファイルが作成されます。

gts … pklファイル形式。image_id, gt_class_ids, gt_handle_visibility, gt_bboxes, gt_RTs が記載されている。
meta … txtファイル形式。インスタンスID、カテゴリ番号、4桁カテゴリ番号の順

その他のフォルダを作成していきます。

mask_no_use フォルダを作成して、mask, mask_visib フォルダを移動します。

mkdir -p makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/mask_no_use
mv makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/mask_no_use makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/mask
mv makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/mask_no_use makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/mask_visib

custom_mask フォルダを mask フォルダに名称変更します。

mv makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/custom_mask makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/mask

depth フォルダをコピーして、mask_independent に名称変更します。

cp makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/depth makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/mask_independent

同様に depth フォルダをmaskeNOCS/output_datra/bop_data/lm にコピーします。
その後、camera_full_depth に名称変更します。

cp makeNOCS/output_data/bop_data/lm/CAMERA/train/000000/depth makeNOCS/output_data/bop_data/lm/
mv makeNOCS/output_data/bop_data/lm/depth makeNOCS/output_data/bop_data/lm/camera_full_depth

もう少しで準備が終わります。

今作成した CAMERA/train をコピーして CAMERA/val に名称変更します。

cp makeNOCS/output_data/bop_data/lm/CAMERA/train makeNOCS/output_data/bop_data/lm/CAMERA/val

次に CAMERA フォルダをコピーして Real に変更します。

cp makeNOCS/output_data/bop_data/lm/CAMERA makeNOCS/output_data/bop_data/lm/Real
mv makeNOCS/output_data/bop_data/lm/Real/val makeNOCS/output_data/bop_data/lm/Real/test

これで準備は終了です。

中間ファイルの作成

次に、pose_data_custom.py を作成していきます。

cd makeNOCS/object-deformnet/preprocess
touch pose_data_custom.py

pose_data_custom.py

import os
import sys
import glob
import cv2
import numpy as np
import _pickle as cPickle
from tqdm import tqdm
sys.path.append('../lib')
from lib.align import align_nocs_to_depth
from lib.utils import load_depth
import h5py

def create_img_list(data_dir):
    """ Create train/val/test data list for CAMERA and Real. """
    # CAMERA dataset
    for subset in ['train', 'val']:
        img_list = []
        img_dir = os.path.join(data_dir, 'CAMERA', subset)
        folder_list = [name for name in os.listdir(img_dir) if os.path.isdir(os.path.join(img_dir, name))]
        img_list_ = glob.glob(img_dir+"/000000/color/*.jpg")
        for i in range(len(img_list_)):
            folder_id = 0
            img_id = int(i)
            img_path = os.path.join(subset, '{:06d}'.format(folder_id))#, 'color/{:06d}.jpg'.format(img_id))
            img_list.append(img_path)
        with open(os.path.join(data_dir, 'CAMERA', subset+'_list_all.txt'), 'w') as f:
            for img_path in img_list:
                f.write("%s\n" % img_path)
    # Real dataset
    for subset in ['train', 'test']:
        img_list = []
        img_dir = os.path.join(data_dir, 'Real', subset)
        folder_list = [name for name in sorted(os.listdir(img_dir)) if os.path.isdir(os.path.join(img_dir, name))]
        for folder in folder_list:
            img_paths = glob.glob(os.path.join(img_dir, folder, 'color/*.jpg'))
            img_paths = sorted(img_paths)
            for img_full_path in img_paths:
                img_name = os.path.basename(img_full_path)
                img_ind = img_name.split('_')[0]
                img_path = os.path.join(subset, folder)#, 'color/{}'.format(img_ind))
                img_list.append(img_path)
        with open(os.path.join(data_dir, 'Real', subset+'_list_all.txt'), 'w') as f:
            for img_path in img_list:
                f.write("%s\n" % img_path)
    print('Write all data paths to file done!')


def process_data(img_path, depth, path_dict):
    """ Load instance masks for the objects in the image. """
    # mask_path = img_path + '_mask.png'
    mask_path = path_dict["mask"]
    mask = cv2.imread(mask_path)[:, :, 2]
    mask = np.array(mask, dtype=np.int32)
    all_inst_ids = sorted(list(np.unique(mask)))
    if all_inst_ids[-1] != 255:
        all_inst_ids.append(255)
    assert all_inst_ids[-1] == 255
    del all_inst_ids[-1]    # remove background
    num_all_inst = len(all_inst_ids)
    h, w = mask.shape

    # coord_path = img_path + '_coord.png'
    coord_path = path_dict["nocs"]
    with h5py.File(coord_path) as f:
      nocs_h5py = np.array(f["nocs"])

    # coord_map = cv2.imread(coord_path)[:, :, :3]
    # coord_map = coord_map[:, :, (2, 1, 0)]
    coord_map = nocs_h5py[:, :, :3]
    coord_map = coord_map[:, :, (2, 1, 0)]
    # flip z axis of coord map
    coord_map = np.array(coord_map, dtype=np.float32)#  / 255
    coord_map[:, :, 2] = 1 - coord_map[:, :, 2]

    class_ids = []
    instance_ids = []
    model_list = []
    masks = np.zeros([h, w, num_all_inst], dtype=np.uint8)
    coords = np.zeros((h, w, num_all_inst, 3), dtype=np.float32)
    bboxes = np.zeros((num_all_inst, 4), dtype=np.int32)

    meta_path = path_dict["meta"]
    # meta_path = img_path + '_meta.txt'
    with open(meta_path, 'r') as f:
        i = 0
        for line in f:
            line_info = line.strip().split(' ')
            inst_id = int(line_info[0])
            cls_id = int(line_info[1])
            # background objects and non-existing objects
            if cls_id == 0 or (inst_id not in all_inst_ids):
                continue
            if len(line_info) == 3:
                model_id = line_info[2]    # Real scanned objs
            else:
                model_id = line_info[3]    # CAMERA objs
            # remove one mug instance in CAMERA train due to improper model
            if model_id == 'b9be7cfe653740eb7633a2dd89cec754':
                continue
            # process foreground objects
            inst_mask = np.equal(mask, inst_id)
            # bounding box
            horizontal_indicies = np.where(np.any(inst_mask, axis=0))[0]
            vertical_indicies = np.where(np.any(inst_mask, axis=1))[0]
            assert horizontal_indicies.shape[0], print(img_path)
            x1, x2 = horizontal_indicies[[0, -1]]
            y1, y2 = vertical_indicies[[0, -1]]
            # x2 and y2 should not be part of the box. Increment by 1.
            x2 += 1
            y2 += 1
            # object occupies full image, rendering error, happens in CAMERA dataset
            if np.any(np.logical_or((x2-x1) > 700, (y2-y1) > 500)):
                print(x2-x1, y2-y1)
                return None, None, None, None, None, None
            # not enough valid depth observation
            final_mask = np.logical_and(inst_mask, depth > 0)
            if np.sum(final_mask) < 64:
                continue
            class_ids.append(cls_id)
            instance_ids.append(inst_id)
            model_list.append(model_id)
            masks[:, :, i] = inst_mask
            coords[:, :, i, :] = np.multiply(coord_map, np.expand_dims(inst_mask, axis=-1))
            bboxes[i] = np.array([y1, x1, y2, x2])
            i += 1
    # no valid foreground objects
    if i == 0:
        return None, None, None, None, None, None

    masks = masks[:, :, :i]
    coords = np.clip(coords[:, :, :i, :], 0, 1)
    bboxes = bboxes[:i, :]
    return masks, coords, class_ids, instance_ids, model_list, bboxes


def annotate_camera_train(data_dir):
    """ Generate gt labels for CAMERA train data. """
    camera_train = open(os.path.join(data_dir, 'CAMERA', 'train_list_all.txt')).read().splitlines()
    # intrinsics = np.array([[577.5, 0, 319.5], [0, 577.5, 239.5], [0, 0, 1]])
    intrinsics = np.array([[572.4, 0, 325.3], [0, 573.6, 242.0], [0, 0, 1.0]])
    # meta info for re-label mug category
    # with open(os.path.join(data_dir, 'obj_models/mug_meta.pkl'), 'rb') as f:
    #     mug_meta = cPickle.load(f)

    valid_img_list = []
    index = 0
    for img_path in tqdm(camera_train):
        path_dict = {}
        img_full_path = os.path.join(data_dir, 'CAMERA', img_path)
        depth_composed_path = "{:06}.png".format(index)
        path_dict["nocs"] = img_full_path + '/coord/{}.hdf5'.format(index)
        path_dict["meta"] = img_full_path + '/meta/{:04}_meta.txt'.format(index)
        path_dict["mask"] = img_full_path + '/mask_independent/{:06}.png'.format(index)
        path_dict["color"] = img_full_path + '/color/{:06}.jpg'.format(index)
        depth_full_path = os.path.join(data_dir,'camera_full_depths', depth_composed_path)
        all_exist = os.path.exists(img_full_path + '/color/{:06}.jpg'.format(index)) and \
                    os.path.exists(img_full_path + '/coord/{}.hdf5'.format(index)) and \
                    os.path.exists(img_full_path + '/depth/{:06}.png'.format(index)) and \
                    os.path.exists(img_full_path + '/mask/{:06}_mask.png'.format(index)) and \
                    os.path.exists(img_full_path + '/meta/{:04}_meta.txt'.format(index))
        index += 1
        # all_exist = os.path.exists(img_full_path + '_color.png') and \
        #             os.path.exists(img_full_path + '_coord.png') and \
        #             os.path.exists(img_full_path + '_depth.png') and \
        #             os.path.exists(img_full_path + '_mask.png') and \
        #             os.path.exists(img_full_path + '_meta.txt')
        if not all_exist:
            print("annotate_camera_train_path")
            continue
        # depth = load_depth(img_full_path)
        depth = load_depth(depth_full_path)
        masks, coords, class_ids, instance_ids, model_list, bboxes = process_data(img_full_path, depth, path_dict)
        if instance_ids is None:
            print("annotate_camera_train_path instance ids")
            continue
        # Umeyama alignment of GT NOCS map with depth image
        scales, rotations, translations, error_messages, _ = \
            align_nocs_to_depth(masks, coords, depth, intrinsics, instance_ids, img_path)
        if error_messages:
            print("annotate_camera_train_path error msg", error_messages)
            continue
        # re-label for mug category
        for i in range(len(class_ids)):
            pass
            # if class_ids[i] == 6:
            #     T0 = mug_meta[model_list[i]][0]
            #     s0 = mug_meta[model_list[i]][1]
            #     T = translations[i] - scales[i] * rotations[i] @ T0
            #     s = scales[i] / s0
            #     scales[i] = s
            #     translations[i] = T
        # write results
        gts = {}
        gts['class_ids'] = class_ids    # int list, 1 to 6
        gts['bboxes'] = bboxes  # np.array, [[y1, x1, y2, x2], ...]
        gts['scales'] = scales.astype(np.float32)  # np.array, scale factor from NOCS model to depth observation
        gts['rotations'] = rotations.astype(np.float32)    # np.array, R
        gts['translations'] = translations.astype(np.float32)  # np.array, T
        gts['instance_ids'] = instance_ids  # int list, start from 1
        gts['model_list'] = model_list  # str list, model id/name
        os.makedirs(img_full_path + "/pkl", exist_ok=True)
        with open(img_full_path + "/pkl/" + '{:06}_label.pkl'.format(index-1), 'wb') as f:
            cPickle.dump(gts, f)
        valid_img_list.append(img_path)
    # write valid img list to file
    with open(os.path.join(data_dir, 'CAMERA/train_list.txt'), 'w') as f:
        for img_path in valid_img_list:
            f.write("%s\n" % img_path)


def annotate_real_train(data_dir):
    """ Generate gt labels for Real train data through PnP. """
    real_train = open(os.path.join(data_dir, 'Real/train_list_all.txt')).read().splitlines()
    # intrinsics = np.array([[591.0125, 0, 322.525], [0, 590.16775, 244.11084], [0, 0, 1]])
    intrinsics = np.array([[572.4, 0, 325.3], [0, 573.6, 242.0], [0, 0, 1.0]])
    # scale factors for all instances
    scale_factors = {}
    # path_to_size = glob.glob(os.path.join(data_dir, 'obj_models/real_train', '*_norm.txt'))
    # path_to_size = glob.glob(os.path.join(data_dir, 'models_obj/real_train', '*.txt'))
    path_to_size = glob.glob(os.path.join(data_dir, 'models_obj', 'obj_000001_norm.txt'))
    for inst_path in sorted(path_to_size):
        instance = os.path.basename(inst_path).split('.')[0]
        bbox_dims = np.loadtxt(inst_path)
        scale_factors[instance] = np.linalg.norm(bbox_dims)
    # meta info for re-label mug category
    # with open(os.path.join(data_dir, 'obj_models/mug_meta.pkl'), 'rb') as f:
    #     mug_meta = cPickle.load(f)
    index = 0
    valid_img_list = []
    for img_path in tqdm(real_train):
        img_full_path = os.path.join(data_dir, 'Real', img_path)
        path_dict = {}
        depth_composed_path = "{:06}.png".format(index)
        path_dict["nocs"] = img_full_path + '/coord/{}.hdf5'.format(index)
        path_dict["meta"] = img_full_path + '/meta/{:04}_meta.txt'.format(index)
        path_dict["mask"] = img_full_path + '/mask_independent/{:06}.png'.format(index)
        path_dict["color"] = img_full_path + '/color/{:06}.jpg'.format(index)
        depth_full_path = os.path.join(data_dir,'camera_full_depths', depth_composed_path)
        all_exist = os.path.exists(img_full_path + '/color/{:06}.jpg'.format(index)) and \
                    os.path.exists(img_full_path + '/coord/{}.hdf5'.format(index)) and \
                    os.path.exists(img_full_path + '/depth/{:06}.png'.format(index)) and \
                    os.path.exists(img_full_path + '/mask/{:06}_mask.png'.format(index)) and \
                    os.path.exists(img_full_path + '/meta/{:04}_meta.txt'.format(index))
        index += 1
        # all_exist = os.path.exists(img_full_path + '_color.png') and \
        #             os.path.exists(img_full_path + '_coord.png') and \
        #             os.path.exists(img_full_path + '_depth.png') and \
        #             os.path.exists(img_full_path + '_mask.png') and \
        #             os.path.exists(img_full_path + '_meta.txt')
        if not all_exist:
            print("annotate_real_train pass")
            continue
        # depth = load_depth(img_full_path)
        depth = load_depth(depth_full_path)
        masks, coords, class_ids, instance_ids, model_list, bboxes = process_data(img_full_path, depth, path_dict)
        if instance_ids is None:
            continue
        # compute pose
        num_insts = len(class_ids)
        scales = np.zeros(num_insts)
        rotations = np.zeros((num_insts, 3, 3))
        translations = np.zeros((num_insts, 3))
        for i in range(num_insts):
            s = scale_factors["obj_00"+model_list[i]+"_norm"]
            mask = masks[:, :, i]
            idxs = np.where(mask)
            coord = coords[:, :, i, :]
            coord_pts = s * (coord[idxs[0], idxs[1], :] - 0.5)
            coord_pts = coord_pts[:, :, None]
            img_pts = np.array([idxs[1], idxs[0]]).transpose()
            img_pts = img_pts[:, :, None].astype(float)
            distCoeffs = np.zeros((4, 1))    # no distoration
            retval, rvec, tvec = cv2.solvePnP(coord_pts, img_pts, intrinsics, distCoeffs)
            assert retval
            R, _ = cv2.Rodrigues(rvec)
            T = np.squeeze(tvec)
            # re-label for mug category
            # if class_ids[i] == 6:
            #     T0 = mug_meta[model_list[i]][0]
            #     s0 = mug_meta[model_list[i]][1]
            #     T = T - s * R @ T0
            #     s = s / s0
            scales[i] = s
            rotations[i] = R
            translations[i] = T
        # write results
        gts = {}
        gts['class_ids'] = class_ids    # int list, 1 to 6
        gts['bboxes'] = bboxes  # np.array, [[y1, x1, y2, x2], ...]
        gts['scales'] = scales.astype(np.float32)  # np.array, scale factor from NOCS model to depth observation
        gts['rotations'] = rotations.astype(np.float32)    # np.array, R
        gts['translations'] = translations.astype(np.float32)  # np.array, T
        gts['instance_ids'] = instance_ids  # int list, start from 1
        gts['model_list'] = model_list  # str list, model id/name
        with open(img_full_path + '_label.pkl', 'wb') as f:
            cPickle.dump(gts, f)
        valid_img_list.append(img_path)
    # write valid img list to file
    with open(os.path.join(data_dir, 'Real/train_list.txt'), 'w') as f:
        for img_path in valid_img_list:
            f.write("%s\n" % img_path)


def annotate_test_data(data_dir):
    """ Generate gt labels for test data.
        Properly copy handle_visibility provided by NOCS gts.
    """
    # Statistics:
    # test_set    missing file     bad rendering    no (occluded) fg    occlusion (< 64 pts)
    #   val        3792 imgs        132 imgs         1856 (23) imgs      50 insts
    #   test       0 img            0 img            0 img               2 insts

    camera_val = open(os.path.join(data_dir, 'CAMERA', 'val_list_all.txt')).read().splitlines()
    real_test = open(os.path.join(data_dir, 'Real', 'test_list_all.txt')).read().splitlines()
    camera_intrinsics = np.array([[577.5, 0, 319.5], [0, 577.5, 239.5], [0, 0, 1]])
    real_intrinsics = np.array([[591.0125, 0, 322.525], [0, 590.16775, 244.11084], [0, 0, 1]])
    # compute model size
    model_file_path = ['models_obj/camera_val.pkl', 'models_obj/real_test.pkl']
    models = {}
    for path in model_file_path:
        with open(os.path.join(data_dir, path), 'rb') as f:
            models.update(cPickle.load(f))
    model_sizes = {}
    for key in models.keys():
        model_sizes[key] = 2 * np.amax(np.abs(models[key]), axis=0)
    # meta info for re-label mug category
    # with open(os.path.join(data_dir, 'obj_models/mug_meta.pkl'), 'rb') as f:
    #     mug_meta = cPickle.load(f)

    subset_meta = [('CAMERA', camera_val, camera_intrinsics, 'val'), ('Real', real_test, real_intrinsics, 'test')]
    index = 0
    for source, img_list, intrinsics, subset in subset_meta:
        valid_img_list = []
        for img_path in tqdm(img_list):
            img_full_path = os.path.join(data_dir, source, img_path)
            path_dict = {}
            depth_composed_path = "{:06}.png".format(index)
            path_dict["nocs"] = img_full_path + '/coord/{}.hdf5'.format(index)
            path_dict["meta"] = img_full_path + '/meta/{:04}_meta.txt'.format(index)
            path_dict["mask"] = img_full_path + '/mask_independent/{:06}.png'.format(index)
            path_dict["color"] = img_full_path + '/color/{:06}.jpg'.format(index)
            depth_full_path = os.path.join(data_dir,'camera_full_depths', depth_composed_path)
            all_exist = os.path.exists(img_full_path + '/color/{:06}.jpg'.format(index)) and \
                        os.path.exists(img_full_path + '/coord/{}.hdf5'.format(index)) and \
                        os.path.exists(img_full_path + '/depth/{:06}.png'.format(index)) and \
                        os.path.exists(img_full_path + '/mask/{:06}_mask.png'.format(index)) and \
                        os.path.exists(img_full_path + '/meta/{:04}_meta.txt'.format(index))
            index += 1
            # all_exist = os.path.exists(img_full_path + '_color.png') and \
            #             os.path.exists(img_full_path + '_coord.png') and \
            #             os.path.exists(img_full_path + '_depth.png') and \
            #             os.path.exists(img_full_path + '_mask.png') and \
            #             os.path.exists(img_full_path + '_meta.txt')
            if not all_exist:
                continue
            # depth = load_depth(img_full_path)
            depth = load_depth(depth_full_path)
            masks, coords, class_ids, instance_ids, model_list, bboxes = process_data(img_full_path, depth, path_dict)
            if instance_ids is None:
                continue
            num_insts = len(instance_ids)
            # match each instance with NOCS ground truth to properly assign gt_handle_visibility
            nocs_dir = os.path.join(os.path.dirname(data_dir), 'results/nocs_results')
            if source == 'CAMERA':
                nocs_path = os.path.join(nocs_dir, 'val', 'results_val_{}_{}.pkl'.format(
                    img_path.split('/')[-2], img_path.split('/')[-1]))
            else:
                nocs_path = os.path.join(nocs_dir, 'real_test', 'results_test_{}_{}.pkl'.format(
                    img_path.split('/')[-2], img_path.split('/')[-1]))
            with open(nocs_path, 'rb') as f:
                nocs = cPickle.load(f)
            gt_class_ids = nocs['gt_class_ids']
            gt_bboxes = nocs['gt_bboxes']
            gt_sRT = nocs['gt_RTs']
            gt_handle_visibility = nocs['gt_handle_visibility']
            map_to_nocs = []
            for i in range(num_insts):
                gt_match = -1
                for j in range(len(gt_class_ids)):
                    if gt_class_ids[j] != class_ids[i]:
                        continue
                    if np.sum(np.abs(bboxes[i] - gt_bboxes[j])) > 5:
                        continue
                    # match found
                    gt_match = j
                    break
                # check match validity
                assert gt_match > -1, print(img_path, instance_ids[i], 'no match for instance')
                assert gt_match not in map_to_nocs, print(img_path, instance_ids[i], 'duplicate match')
                map_to_nocs.append(gt_match)
            # copy from ground truth, re-label for mug category
            handle_visibility = gt_handle_visibility[map_to_nocs]
            sizes = np.zeros((num_insts, 3))
            poses = np.zeros((num_insts, 4, 4))
            scales = np.zeros(num_insts)
            rotations = np.zeros((num_insts, 3, 3))
            translations = np.zeros((num_insts, 3))
            for i in range(num_insts):
                gt_idx = map_to_nocs[i]
                sizes[i] = model_sizes[model_list[i]]
                sRT = gt_sRT[gt_idx]
                s = np.cbrt(np.linalg.det(sRT[:3, :3]))
                R = sRT[:3, :3] / s
                T = sRT[:3, 3]
                # re-label mug category
                if class_ids[i] == 6:
                    T0 = mug_meta[model_list[i]][0]
                    s0 = mug_meta[model_list[i]][1]
                    T = T - s * R @ T0
                    s = s / s0
                # used for test during training
                scales[i] = s
                rotations[i] = R
                translations[i] = T
                # used for evaluation
                sRT = np.identity(4, dtype=np.float32)
                sRT[:3, :3] = s * R
                sRT[:3, 3] = T
                poses[i] = sRT
            # write results
            gts = {}
            gts['class_ids'] = np.array(class_ids)    # int list, 1 to 6
            gts['bboxes'] = bboxes    # np.array, [[y1, x1, y2, x2], ...]
            gts['instance_ids'] = instance_ids    # int list, start from 1
            gts['model_list'] = model_list    # str list, model id/name
            gts['size'] = sizes   # 3D size of NOCS model
            gts['scales'] = scales.astype(np.float32)    # np.array, scale factor from NOCS model to depth observation
            gts['rotations'] = rotations.astype(np.float32)    # np.array, R
            gts['translations'] = translations.astype(np.float32)    # np.array, T
            gts['poses'] = poses.astype(np.float32)    # np.array
            gts['handle_visibility'] = handle_visibility    # handle visibility of mug
            with open(img_full_path + '_label.pkl', 'wb') as f:
                cPickle.dump(gts, f)
            valid_img_list.append(img_path)
        # write valid img list to file
        with open(os.path.join(data_dir, source, subset+'_list.txt'), 'w') as f:
            for img_path in valid_img_list:
                f.write("%s\n" % img_path)


if __name__ == '__main__':
    data_dir = '/path/to/makeNOCS/output_data/bop_data/lm/'
    # create list for all data
    create_img_list(data_dir)
    # annotate dataset and re-write valid data to list
    annotate_camera_train(data_dir)
    print("================== annotate camera train complete ===================")
    annotate_real_train(data_dir)
    print("================== annotate real train complete ===================")
    annotate_test_data(data_dir)
    print("================== annotate test data complete ===================")

上記を実行すると、

================== annotate real train complete ===================

この後にエラーが出ますが、上記が表示されていれば完了です。

makeNOCS/output_data/bop_data/lm/CAMERA/train/000000 に pkl フォルダが作成され、学習用の中間データが保存されています。

おわりに

今回はここまでとします。

ここまで出来れば、あとはオートエンコーダーを学習して、NOCSデータを CenterSnap 用に変更して、CenterSnap を学習させるだけです。

その前に、今回の二つのプログラムの説明を挟みます。