Introduction
In the previous article, we built a scene with randomly placed workpieces.
From this article onward, we will extract the information needed for training from that randomly generated scene. First up is bop_toolkit_lib.
Prerequisites
The prerequisites are as follows.
- Windows 11 (used only to prepare the 3D model)
- Ubuntu 22.04 (used from model preparation onward)
- Python 3.10.x
- CloudCompare
- open3d == 0.16.0
- You have created a 3D model by following this article
- You have finished creating the scene
Rechecking the Scene Data
Let's review the data created last time.
Two folders were generated:
- output_data
- output_NOCS
Rename output_data/bop_data/lm/models to output_data/bop_data/lm/models_obj.
Then copy obj_000001.obj, obj_000001.mtl, and obj_000001.ply into output_data/bop_data/lm/models_obj.
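If you would rather script this step, here is a minimal sketch; model_src is a placeholder for wherever you exported the three model files in the model-preparation article, so adjust it to your setup:

```python
from pathlib import Path
import shutil

base = Path("output_data/bop_data/lm")
# Hypothetical source folder: wherever obj_000001.* were created earlier.
model_src = Path("path/to/your/model/files")

# Rename models -> models_obj.
(base / "models").rename(base / "models_obj")

# Copy the three model files into models_obj.
for name in ["obj_000001.obj", "obj_000001.mtl", "obj_000001.ply"]:
    shutil.copy(model_src / name, base / "models_obj" / name)
```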
Installing bop_toolkit_lib
bop_toolkit is a collection of tools for extracting annotations, such as per-object masks and depth information, from scenes rendered with BlenderProc.
cd makeNOCS
git clone https://github.com/thodan/bop_toolkit.git
cd bop_toolkit
python3 -m pip install -r requirements.txt -e .
Note: if scipy or glumpy fail to install, comment them out in requirements.txt and rerun the command.
The command above installs bop_toolkit_lib. The -e flag installs it in editable mode, so the edits we make below take effect immediately.
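As a quick sanity check (not part of the official bop_toolkit workflow), confirm that the install is picked up from your checkout:

```python
# Run from any directory; the printed path should point into your
# bop_toolkit checkout because of the editable (-e) install.
from bop_toolkit_lib import dataset_params, inout

print(dataset_params.__file__)
```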
We will use the following four bop_toolkit scripts:
- scripts/calc_gt_masks.py … creates the mask images
- scripts/calc_gt_info.py … outputs scene information such as camera parameters
- scripts/calc_gt_coco.py … outputs scene information in COCO format
- scripts/calc_model_info.py … outputs information about each model in the scene
Modifying the bop_toolkit Code
Next, we modify the code so that the scripts above work with our custom data.
bop_toolkit/bop_toolkit_lib/dataset_params.py
Lines 75-90:
    # Object ID's.
    obj_ids = {
        # 'lm': list(range(1, 16)),
        'lm': [1],
        'lmo': [1, 5, 6, 8, 9, 10, 11, 12],
        'tless': list(range(1, 31)),
        'tudl': list(range(1, 4)),
        'tyol': list(range(1, 22)),
        'ruapc': list(range(1, 15)),
        'icmi': list(range(1, 7)),
        'icbin': list(range(1, 3)),
        'itodd': list(range(1, 29)),
        'hbs': [1, 3, 4, 8, 9, 10, 12, 15, 17, 18, 19, 22, 23, 29, 32, 33],
        'hb': list(range(1, 34)),  # Full HB dataset.
        'ycbv': list(range(1, 22)),
        'hope': list(range(1, 29)),
    }[dataset_name]
We changed 'lm': list(range(1, 16)) to 'lm': [1].
The original LineMOD dataset contains 15 objects, but our custom data uses only one.
If you use two objects, set 'lm': list(range(1, 3)) (equivalently, [1, 2]).
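Since range(1, n) stops just before n, the mapping from object count to range is easy to get wrong; a quick check in a Python shell:

```python
print(list(range(1, 2)))   # [1]            -> one object
print(list(range(1, 3)))   # [1, 2]         -> two objects
print(list(range(1, 16)))  # [1, 2, ..., 15] -> the 15 original LineMOD objects
```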
Lines 427-433:
        'custom_mask_path': join(
            split_path, '{scene_id:06d}', 'custom_mask', '{im_id:06d}_mask.png'
        ),
        'custom_mask_folder': join(
            split_path, '{scene_id:06d}', 'custom_mask'
        ),
This adds the custom_mask_path and custom_mask_folder entries.
We will use these path templates in the steps that follow.
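These entries are path templates in the same style as the existing *_tpath entries: scene_id and im_id are filled in with str.format. A minimal sketch, assuming the train_pbr split layout from the previous article:

```python
from os.path import join

split_path = "output_data/bop_data/lm/train_pbr"  # assumed split directory
custom_mask_path = join(
    split_path, "{scene_id:06d}", "custom_mask", "{im_id:06d}_mask.png"
)

print(custom_mask_path.format(scene_id=0, im_id=0))
# output_data/bop_data/lm/train_pbr/000000/custom_mask/000000_mask.png
```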
That is all of the changes.
Full dataset_params.py
A missed edit here is hard to track down, so the full file is reproduced below.
# Author: Tomas Hodan (hodantom@cmp.felk.cvut.cz)
# Center for Machine Perception, Czech Technical University in Prague

"""Parameters of the BOP datasets."""

import math
import glob
import os
from os.path import join

from bop_toolkit_lib import inout


def get_camera_params(datasets_path, dataset_name, cam_type=None):
    """Returns camera parameters for the specified dataset.

    Note that parameters returned by this functions are meant only for simulation
    of the used sensor when rendering training images. To get per-image camera
    parameters (which may vary), use path template 'scene_camera_tpath' contained
    in the dictionary returned by function get_split_params.

    :param datasets_path: Path to a folder with datasets.
    :param dataset_name: Name of the dataset for which to return the parameters.
    :param cam_type: Type of camera.
    :return: Dictionary with camera parameters for the specified dataset.
    """
    if dataset_name == 'tless':
        # Includes images captured by three sensors. Use Primesense as default.
        if cam_type is None:
            cam_type = 'primesense'
        cam_filename = 'camera_{}.json'.format(cam_type)

    elif dataset_name in ['hbs', 'hb']:
        # Both versions of the HB dataset share the same directory.
        dataset_name = 'hb'

        # Includes images captured by two sensors. Use Primesense as default.
        if cam_type is None:
            cam_type = 'primesense'
        cam_filename = 'camera_{}.json'.format(cam_type)

    elif dataset_name == 'ycbv':
        # Includes images captured by two sensors. Use the "UW" sensor as default.
        if cam_type is None:
            cam_type = 'uw'
        cam_filename = 'camera_{}.json'.format(cam_type)

    else:
        cam_filename = 'camera.json'

    # Path to the camera file.
    cam_params_path = join(datasets_path, dataset_name, cam_filename)

    p = {
        # Path to a file with camera parameters.
        'cam_params_path': cam_params_path,
    }

    # Add a dictionary containing the intrinsic camera matrix ('K'), image size
    # ('im_size'), and scale of the depth images ('depth_scale', optional).
    p.update(inout.load_cam_params(cam_params_path))

    return p


def get_model_params(datasets_path, dataset_name, model_type=None):
    """Returns parameters of object models for the specified dataset.

    :param datasets_path: Path to a folder with datasets.
    :param dataset_name: Name of the dataset for which to return the parameters.
    :param model_type: Type of object models.
    :return: Dictionary with object model parameters for the specified dataset.
    """
    # Object ID's.
    obj_ids = {
        # 'lm': list(range(1, 16)),
        'lm': [1],
        'lmo': [1, 5, 6, 8, 9, 10, 11, 12],
        'tless': list(range(1, 31)),
        'tudl': list(range(1, 4)),
        'tyol': list(range(1, 22)),
        'ruapc': list(range(1, 15)),
        'icmi': list(range(1, 7)),
        'icbin': list(range(1, 3)),
        'itodd': list(range(1, 29)),
        'hbs': [1, 3, 4, 8, 9, 10, 12, 15, 17, 18, 19, 22, 23, 29, 32, 33],
        'hb': list(range(1, 34)),  # Full HB dataset.
        'ycbv': list(range(1, 22)),
        'hope': list(range(1, 29)),
    }[dataset_name]

    # ID's of objects with ambiguous views evaluated using the ADI pose error
    # function (the others are evaluated using ADD). See Hodan et al. (ECCVW'16).
    symmetric_obj_ids = {
        'lm': [3, 7, 10, 11],
        'lmo': [10, 11],
        'tless': list(range(1, 31)),
        'tudl': [],
        'tyol': [3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21],
        'ruapc': [8, 9, 12, 13],
        'icmi': [1, 2, 6],
        'icbin': [1],
        'itodd': [2, 3, 4, 5, 7, 8, 9, 11, 12, 14, 17, 18, 19, 23, 24, 25, 27, 28],
        'hbs': [10, 12, 18, 29],
        'hb': [6, 10, 11, 12, 13, 14, 18, 24, 29],
        'ycbv': [1, 13, 14, 16, 18, 19, 20, 21],
        'hope': None,  # Not defined yet.
    }[dataset_name]

    # T-LESS includes two types of object models, CAD and reconstructed.
    # Use the CAD models as default.
    if dataset_name == 'tless' and model_type is None:
        model_type = 'cad'

    # Both versions of the HB dataset share the same directory.
    if dataset_name == 'hbs':
        dataset_name = 'hb'

    # Name of the folder with object models.
    models_folder_name = 'models'
    if model_type is not None:
        models_folder_name += '_' + model_type

    # Path to the folder with object models.
    models_path = join(datasets_path, dataset_name, models_folder_name)

    p = {
        # ID's of all objects included in the dataset.
        'obj_ids': obj_ids,

        # ID's of objects with symmetries.
        'symmetric_obj_ids': symmetric_obj_ids,

        # Path template to an object model file.
        'model_tpath': join(models_path, 'obj_{obj_id:06d}.ply'),

        # Path to a file with meta information about the object models.
        'models_info_path': join(models_path, 'models_info.json')
    }

    return p


def get_split_params(datasets_path, dataset_name, split, split_type=None):
    """Returns parameters (camera params, paths etc.) for the specified dataset.

    :param datasets_path: Path to a folder with datasets.
    :param dataset_name: Name of the dataset for which to return the parameters.
    :param split: Name of the dataset split ('train', 'val', 'test').
    :param split_type: Name of the split type (e.g. for T-LESS, possible types of
        the 'train' split are: 'primesense', 'render_reconst').
    :return: Dictionary with parameters for the specified dataset split.
    """
    p = {
        'name': dataset_name,
        'split': split,
        'split_type': split_type,
        'base_path': join(datasets_path, dataset_name),

        'depth_range': None,
        'azimuth_range': None,
        'elev_range': None,
    }

    rgb_ext = '.png'
    gray_ext = '.png'
    depth_ext = '.png'

    if split_type == 'pbr':
        # The photorealistic synthetic images are provided in the JPG format.
        rgb_ext = '.jpg'
    elif dataset_name == 'itodd':
        gray_ext = '.tif'
        depth_ext = '.tif'

    p['im_modalities'] = ['rgb', 'depth']

    # Linemod (LM).
    if dataset_name == 'lm':
        p['scene_ids'] = list(range(1, 16))
        p['im_size'] = (640, 480)

        if split == 'test':
            p['depth_range'] = (600.90, 1102.35)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (0, 0.5 * math.pi)

    # Linemod-Occluded (LM-O).
    elif dataset_name == 'lmo':
        p['scene_ids'] = {'train': [1, 5, 6, 8, 9, 10, 11, 12], 'test': [2]}[split]
        p['im_size'] = (640, 480)

        if split == 'test':
            p['depth_range'] = (346.31, 1499.84)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (0, 0.5 * math.pi)

    # T-LESS.
    elif dataset_name == 'tless':
        if split == 'train':
            if split_type == 'synthetless':
                p['scene_ids'] = [1]
            else:
                p['scene_ids'] = list(range(1, 31))
        elif split == 'test':
            p['scene_ids'] = list(range(1, 21))

        # Use images from the Primesense sensor by default.
        if split_type is None:
            split_type = 'primesense'

        p['im_size'] = {
            'train': {
                'primesense': (400, 400),
                'kinect': (400, 400),
                'canon': (1900, 1900),
                'render_reconst': (1280, 1024),
                'pbr': (720, 540),
                'synthetless': (400, 400),
            },
            'test': {
                'primesense': (720, 540),
                'kinect': (720, 540),
                'canon': (2560, 1920)
            }
        }[split][split_type]

        # The following holds for Primesense, but is similar for the other sensors.
        if split == 'test':
            p['depth_range'] = (649.89, 940.04)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (-0.5 * math.pi, 0.5 * math.pi)

    # TU Dresden Light (TUD-L).
    elif dataset_name == 'tudl':
        if split == 'train' and split_type is None:
            split_type = 'render'

        p['scene_ids'] = list(range(1, 4))
        p['im_size'] = (640, 480)

        if split == 'test':
            p['depth_range'] = (851.29, 2016.14)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (-0.4363, 0.5 * math.pi)  # (-25, 90) [deg].

    # Toyota Light (TYO-L).
    elif dataset_name == 'tyol':
        p['scene_ids'] = list(range(1, 22))
        p['im_size'] = (640, 480)

        if split == 'test':
            p['depth_range'] = (499.57, 1246.07)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (-0.5 * math.pi, 0.5 * math.pi)

    # Rutgers APC (RU-APC).
    elif dataset_name == 'ruapc':
        p['scene_ids'] = list(range(1, 15))
        p['im_size'] = (640, 480)

        if split == 'test':
            p['depth_range'] = (594.41, 739.12)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (-0.5 * math.pi, 0.5 * math.pi)

    # Tejani et al. (IC-MI).
    elif dataset_name == 'icmi':
        p['scene_ids'] = list(range(1, 7))
        p['im_size'] = (640, 480)

        if split == 'test':
            p['depth_range'] = (509.12, 1120.41)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (0, 0.5 * math.pi)

    # Doumanoglou et al. (IC-BIN).
    elif dataset_name == 'icbin':
        p['scene_ids'] = {
            'train': list(range(1, 3)),
            'test': list(range(1, 4))
        }[split]
        p['im_size'] = (640, 480)

        if split == 'test':
            p['depth_range'] = (454.56, 1076.29)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (-1.0297, 0.5 * math.pi)  # (-59, 90) [deg].

    # MVTec ITODD.
    elif dataset_name == 'itodd':
        p['scene_ids'] = {
            'train': [],
            'val': [1],
            'test': [1]
        }[split]
        p['im_size'] = (1280, 960)
        p['im_modalities'] = ['gray', 'depth']

        if split == 'test':
            p['depth_range'] = (638.38, 775.97)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (-0.5 * math.pi, 0.5 * math.pi)

    # HomebrewedDB (HB).
    # 'hbs' -- Subset of the HB dataset used in the BOP Challenge 2019/2020.
    # 'hb' -- Full HB dataset.
    elif dataset_name in ['hbs', 'hb']:
        dataset_name_orig = dataset_name
        dataset_name = 'hb'

        # Use images from the Primesense sensor by default.
        if split_type is None:
            split_type = 'primesense'

        if dataset_name_orig == 'hbs':
            p['scene_ids'] = {
                'train': [],
                'val': [3, 5, 13],
                'test': [3, 5, 13]
            }[split]
        else:
            p['scene_ids'] = {
                'train': [],
                'val': list(range(1, 14)),
                'test': list(range(1, 14))
            }[split]

        p['im_size'] = {
            'pbr': (640, 480),
            'primesense': (640, 480),
            'kinect': (1920, 1080)
        }[split_type]

        # The following holds for Primesense, but is similar for Kinect.
        if split == 'test':
            p['depth_range'] = (438.24, 1416.97)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (-0.5 * math.pi, 0.5 * math.pi)

    # YCB-Video (YCBV).
    elif dataset_name == 'ycbv':
        if split == 'train' and split_type is None:
            split_type = 'real'

        if split == 'train':
            p['scene_ids'] = {
                'real': list(range(48)) + list(range(60, 92)),
                'pbr': None,  # Use function get_present_scene_ids().
                'synt': list(range(80))
            }[split_type]
        elif split == 'test':
            p['scene_ids'] = list(range(48, 60))

        p['im_size'] = (640, 480)

        if split == 'test':
            p['depth_range'] = (612.92, 1243.59)
            p['azimuth_range'] = (0, 2 * math.pi)
            p['elev_range'] = (-1.2788, 1.1291)  # (-73.27, 64.69) [deg].

    # HOPE.
    elif dataset_name == 'hope':
        p['scene_ids'] = {
            'train': [],
            'val': list(range(1, 11)),
            'test': list(range(1, 41))
        }[split]
        p['im_size'] = (1920, 1080)

        if split == 'test':
            p['depth_range'] = None  # Not calculated yet.
            p['azimuth_range'] = None  # Not calculated yet.
            p['elev_range'] = None  # Not calculated yet.

    else:
        raise ValueError('Unknown BOP dataset ({}).'.format(dataset_name))

    base_path = join(datasets_path, dataset_name)
    split_path = join(base_path, split)
    if split_type is not None:
        if split_type == 'pbr':
            p['scene_ids'] = list(range(50))
        split_path += '_' + split_type

    p.update({
        # Path to the split directory.
        'split_path': split_path,

        # Path template to a file with per-image camera parameters.
        'scene_camera_tpath': join(
            split_path, '{scene_id:06d}', 'scene_camera.json'),

        # Path template to a file with GT annotations.
        'scene_gt_tpath': join(
            split_path, '{scene_id:06d}', 'scene_gt.json'),

        # Path template to a file with meta information about the GT annotations.
        'scene_gt_info_tpath': join(
            split_path, '{scene_id:06d}', 'scene_gt_info.json'),

        # Path template to a file with the coco GT annotations.
        'scene_gt_coco_tpath': join(
            split_path, '{scene_id:06d}', 'scene_gt_coco.json'),

        # Path template to a gray image.
        'gray_tpath': join(
            split_path, '{scene_id:06d}', 'gray', '{im_id:06d}' + gray_ext),

        # Path template to an RGB image.
        'rgb_tpath': join(
            split_path, '{scene_id:06d}', 'rgb', '{im_id:06d}' + rgb_ext),

        # Path template to a depth image.
        'depth_tpath': join(
            split_path, '{scene_id:06d}', 'depth', '{im_id:06d}' + depth_ext),

        # Path template to a mask of the full object silhouette.
        'mask_tpath': join(
            split_path, '{scene_id:06d}', 'mask', '{im_id:06d}_{gt_id:06d}.png'),

        # Path template to a mask of the visible part of an object silhouette.
        'mask_visib_tpath': join(
            split_path, '{scene_id:06d}', 'mask_visib',
            '{im_id:06d}_{gt_id:06d}.png'),

        'custom_mask_path': join(
            split_path, '{scene_id:06d}', 'custom_mask', '{im_id:06d}_mask.png'
        ),
        'custom_mask_folder': join(
            split_path, '{scene_id:06d}', 'custom_mask'
        ),
    })

    return p


def get_present_scene_ids(dp_split):
    """Returns ID's of scenes present in the specified dataset split.

    :param dp_split: Path to a folder with datasets.
    :return: List with scene ID's.
    """
    scene_dirs = [d for d in glob.glob(os.path.join(dp_split['split_path'], '*'))
                  if os.path.isdir(d)]
    scene_ids = [int(os.path.basename(scene_dir)) for scene_dir in scene_dirs]
    scene_ids = sorted(scene_ids)
    return scene_ids
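As an end-to-end check of the edited file, you can load the parameters and confirm the new keys and the reduced object list; the datasets path below assumes the BlenderProc output layout from the previous article:

```python
from bop_toolkit_lib import dataset_params

# Assumed location of the BOP-style data generated by BlenderProc.
datasets_path = "output_data/bop_data"

# Neither call touches the disk, so this works even before the masks exist.
dp_split = dataset_params.get_split_params(datasets_path, "lm", "train", "pbr")
dp_model = dataset_params.get_model_params(datasets_path, "lm")

print(dp_model["obj_ids"])           # [1]
print(dp_split["custom_mask_path"])  # ends with .../custom_mask/{im_id:06d}_mask.png
```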
Conclusion
That's it for this time.
From the next article onward, we will run each of the following scripts one at a time:
- scripts/calc_gt_masks.py … creates the mask images
- scripts/calc_gt_info.py … outputs scene information such as camera parameters
- scripts/calc_gt_coco.py … outputs scene information in COCO format
- scripts/calc_model_info.py … outputs information about each model in the scene