In September, the Fondazione Bruno Kessler (FBK) team participated in the BOP Challenge 2023 on 6D object pose estimation. The contest aims to measure progress in object pose estimation. Since 2017, organisers from institutions such as Google, Reality Labs at Meta, ENPC ParisTech, MVTec, Niantic, Czech Technical University in Prague, Heidelberg University and Tsinghua University have been running challenges on the benchmark datasets in conjunction with the R6D (Recovering 6D Object Pose) workshops.
In 2023, methods competed on six tasks. Task 1 (Model-based 6D localisation of seen objects) has remained unchanged since 2019, while Tasks 2 (Model-based 2D detection of seen objects) and 3 (Model-based 2D segmentation of seen objects) are the same as in 2022. Tasks 4 (Model-based 6D localisation of unseen objects), 5 (Model-based 2D detection of unseen objects) and 6 (Model-based 2D segmentation of unseen objects) were introduced in 2023. As in previous years, only annotated object instances with at least 10% of their projected surface area visible were considered in the evaluation.
As part of this contest, researchers Andrea Caraffa, Davide Boscaini, and Fabio Poiesi submitted a method named PoZe to Task 4 and won the award for the best method on one of its datasets (TUD-L). PoZe performs zero-shot pose estimation of unseen objects, meaning it requires no training on the target objects. PoZe takes as input a coloured 3D point cloud representing the object and an RGB-D image capturing the scene.
PoZe consists of five modules (illustrative code sketches of the main steps follow the list):
- 2D object segmentation: We segment the region that the object occupies in the RGB image, using the segmentation masks predicted by CNOS with FastSAM [B]. When no mask is available, we fall back to the CNOS SAM segmentations; if there is still no mask, we use the entire image. For each object we consider multiple masks because, in some cases, the ones with lower confidence scores are more accurate.
- 3D lifting: We crop the input image around the segmentation mask and back-project the cropped scene into 3D space using the camera intrinsic parameters.
- Feature extraction: We extract point-wise features from the point clouds of the object and the cropped scene, at three different scales, using a frozen GeDi [C] model trained on 3DMatch [A] for the point cloud registration task.
- Pose estimation: We estimate the 6D pose of the object with respect to the cropped scene by running feature-matching RANSAC on the features extracted at each scale, then select the best estimate according to the number of RANSAC inliers.
- Pose refinement: We refine the pose estimated by the previous module using the ICP algorithm.
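To make the fallback logic of the segmentation module concrete, here is a minimal Python sketch. The two loader functions and the `top_k` value are hypothetical placeholders for illustration; they are not part of any published PoZe interface to the CNOS [B] predictions.

```python
import numpy as np

def get_cnos_fastsam_masks(rgb_image, object_id):
    # Hypothetical stub: in practice this would load the CNOS-FastSAM [B]
    # predictions as a list of {"segmentation": bool array, "score": float}.
    return []

def get_cnos_sam_masks(rgb_image, object_id):
    # Hypothetical stub for the CNOS-SAM fallback predictions.
    return []

def select_masks(rgb_image, object_id, top_k=5):
    """Return up to top_k candidate masks, applying the fallbacks above."""
    masks = get_cnos_fastsam_masks(rgb_image, object_id)
    if not masks:
        masks = get_cnos_sam_masks(rgb_image, object_id)  # first fallback
    if not masks:
        # Last resort: treat the whole image as the object region.
        h, w = rgb_image.shape[:2]
        return [np.ones((h, w), dtype=bool)]
    # Keep several candidates: a lower-confidence mask is sometimes
    # more accurate than the top-scoring one.
    masks = sorted(masks, key=lambda m: m["score"], reverse=True)
    return [m["segmentation"] for m in masks[:top_k]]
```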
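The 3D lifting module is standard pinhole-camera geometry. The sketch below back-projects the depth pixels covered by a mask into a point cloud in the camera frame; it assumes a depth map in metres and a 3×3 intrinsic matrix.

```python
import numpy as np

def backproject_depth(depth, mask, K):
    """Lift masked depth pixels into a 3D point cloud (camera frame).

    depth: (H, W) depth map in metres; mask: (H, W) bool; K: 3x3 intrinsics.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.nonzero(mask & (depth > 0))  # pixels with valid depth
    z = depth[v, u]
    x = (u - cx) * z / fx  # pinhole model
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) points
```

Cropping the image around the mask before lifting keeps the scene point cloud small, which speeds up the matching steps that follow.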
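The feature extraction and pose estimation modules can be sketched together with Open3D. PoZe uses frozen GeDi [C] descriptors; since GeDi's interface is not covered here, the sketch substitutes Open3D's FPFH descriptors as a stand-in, and the three scale values are illustrative rather than the ones used by PoZe.

```python
import open3d as o3d

reg = o3d.pipelines.registration  # shorthand for the registration pipeline

def estimate_pose_multiscale(obj_pts, scene_pts, scales=(0.01, 0.02, 0.04)):
    """Run feature-matching RANSAC at several scales and keep the pose
    with the most inliers. FPFH stands in for the GeDi [C] descriptors."""
    obj = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(obj_pts))
    scene = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scene_pts))
    best = None
    for radius in scales:
        # FPFH needs normals; estimate them at the current scale.
        for pc in (obj, scene):
            pc.estimate_normals(
                o3d.geometry.KDTreeSearchParamHybrid(radius=radius, max_nn=30))
        f_obj = reg.compute_fpfh_feature(
            obj, o3d.geometry.KDTreeSearchParamHybrid(radius=2 * radius, max_nn=100))
        f_scene = reg.compute_fpfh_feature(
            scene, o3d.geometry.KDTreeSearchParamHybrid(radius=2 * radius, max_nn=100))
        result = reg.registration_ransac_based_on_feature_matching(
            obj, scene, f_obj, f_scene,
            mutual_filter=True,
            max_correspondence_distance=radius,
            estimation_method=reg.TransformationEstimationPointToPoint(False),
            ransac_n=3,
            criteria=reg.RANSACConvergenceCriteria(100000, 0.999))
        # Keep the scale whose estimate has the most RANSAC inliers.
        if best is None or len(result.correspondence_set) > len(best.correspondence_set):
            best = result
    return best.transformation  # 4x4 object-to-scene pose
```

Selecting the estimate by inlier count is a simple, training-free way to pick the best scale.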
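Finally, a minimal sketch of the pose refinement using Open3D's point-to-point ICP, assuming the RANSAC pose from the previous sketch as initialisation; the correspondence threshold is an illustrative value that would need tuning per dataset.

```python
import open3d as o3d

def refine_pose(obj, scene, init_pose, threshold=0.01):
    """Refine an initial 4x4 pose with ICP (obj, scene: Open3D point clouds;
    threshold: max correspondence distance in metres, dataset-dependent)."""
    result = o3d.pipelines.registration.registration_icp(
        obj, scene, threshold, init_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```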
[A] Zeng et al.: 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions, CVPR 2017
[B] Nguyen et al.: CNOS: A Strong Baseline for CAD-based Novel Object Segmentation, arXiv 2023
[C] Poiesi et al.: Learning general and distinctive 3D local deep descriptors for point cloud registration, IEEE TPAMI 2023
A full presentation of PoZe will appear in an upcoming scientific publication.
Stay tuned to learn more about this approach!
Do you want to learn more about AI-PRISM research and developments? Subscribe to our newsletter and follow us on LinkedIn and Twitter, so you don’t miss a thing!