Constructing a unified canonical pose representation for 3D object categories is crucial for pose estimation and robotic scene understanding. Previous unified pose representations often relied on manual alignment, as in ShapeNet and ModelNet. Recently, self-supervised canonicalization methods have been proposed. However, they are sensitive to intra-class shape variations, and their canonical pose representations cannot be aligned to an object-centered coordinate system.
In this paper, we propose a category-level canonicalization method that alleviates the impact of shape variation and extends the canonical pose representation to an upright, forward-facing state. First, we design a Siamese VN Module (SVNM) that achieves SE(3)-equivariant modeling and self-supervised disentanglement of 3D shape and pose attributes. Next, we introduce a Siamese equivariant constraint that addresses the pose alignment bias caused by shape deformation. Finally, we propose a method to generate upright surface labels from pose-unknown in-the-wild data, and we use upright and symmetry losses to correct the canonical pose.
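To make the SE(3)-equivariance requirement concrete, the sketch below illustrates the core property exploited by Vector Neuron (VN) style layers such as the one underlying the SVNM: a linear map that mixes vector channels but never touches the 3D coordinates commutes with any rotation of the input. This is a generic illustration of the VN building block, not the paper's actual module; the function names and shapes are our own assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): a Vector Neuron (VN) style
# linear layer is rotation-equivariant because its weights mix channels
# only, leaving each 3D vector's direction untouched.

rng = np.random.default_rng(0)

def vn_linear(x, w):
    """x: (C_in, N, 3) vector features; w: (C_out, C_in) channel weights."""
    return np.einsum('oc,cnd->ond', w, x)

def random_rotation(rng):
    """Random proper rotation via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))      # canonical sign convention
    if np.linalg.det(q) < 0:      # ensure det = +1 (rotation, not reflection)
        q[:, 0] *= -1
    return q

x = rng.normal(size=(8, 64, 3))   # 8 vector channels over 64 points
w = rng.normal(size=(16, 8))
R = random_rotation(rng)

# Equivariance check: rotate-then-map equals map-then-rotate.
lhs = vn_linear(x @ R, w)
rhs = vn_linear(x, w) @ R
print(np.allclose(lhs, rhs))      # True
```

Because the layer commutes with rotations, a network built from such layers can factor an observed shape into a pose-invariant canonical shape and an equivariant pose estimate, which is the disentanglement the SVNM relies on.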
Experimental results show that our method not only achieves state-of-the-art consistency performance but also aligns canonical poses with the object-centered coordinate system.