Canonical Representation and Force-Based Pretraining of 3D Tactile for Dexterous Visuo-Tactile Policy Learning

Tianhao Wu (1,2,3), Jinzhou Li (1,2,*), Jiyao Zhang (1,2,3,*), Mingdong Wu (1,2,3), Hao Dong (1,2,3,✉)
(1) Center on Frontiers of Computing Studies, School of Computer Science, Peking University
(2) PKU-Agibot Lab, School of Computer Science, Peking University
(3) National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University
(*) Equal contribution; (✉) Corresponding author


Our policy leverages both the spatial and the force information in 3D tactile data to accomplish dexterous, fine-grained, and contact-rich tasks.



Summary

Tactile sensing plays a vital role in enabling robots to perform fine-grained, contact-rich tasks. However, because tactile sensors cover large areas of dexterous hands, tactile data is high-dimensional, which makes effective tactile feature learning difficult. The problem is especially acute for 3D tactile data, for which there are no large standardized datasets and no strong pretrained backbones. To address these challenges, we propose a novel canonical representation that reduces the difficulty of 3D tactile feature learning, and we further introduce a force-based self-supervised pretraining task that captures both local and net force features, which are crucial for dexterous manipulation. Our method achieves an average success rate of 80% across three fine-grained, contact-rich dexterous manipulation tasks in real-world experiments, demonstrating its effectiveness and robustness compared with other methods. Further analysis shows that our method fully utilizes both the spatial and the force information in 3D tactile data to accomplish the tasks.
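To make the two pretraining signals more concrete, here is a minimal sketch of how self-supervised targets could be built from a single 3D tactile frame. It assumes each frame is a set of N taxels, each with a 3D position (assumed to be already expressed in a canonical frame) and a 3D force vector. It treats the net force as the vector sum of all per-taxel forces and uses masked force reconstruction as a stand-in for the local-force task. The function name `force_pretraining_targets` and the `mask_ratio` parameter are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def force_pretraining_targets(positions, forces, mask_ratio=0.3, rng=None):
    """Hypothetical self-supervised targets for one 3D tactile frame (not the authors' code).

    positions: (N, 3) taxel positions, assumed to be in a canonical frame.
    forces:    (N, 3) per-taxel force vectors.
    Returns (inputs, masked_idx, local_targets, net_target).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = positions.shape[0]
    num_masked = max(1, int(round(mask_ratio * n)))
    # Pick distinct taxels whose forces will be hidden from the encoder.
    masked_idx = rng.choice(n, size=num_masked, replace=False)

    # Local-force target (assumed masked reconstruction): original forces at masked taxels.
    local_targets = forces[masked_idx].copy()
    # Net-force target: vector sum of all per-taxel forces, i.e. the resultant contact force.
    net_target = forces.sum(axis=0)

    # Encoder input: positions stay visible, masked force vectors are zeroed.
    masked_forces = forces.copy()
    masked_forces[masked_idx] = 0.0
    inputs = np.concatenate([positions, masked_forces], axis=1)  # (N, 6)
    return inputs, masked_idx, local_targets, net_target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    positions = rng.random((100, 3))
    forces = np.zeros((100, 3))
    forces[:10, 2] = 0.5  # ten taxels, each pressed with 0.5 along z
    inputs, idx, local, net = force_pretraining_targets(positions, forces, rng=rng)
    print(inputs.shape, len(idx), net)
```

A pretraining loss could then regress `local_targets` at the masked taxels and `net_target` for the whole hand. How the actual method defines, weights, or combines these terms is not specified on this page.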

Comparison of Different Manipulation Policies

Open Box: Ours, T-DEX, DP, HATO, GNN

Reorientation: Ours, T-DEX, DP, HATO, GNN

Flip: Ours, T-DEX, DP, HATO, GNN

Assembly: Ours, T-DEX, DP, HATO, GNN

Ablation

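Variants: NF PRE = net-force pretraining; LF PRE = local-force pretraining; PRE = pretraining; CR = canonical representation.

Open Box: Ours, w/o NF PRE, w/o LF PRE, w/o PRE, w/o CR & PRE

Reorientation: Ours, w/o NF PRE, w/o LF PRE, w/o PRE, w/o CR & PRE

Flip: Ours, w/o NF PRE, w/o LF PRE, w/o PRE, w/o CR & PRE

Assembly: Ours, w/o NF PRE, w/o LF PRE, w/o PRE, w/o CR & PRE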

Role of Spatial and Force Information

Without Spatial Ablation: After the robot reached the object and attempted to grasp it, the thumb oscillated randomly, preventing further manipulation. This indicates that our policy relies on spatial information to form gross hand poses.

Without Force Ablation: Although the robot managed to reach and grasp the object, it consistently failed because the grasp was unstable or was continuously readjusted. This indicates that our policy relies on force information for fine-grained adjustments.
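One plausible way to set up these two ablations, assuming the tactile input is a point cloud whose per-point features are a 3D position followed by a 3D force vector, is to zero out one group of channels before the input reaches the policy. The page does not say how the ablations were implemented; `ablate_tactile_features`, its mode names, and the 6-column layout are hypothetical.

```python
import numpy as np

VALID_MODES = ("full", "no_spatial", "no_force")


def ablate_tactile_features(points, mode):
    """Hypothetical input ablation for a tactile point cloud (not the authors' code).

    points: (N, 6) array; columns 0-2 are taxel positions, columns 3-5 are force vectors.
    mode:   "full" keeps everything, "no_spatial" zeroes positions, "no_force" zeroes forces.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    out = np.array(points, dtype=float, copy=True)
    if mode == "no_spatial":
        # Remove contact geometry: every taxel reports the same (zero) position.
        out[:, :3] = 0.0
    elif mode == "no_force":
        # Remove contact forces: only the taxel positions remain.
        out[:, 3:] = 0.0
    return out
```

Zeroing is only one option; replacing positions with a fixed layout or retraining without the removed channels would be alternatives, and which one matches the reported ablations is not stated.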

Generalization on Unseen Objects

Our policy generalizes to most unseen objects with varying colors, geometries, and dynamics.

Tasks: Open Box, Reorientation, Flip

Rollouts of Our Policy

Tasks: Open Box, Reorientation, Flip, Assembly