Pixel-to-Pose Estimator for Robot Control

Robotic manipulation in unstructured environments relies heavily on visual information for context. Images from color (RGB) cameras provide rich environmental data at low cost, and deep learning has recently proven to be a powerful tool for extracting features from images and using those features to estimate the class and location of objects. However, data collection with physical robotic arms is slow and can be dangerous.

The application studied in this work is a robotic arm used in a molten-salt nuclear facility to replace failed pipe flanges. In this scenario, access to the physical hardware is available only when the plant is not operating, so collecting real-world training data is impractical.

This work trains a deep neural network, using only simulated images, to estimate the pose of targets in the robot's coordinate system. The estimated position is accurate to 52.7 mm in a 0.95 cubic meter workspace. We describe the simulation, the deep pixel-to-pose estimator model and its training, and the environment and domain randomizations used during training, and we demonstrate the performance on a precise robotic tool-alignment task. Read the complete pre-print paper here.
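To make the idea concrete, the sketch below shows one minimal way a pixel-to-pose regressor could be set up in PyTorch: a CNN backbone maps an RGB image to a 3-D target position in the robot frame, trained with a supervised regression loss against ground-truth poses from the simulator. The backbone choice (ResNet-18), head dimensions, and MSE loss are illustrative assumptions, not the paper's exact architecture or training recipe.

```python
# Minimal pixel-to-pose regressor sketch (PyTorch). The backbone, head,
# and loss are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PixelToPose(nn.Module):
    """Regress a 3-D target position (x, y, z) in the robot frame from an RGB image."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)  # trained from scratch on simulated images
        backbone.fc = nn.Identity()        # drop the classification head, keep 512-d features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, 3),             # (x, y, z) in the robot's coordinate system
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(rgb))

model = PixelToPose()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# One supervised training step: each simulated image (rendered with randomized
# lighting, textures, and camera parameters, i.e. domain randomization) is
# paired with the simulator's ground-truth target position.
def train_step(images: torch.Tensor, poses: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(images), poses)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: a batch of 8 simulated 224x224 RGB images and their target positions.
images = torch.rand(8, 3, 224, 224)
poses = torch.rand(8, 3)
print(train_step(images, poses))
```

Because every label comes from the simulator, no real images are needed; the domain randomization applied to each render is what lets a network trained this way transfer to the real camera.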