Latest in Useless AI Skills: Pen Spinning ✍️
So essentially,
AI can pen spin (kinda)
Paper:
Lessons from Learning to Spin “Pens” (13 Pages)
GitHub:
https://penspin.github.io/
Researchers from UC San Diego, CMU and UC Berkeley are interested in developing dexterous AI models.
Hmm... What’s the background?
Dexterous in-hand manipulation is a fundamental skill for robots. It demands precise, coordinated finger movements, especially when handling awkwardly shaped objects like pens.
Despite decades of research, in-hand manipulation remains a significant challenge. Existing work focuses mostly on simple shapes such as spheres and cubes, leaving thin, elongated objects like pens largely unaddressed.
Ok, so what does the research paper propose?
This approach combines several key ideas:
Reinforcement Learning with Privileged Information in Simulation: The researchers first train an "oracle policy" in a simulated environment. This policy uses reinforcement learning to learn how to spin a pen, but it has access to information not available in the real world, such as the exact position and physical properties of the pen. This is referred to as "privileged information."
Generation of Realistic Trajectories: The oracle policy is designed to generate trajectories (sequences of movements) that are realistic enough to be executed on a real robot. This is achieved through careful design of the reward function, action space, and initial state distributions in simulation.
Open-Loop Trajectory Replay for Real-World Demonstrations: The realistic trajectories learned by the oracle policy are then used as an "open-loop controller" on a real robot. This means that the robot simply replays the recorded actions without any feedback from its sensors. Successful trajectories from these replays are collected as demonstrations.
Pre-training and Fine-tuning a Sensorimotor Policy: A second "sensorimotor" policy is trained, which only relies on information from the robot's sensors (proprioception in this case). This policy is first pre-trained in simulation using the data generated by the oracle policy. Then, it is fine-tuned using the real-world demonstrations collected from the open-loop replay.
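The four steps above amount to a teacher-student pipeline: an oracle with privileged state generates demonstrations, and a proprioception-only student is cloned from them. Here is a minimal, toy sketch of that pipeline in numpy. Everything here is illustrative, not the paper's actual code: the linear policies, the dimensions, and the made-up dynamics are all assumptions chosen to keep the example runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from the paper)
PROPRIO_DIM, PRIVILEGED_DIM, ACTION_DIM = 16, 13, 16

class OraclePolicy:
    """Stand-in for the RL-trained oracle: a linear map over
    proprioception plus privileged state (pen pose, physics params)."""
    def __init__(self):
        self.W = rng.normal(0, 0.1, (PROPRIO_DIM + PRIVILEGED_DIM, ACTION_DIM))

    def act(self, proprio, privileged):
        return np.concatenate([proprio, privileged]) @ self.W

def rollout_oracle(policy, steps=50):
    """Collect a trajectory in 'simulation'. Only the action sequence
    matters downstream: it is what gets replayed open-loop on hardware."""
    actions, proprios = [], []
    proprio = rng.normal(size=PROPRIO_DIM)
    for _ in range(steps):
        privileged = rng.normal(size=PRIVILEGED_DIM)  # oracle-only state
        a = policy.act(proprio, privileged)
        actions.append(a)
        proprios.append(proprio)
        proprio = 0.9 * proprio + 0.1 * a             # made-up dynamics
    return np.array(proprios), np.array(actions)

def fit_sensorimotor(proprios, actions):
    """Distill a student that sees proprioception only, via
    least-squares behavior cloning on the demonstrations."""
    W, *_ = np.linalg.lstsq(proprios, actions, rcond=None)
    return W

oracle = OraclePolicy()
proprios, actions = rollout_oracle(oracle)
W_student = fit_sensorimotor(proprios, actions)

# The student cannot match the oracle exactly (it never sees the
# privileged state), but imitation shrinks the error well below
# the do-nothing baseline.
imitation_err = np.mean((proprios @ W_student - actions) ** 2)
baseline_err = np.mean(actions ** 2)
```

The residual gap between `imitation_err` and zero is exactly why the paper fine-tunes the student on real-world replays rather than relying on simulation data alone.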
Directly training a policy in the real world is challenging due to the complexity of the task, while directly transferring a policy trained in simulation suffers from the significant gap between simulation and reality.
By using the oracle policy to generate realistic trajectories, the researchers can collect high-quality demonstrations that would be difficult to obtain otherwise. These demonstrations, combined with simulation pre-training, allow the sensorimotor policy to learn the task effectively and adapt to real-world dynamics.
Source: GitHub
What’s next?
The researchers identify several promising directions for future research related to robot pen spinning and, more broadly, dexterous in-hand manipulation:
Incorporating Touch and Vision: While the current system relies solely on proprioception (the robot's sense of its own joint positions), the researchers acknowledge that touch and vision likely play a role in human pen spinning. Future work could explore whether incorporating these sensory modalities could further improve performance.
Generalizing to Multi-Axis Rotation: The current system is limited to spinning pens along a single axis (the z-axis). Future work could investigate extending the approach to achieve more general multi-axis rotation, potentially enabling the robot to perform more complex and impressive pen spinning tricks.
Improving Vision-Based Sim-to-Real Transfer: The researchers highlight the significant challenges of applying vision-based policies trained in simulation to the real world, particularly for dynamic, contact-rich tasks like pen spinning.
Exploring Alternative Reinforcement Learning Methods: The researchers used Proximal Policy Optimization (PPO) to train their oracle policy. Investigating other reinforcement learning algorithms might yield performance improvements or allow for more efficient training.
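For context on that last point, the core of PPO is its clipped surrogate objective, which discourages the new policy from drifting too far from the old one in a single update. A minimal numpy sketch of just that objective (the function name and example numbers are illustrative, not from the paper):

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)],
    where r = pi_new(a|s) / pi_old(a|s) and A is the advantage
    (how much better the action was than the policy's average)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # Taking the min removes the incentive to push the ratio
    # beyond the clip range, keeping updates "proximal".
    return np.minimum(unclipped, clipped).mean()

# Three actions, all with advantage 1: the middle one is on-policy,
# the outer two have drifted below/above the clip boundaries.
obj = ppo_clipped_objective(np.array([0.5, 1.0, 1.5]),
                            np.array([1.0, 1.0, 1.0]))
```

Alternatives the authors hint at could swap this objective out, e.g. for off-policy methods with better sample efficiency, without changing the rest of the teacher-student pipeline.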
So essentially,
Robots can pen spin (kinda)