Beijing – The wave of AIGC (AI-generated content) technology, which began with Sora and continues with Kling, Vidu, and Tongyi Wanxiang, is sweeping the globe and opening the door to practical applications of AI. Now this transformative technology is making significant strides in embodied intelligence and robotics. Tsinghua University’s ISRLab and Xingdong Ji Yuan have jointly open-sourced VPP (Video Prediction Policy), an AIGC generative robot large model selected as an ICML2025 Spotlight. By leveraging pre-trained video generation models to learn human actions from vast amounts of internet video, VPP reduces the reliance on high-quality real-world robot data, enables seamless switching between different humanoid robot platforms, and promises to accelerate the commercialization of humanoid robots.
Imagine a scenario where uttering the phrase “Bring me a bowl of hot chicken soup” doesn’t just conjure up a heartwarming video, but prompts a nearby robot to actually serve you the soup. This is the potential unlocked by VPP, which carries the magic of AIGC from the digital realm into the physical world of embodied intelligence, earning it the moniker “the Robot World’s Sora.”
Key Features and Innovations of VPP:
- Leveraging Pre-trained Video Generation Models: VPP utilizes pre-trained video generation models trained on extensive internet video data. This allows the robot to directly learn human actions, significantly reducing the need for expensive and time-consuming real-world robot training data.
- Cross-Platform Compatibility: The model can seamlessly switch between different humanoid robot platforms, making it highly adaptable and versatile.
- ICML2025 Spotlight Recognition: ICML (the International Conference on Machine Learning) is one of the most prestigious academic conferences in machine learning. Selection as a Spotlight paper at ICML2025 attests to the significance and innovation of VPP: of more than 12,000 submissions this year, fewer than 2.6% were chosen as Spotlights.
- Addressing Diffusion Inference Speed: VPP sidesteps the slow inference of diffusion models by transferring the generalization ability of video diffusion models into a general robot manipulation policy, allowing robots to predict the near future and execute actions in real time.
- Enhanced Policy Generalization: VPP significantly improves the generalization ability of robot policies.
- Open Source Availability: The entire project, including the paper, project website, and source code, is open source, fostering collaboration and further development within the robotics community.
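To make the idea in the feature list above concrete, the sketch below illustrates the two-stage pipeline the article describes: a pre-trained video prediction model turns the current observation and a language instruction into a "predictive visual representation" of the near future, and a lightweight action head maps that representation to a short chunk of robot actions. This is a minimal illustration, not the actual VPP implementation; all dimensions, weights, and function names here are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
OBS_DIM, TEXT_DIM, LATENT_DIM, ACTION_DIM, HORIZON = 64, 32, 128, 7, 8

# Stand-ins for learned weights: the (frozen) video prediction model
# and the lightweight action head trained on top of its representations.
W_obs = rng.standard_normal((OBS_DIM, LATENT_DIM)) * 0.1
W_txt = rng.standard_normal((TEXT_DIM, LATENT_DIM)) * 0.1
W_act = rng.standard_normal((LATENT_DIM, HORIZON * ACTION_DIM)) * 0.1

def predictive_representation(obs, text):
    """Stand-in for a partial forward pass of the video model: rather than
    generating full future frames (slow), keep an intermediate latent as a
    'predictive visual representation' of what is about to happen."""
    return np.tanh(obs @ W_obs + text @ W_txt)

def action_head(latent):
    """Map the predictive representation to a chunk of future actions
    (HORIZON steps of ACTION_DIM-dimensional motor commands)."""
    return (latent @ W_act).reshape(HORIZON, ACTION_DIM)

obs = rng.standard_normal(OBS_DIM)    # encoded current camera observation
text = rng.standard_normal(TEXT_DIM)  # encoded language instruction

actions = action_head(predictive_representation(obs, text))
print(actions.shape)  # (8, 7)
```

The key design point, per the article, is that the expensive video model is reused for its learned dynamics rather than asked to render full videos at control time, which is what makes real-time prediction and action execution feasible.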
The Technical Details:
The research paper, titled Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations, details the technical aspects of VPP. The paper is available at https://arxiv.org/pdf/2412.14803.
The project website, which provides more information and resources, can be found at https://video-prediction-policy.github.io.
The open-source code is available on GitHub at https://github.com/roboterax/video-predic.
Implications for the Future of Robotics:
The development and open-sourcing of VPP represent a significant step forward in the field of robotics. By reducing the reliance on real-world training data and enabling cross-platform compatibility, VPP has the potential to dramatically accelerate the development and deployment of humanoid robots in a wide range of applications, from healthcare and manufacturing to elder care and domestic assistance. The open-source nature of the project encourages further innovation and collaboration, paving the way for a future where robots are more intelligent, adaptable, and capable of seamlessly interacting with the human world.
Conclusion:
The AIGC generative robot large model VPP, developed by Tsinghua University’s ISRLab and Xingdong Ji Yuan, marks a pivotal moment in robotics. Its innovative approach to learning from internet video data, coupled with its open-source availability, promises to revolutionize the field and accelerate the commercialization of humanoid robots. As research and development continue, we can expect to see even more impressive advancements in the capabilities of robots, bringing us closer to a future where they play an increasingly important role in our lives.
References:
- Tsinghua University ISRLab. (2024). Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations. Retrieved from https://arxiv.org/pdf/2412.14803
- Video Prediction Policy GitHub Repository. (n.d.). Retrieved from https://github.com/roboterax/video-predic
- Video Prediction Policy Project Website. (n.d.). Retrieved from https://video-prediction-policy.github.io
- Xingdong Ji Yuan Technology Co., Ltd. (2024). Robot World’s Sora Arrives: Tsinghua and Xingdong Ji Yuan Open Source First AIGC Robot Large Model, Selected for ICML2025 Spotlight!