
SHANGHAI – ZHIYUAN Robotics has officially launched Genie Operator-1 (GO-1), its first general-purpose embodied foundation model. The model is built on the new Vision-Language-Latent-Action (ViLLA) architecture, which combines a multimodal vision-language model (VLM) with a Mixture of Experts (MoE). By predicting latent action tokens as an intermediate step, GO-1 bridges the gap between image-text inputs and robotic action execution.

ViLLA, the core of GO-1, evolves earlier Vision-Language-Action (VLA) models by adding latent action tokens between perception and action output.

ViLLA: Bridging the Gap Between Perception and Action

The ViLLA architecture has two key components: a VLM and an MoE. The VLM draws on large volumes of internet image and text data for general scene perception and language understanding. The MoE consists of a Latent Planner and an Action Expert.

  • Latent Planner: Uses extensive cross-embodiment and human operation video data to learn a general understanding of actions.
  • Action Expert: Trained on millions of real-world robot data points, it executes precise, fine-grained actions.
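
The data flow described above (VLM features, then latent action tokens, then executable actions) can be illustrated with a toy sketch. This is not ZHIYUAN's implementation: every function name, the feature arithmetic, the number of latent tokens, the codebook size, and the 7-joint action format are assumptions chosen only to make the three stages visible. Real components would be trained neural networks.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Observation:
    """Hypothetical input: a flattened camera image plus a text instruction."""
    image: Sequence[float]
    instruction: str


def vlm_encode(obs: Observation) -> List[float]:
    """Stand-in for the VLM: fuse image and text into one feature vector."""
    img_feat = sum(obs.image) / len(obs.image) if obs.image else 0.0
    text_len = len(obs.instruction) / 100.0
    text_hash = (sum(map(ord, obs.instruction)) % 97) / 97.0
    return [img_feat, text_len, text_hash]


def latent_planner(features: Sequence[float],
                   num_tokens: int = 4,
                   codebook_size: int = 32) -> List[int]:
    """Stand-in for the Latent Planner: emit discrete latent action tokens.

    num_tokens and codebook_size are illustrative assumptions.
    """
    tokens = []
    for i in range(num_tokens):
        score = sum((i + 1) * (j + 1) * f for j, f in enumerate(features))
        tokens.append(int(score * 1000) % codebook_size)
    return tokens


def action_expert(features: Sequence[float],
                  latent_tokens: Sequence[int],
                  horizon: int = 3,
                  dof: int = 7) -> List[List[float]]:
    """Stand-in for the Action Expert: decode a short chunk of joint commands
    conditioned on both the VLM features and the latent tokens."""
    bias = sum(features) / len(features)
    chunk = []
    for t in range(horizon):
        token = latent_tokens[t % len(latent_tokens)]
        chunk.append([round(bias + 0.01 * token + 0.001 * j, 4)
                      for j in range(dof)])
    return chunk


def run_policy(obs: Observation) -> Tuple[List[int], List[List[float]]]:
    """Chain the three stages: perception -> latent plan -> actions."""
    features = vlm_encode(obs)
    tokens = latent_planner(features)
    actions = action_expert(features, tokens)
    return tokens, actions


if __name__ == "__main__":
    obs = Observation(image=[0.1, 0.5, 0.9], instruction="pick up the cup")
    tokens, actions = run_policy(obs)
    print("latent action tokens:", tokens)
    print("action chunk shape:", len(actions), "x", len(actions[0]))
```

The point of the sketch is the interface between stages: the Action Expert never sees raw pixels or text, only the shared features and the latent tokens, which is what lets a planner trained on video without robot actions inform a decoder trained on robot data.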

Together, these components let GO-1 learn from human video demonstrations and generalize quickly to new tasks from a small number of examples, which lowers the barrier to entry for embodied intelligence applications. ZHIYUAN says it has deployed GO-1 across its range of robotic platforms and continues to improve the model’s capabilities.

AgiBot World: Fueling the Development of GO-1

GO-1 builds on AgiBot World, a dataset ZHIYUAN created in late 2024. It contains over one million trajectories covering 217 tasks across five distinct scenarios. This high-quality, real-world data provided the foundation for training and refining GO-1’s capabilities.

Exceeding State-of-the-Art Performance

According to ZHIYUAN, the ViLLA architecture allows GO-1 to outperform existing open-source state-of-the-art models on real-world dexterous manipulation and long-horizon tasks. Predicting latent action tokens helps GO-1 translate image and text instructions into concrete robot actions.

The Future of Embodied AI

The launch of GO-1 is a milestone for ZHIYUAN’s embodied AI work. By combining perception, planning, and execution in one model, GO-1 points toward robots that can interact with and learn from the real world. The ViLLA architecture and the AgiBot World dataset give the company a foundation for tackling increasingly complex tasks.

References

ZHIYUAN Robotics. (2024). AgiBot World & Genie Operator-1 (GO-1). Retrieved from https://agibot-world.com/blog/agibot_go1.pdf

