
Beijing, April 8, 2025 – In a significant advancement for the field of GUI (Graphical User Interface) automation, Chinese tech giant Vivo has open-sourced its research project, UI-R1, a novel reinforcement learning (RL) framework designed to enhance the action prediction capabilities of GUI agents. Inspired by DeepSeek-R1's success in mathematical problem-solving, UI-R1 applies rule-based reinforcement fine-tuning (RFT) to achieve improved performance with significantly less data than traditional supervised fine-tuning (SFT) methods.

The research, conducted in collaboration with a team from the Chinese University of Hong Kong, addresses the challenge of training GUI agents to interact effectively with complex user interfaces. Traditional methods often require massive datasets of labeled examples, a costly and time-consuming process. UI-R1 sidesteps this limitation by employing a pre-defined, rule-based reward function, eliminating the need for a learned reward model or dense manual annotation. The model learns through trial and error, guided by the reward signal, and achieves strong performance with as few as 136 training screenshots; a minimal sketch of such a rule-based reward appears below.
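
As a rough illustration of how a pre-defined rule can replace human-labeled reward data, the Python sketch below scores a single predicted GUI action against ground truth. The two-part reward (action type plus click-point hit) and the function signature are assumptions for exposition, not necessarily the paper's exact formulation.

```python
def rule_based_reward(pred_action: str,
                      pred_point: tuple[float, float],
                      gt_action: str,
                      gt_box: tuple[float, float, float, float]) -> float:
    """Score one predicted GUI action against ground truth with fixed rules.

    Illustrative assumption: a two-part reward (action type + click-point
    hit), not necessarily the UI-R1 paper's exact formulation.
    """
    # Action-type reward: 1 if the predicted action (e.g. "click") matches.
    action_reward = 1.0 if pred_action == gt_action else 0.0

    # Coordinate reward: 1 if the predicted click point lands inside the
    # ground-truth element's bounding box (x1, y1, x2, y2).
    x, y = pred_point
    x1, y1, x2, y2 = gt_box
    coord_reward = 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0

    return action_reward + coord_reward
```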

The research paper, titled "UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning," is available on arXiv (https://arxiv.org/abs/2503.21620), and the project homepage (https://yxchai.com/UI-R1/) and code repository (https://github.com/lll6gg/UI-R1) are publicly accessible, fostering collaboration and further development within the AI community.

The core innovation of UI-R1 lies in its application of the rule-based RL paradigm to GUI action prediction from low-level instructions. The system uses a multi-modal large language model (MLLM) to generate multiple response trajectories for each input, each containing a reasoning trace and a final answer. Carefully designed prompts, used in both training and testing, guide the model in understanding the task and producing well-formed actions.
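
The sketch below shows, under stated assumptions, what sampling several candidate trajectories per input might look like. The prompt wording, the <think>/<answer> tag convention, and the `model.generate` interface are all hypothetical placeholders, not the UI-R1 repository's actual API.

```python
# Hypothetical prompt asking the model for a reasoning trace plus a final
# answer; the exact prompts used by UI-R1 are defined in its repository.
PROMPT_TEMPLATE = (
    "You are a GUI agent. Given the screenshot and the instruction, first "
    "reason step by step inside <think>...</think>, then give the final "
    "action inside <answer>...</answer>.\n"
    "Instruction: {instruction}"
)

def sample_trajectories(model, screenshot, instruction: str, n: int = 8) -> list[str]:
    """Draw n candidate responses; each should contain a reasoning trace
    followed by a final answer, which the reward function scores next."""
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    return [
        model.generate(image=screenshot, prompt=prompt, temperature=1.0)
        for _ in range(n)
    ]
```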

The designed reward function then evaluates each trajectory, providing feedback that steers the model towards optimal action sequences. This mirrors DeepSeek-R1's use of rule-based rewards for mathematical problem-solving; in multi-modal tasks such as image localization, metrics like IoU (Intersection over Union) commonly play the analogous role of a rule-based reward.
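
For concreteness, IoU over two axis-aligned boxes can be computed as follows. This is a generic implementation of the metric, not code from the UI-R1 repository.

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0.0 else 0.0
```

Because such a score comes from a fixed rule rather than a learned reward model, it is cheap to evaluate across many sampled trajectories, which is part of what makes this data-efficient RL recipe practical.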

The open-sourcing of UI-R1 marks a significant step towards more efficient and adaptable GUI automation. By reducing the reliance on large, labeled datasets, Vivo’s research paves the way for the development of GUI agents capable of learning and adapting to new interfaces with minimal human intervention. This has profound implications for a wide range of applications, including automated software testing, robotic process automation, and assistive technologies for individuals with disabilities.

The research team believes that UI-R1 can be further extended and refined, exploring more sophisticated reward functions and incorporating other modalities such as audio and haptic feedback. The open-source nature of the project encourages researchers and developers worldwide to contribute to its evolution and unlock its full potential. This initiative from Vivo highlights the growing importance of reinforcement learning in solving real-world problems and its potential to revolutionize the way humans interact with technology.

References:

  • UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning. (2025). Retrieved from https://arxiv.org/abs/2503.21620
  • UI-R1 Project Homepage: https://yxchai.com/UI-R1/
  • UI-R1 Code Repository: https://github.com/lll6gg/UI-R1

