News Title: Alibaba Cloud Announces Upgraded Multimodal Large Model Qwen-VL-Max, Rivaling GPT-4V and Gemini Ultra
Keywords: Alibaba Cloud, Qwen-VL, Multimodal Large Model, GPT-4V, Gemini
News Content:
Alibaba Cloud recently announced significant progress in its multimodal large model research. Its Tongyi Qianwen visual understanding model, Qwen-VL, has been upgraded again with the release of a Max version. The upgraded model offers stronger visual reasoning and Chinese comprehension, and can identify people in images, answer questions, produce creative writing, and write code. Across several authoritative benchmarks, the upgraded model performs strongly, with overall performance comparable to GPT-4V and Gemini Ultra.
On benchmarks such as MMMU and MathVista, Qwen-VL-Plus and Qwen-VL-Max far outperform all open-source models in the industry. On document analysis (DocVQA) and Chinese image understanding (MM-Bench-CN) tasks, the upgraded model goes further and surpasses GPT-4V, achieving world-leading results.
This upgrade to Alibaba Cloud's Tongyi Qianwen multimodal large model further demonstrates China's strength in artificial intelligence. Going forward, the technology is expected to play an important role across industries and to advance the development of science and technology in China.
Source: https://news.mydrivers.com/1/960/960575.htm