Title: Alibaba Cloud Tongyi Qianwen Qwen-VL-Max: Performance Comparable to GPT-4V and Gemini Ultra
Keywords: Alibaba Cloud, Tongyi Qianwen, Qwen-VL-Max
Alibaba Cloud has announced new progress in its multimodal large model research. Reportedly, the Tongyi Qianwen visual understanding model Qwen-VL, following its Plus version, has now been released in a Max version. The upgraded model offers stronger visual reasoning and Chinese-language comprehension, and can identify people, answer questions, produce creative writing, and write code from images. In several authoritative benchmarks, Qwen-VL-Max performed strongly, with overall capability comparable to GPT-4V and Gemini Ultra.
On benchmarks such as MMMU and MathVista, Qwen-VL-Plus and Qwen-VL-Max scored far above all open-source models in the industry. On tasks such as document analysis (DocVQA) and Chinese image understanding (MM-Bench-CN), Qwen-VL-Max even surpassed GPT-4V, reaching the best level in the world. The result once again reshapes the industry's perception of Alibaba Cloud's Tongyi Qianwen models and further strengthens China's international standing in large model research.
Source: https://news.mydrivers.com/1/960/960575.htm