News Title: “Tongyi Qianwen Open-Sources a Math Model: The Most Advanced Specialized Math Model, Surpassing GPT-4”
Keywords: Open-Source Mathematics, Qwen2-Math, Advanced Model
News Content: On August 9, the Tongyi team at Alibaba Group announced the open-source release of a new specialized math model, Qwen2-Math. The model achieved 84% accuracy on MATH, an authoritative benchmark for mathematical problem solving, surpassing several open and closed models, including GPT-4, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B. Qwen2-Math is built on Tongyi Qianwen's open-source large language model Qwen2 and comprises base models at three parameter scales (1.5B, 7B, and 72B) as well as instruction-tuned variants.
Qwen2-Math handles a wide range of mathematical problems, including algebra, geometry, counting and probability, and number theory, demonstrating strong mathematical problem-solving ability. The development team stated that the model was pre-trained on a carefully designed, math-specific corpus drawing on high-quality mathematical web texts, books, code, and exam questions to ensure accuracy and reliability. All pre-training and fine-tuning datasets were decontaminated so that the model's outputs are clean and unbiased.
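The decontamination step described above is typically implemented as n-gram overlap filtering against evaluation benchmarks. The team has not published its exact procedure, so the sketch below is a minimal, hypothetical version using 8-word n-grams; production pipelines usually use longer n-grams and fuzzier matching.

```python
def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(sample: str, benchmark_grams: set, n: int = 8) -> bool:
    """Flag a training document that shares any n-gram with the benchmark index."""
    return bool(ngrams(sample, n) & benchmark_grams)

# Build the n-gram index over held-out benchmark questions once,
# then filter every pre-training document against it.
benchmark = ["what is the sum of the first ten positive integers"]
index = set().union(*(ngrams(q) for q in benchmark))

corpus = [
    "some unrelated training text about algebra and geometry",
    "solution: what is the sum of the first ten positive integers, step by step",
]
clean = [doc for doc in corpus if not is_contaminated(doc, index)]  # drops the second doc
```

Indexing the benchmark once and testing each document against the shared set keeps the filter linear in corpus size, which matters when screening a web-scale pre-training corpus.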
For instruction fine-tuning, the development team first trained a math-specific reward model based on Qwen2-Math-72B, then combined the dense reward signal with a binary signal indicating whether the model answered correctly, using the result as the learning label. Supervised fine-tuning (SFT) data was then constructed via rejection sampling, and the SFT model was further optimized with the GRPO method.
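As a rough illustration of the rejection-sampling step, the sketch below scores sampled solutions with a hypothetical combination of the dense reward and the binary correctness signal, then keeps the best correct solution as SFT data. The combination rule, field names, and weighting are assumptions for illustration, not the team's published method.

```python
def combined_score(dense_reward: float, is_correct: bool, weight: float = 1.0) -> float:
    # Hypothetical combination: a weighted sum of the reward model's dense
    # score and the binary correctness signal. The actual scheme is not public.
    return dense_reward + weight * (1.0 if is_correct else -1.0)

def rejection_sample(candidates: list, k: int = 1) -> list:
    """Keep the top-k *correct* sampled solutions, ranked by combined score."""
    correct = [c for c in candidates if c["is_correct"]]
    correct.sort(key=lambda c: combined_score(c["reward"], True), reverse=True)
    return correct[:k]

# Several solutions sampled from the model for one training problem.
candidates = [
    {"text": "solution A", "reward": 0.4, "is_correct": True},
    {"text": "solution B", "reward": 0.9, "is_correct": False},  # high score, wrong answer
    {"text": "solution C", "reward": 0.7, "is_correct": True},
]
sft_data = rejection_sample(candidates, k=1)  # keeps "solution C"
```

Note how the binary signal vetoes solution B despite its high reward-model score: the point of combining the two signals is that neither a fluent-but-wrong solution nor a correct-but-poorly-reasoned one should dominate the SFT set.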
The Qwen2-Math series currently supports mainly English, but the Tongyi team plans to release a bilingual Chinese-English version soon and intends to develop multilingual versions. For performance evaluation, the team tested the instruction-tuned models on multiple Chinese and English math benchmarks, including GSM8K, MATH, OlympiadBench, CollegeMath, GaoKao, AIME 2024, and AMC 2023; Qwen2-Math-72B-Instruct performed strongly across all of them.
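Scoring a model on benchmarks such as GSM8K or MATH ultimately reduces to comparing the model's final answer against a reference. The sketch below shows a deliberately simplified exact-match scorer; real MATH grading normalizes LaTeX and checks symbolic equivalence, which is omitted here.

```python
def normalize(answer: str) -> str:
    # Crude normalization only: trim whitespace, drop a trailing period,
    # lowercase. Real graders parse LaTeX and compare expressions symbolically.
    return answer.strip().rstrip(".").lower()

def exact_match_accuracy(predictions: list, references: list) -> float:
    """Fraction of predictions whose normalized final answer matches the reference."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["42", "x = 3.", "1/2"]
refs = ["42", "x = 3", "2/3"]
score = exact_match_accuracy(preds, refs)  # 2 of 3 match
```

A benchmark figure like "84% on MATH" is this kind of ratio computed over the full test split, with a far more careful equivalence check than the string comparison above.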
The Tongyi team stated that Qwen2-Math is being open-sourced to help the scientific community solve advanced mathematical problems, and that they will continue to strengthen the model's mathematical capabilities. Open-sourcing Qwen2-Math will undoubtedly advance artificial intelligence in the field of mathematics and give researchers a powerful tool.
[Source] https://www.jiqizhixin.com/articles/2024-08-09-6
