Apple’s New Paper Challenges AI Reasoning: Can Large Language Models Really Think?
A recent study by Apple researchers casts doubt on the reasoning abilities of large language models (LLMs), suggesting they may not be as capable of thinking or reasoning as widely believed. The paper, titled GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, has sparked widespread discussion within the AI community.
The study’s authors, led by Apple machine learning research engineer Iman Mirzadeh and including Samy Bengio (brother of Turing Award winner Yoshua Bengio), investigated the performance of LLMs on simple mathematical problems. They found that while LLMs can often solve basic arithmetic problems, their performance deteriorates significantly when the problems include seemingly irrelevant information, even when that information is clearly unrelated to the question being asked.
One example highlights this issue. Consider a simple math problem: Oliver picked 44 kiwis on Friday. Then he picked 58 kiwis on Saturday. On Sunday, he picked twice the number of kiwis he picked on Friday. How many kiwis did Oliver pick in total? This problem is straightforward, and LLMs typically solve it correctly.
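For reference, the arithmetic behind the kiwi problem can be checked directly:

```python
# Direct computation of the kiwi problem's answer.
friday = 44
saturday = 58
sunday = 2 * friday          # Sunday: twice Friday's count
total = friday + saturday + sunday
print(total)                 # 190
```

The correct answer, 190, does not depend on anything outside these three quantities, which is what makes the researchers’ perturbation below a pure distraction test.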
However, the researchers added a seemingly irrelevant sentence to the problem: Oliver picked 44 kiwis on Friday. Then he picked 58 kiwis on Saturday. On Sunday, he picked twice the number of kiwis he picked on Friday. He also loves to eat apples. How many kiwis did Oliver pick in total?
The addition of the sentence about apples measurably degraded the LLMs’ accuracy, even though it has no bearing on the answer. This suggests that LLMs struggle to filter out irrelevant information and focus on the core elements of a problem, a crucial aspect of reasoning.
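The evaluation style described above can be sketched as a templated benchmark generator. This is a minimal illustration, not the authors’ actual code: names and numbers are symbolic slots, and an optional no-op clause is appended to probe robustness to irrelevant text.

```python
import random

# Hypothetical GSM-Symbolic-style template: {name}, {fri}, {sat} are symbolic
# slots; {noop} is an optional irrelevant clause.
TEMPLATE = (
    "{name} picked {fri} kiwis on Friday. Then he picked {sat} kiwis on "
    "Saturday. On Sunday, he picked twice the number of kiwis he picked on "
    "Friday. {noop}How many kiwis did {name} pick in total?"
)
NOOP = "He also loves to eat apples. "

def make_variant(rng, with_noop=False):
    """Instantiate the template with random values; return (problem, answer)."""
    name = rng.choice(["Oliver", "Maya", "Ravi"])
    fri, sat = rng.randint(10, 99), rng.randint(10, 99)
    problem = TEMPLATE.format(
        name=name, fri=fri, sat=sat, noop=NOOP if with_noop else ""
    )
    return problem, fri + sat + 2 * fri  # ground-truth total

# Same seed, with and without the distractor: the text changes,
# but the correct answer does not.
plain, ans_plain = make_variant(random.Random(0), with_noop=False)
noisy, ans_noisy = make_variant(random.Random(0), with_noop=True)
assert ans_plain == ans_noisy
```

A harness built this way would feed both variants to a model and count how often the answer changes; any divergence is, by construction, a failure to ignore the irrelevant clause.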
The study’s findings raise important questions about the true capabilities of LLMs. While they excel at tasks like generating text and translating languages, their ability to reason logically and solve problems may be more limited than previously thought.
This research has significant implications for the future development of AI. If LLMs are indeed unable to reason effectively, it may require a fundamental shift in how we design and train these models. Further research is needed to understand the limitations of LLMs and explore new approaches to enhance their reasoning abilities.
References:
- Mirzadeh, I., Bengio, S., et al. (2024). GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. arXiv preprint arXiv:2410.05229.