Apple’s research: Why can’t AI think logically?

A new study by Apple’s AI researchers has found that engines built on large language models (LLMs) still lack basic reasoning skills. Models from leading AI developers such as Meta and OpenAI were found to produce markedly different answers when even small changes were made to the wording of a query. This raises questions about the reliability of AI systems: AI has not yet reached human-level logical reasoning, which could impose serious limitations in real-world applications.

Artificial intelligence and the lack of logical thinking: the GSM-Symbolic test

Apple researchers have proposed a new benchmark, GSM-Symbolic, to measure the reasoning abilities of a range of large language models. Initial tests showed that even small changes to the wording of a question can produce different, and incorrect, answers. Studies of the fragility of mathematical reasoning in particular showed that the performance of these models drops significantly as the number of numerical values or clauses in a question increases.
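To make the idea concrete, here is a minimal sketch, in Python, of how a symbolic question template can generate many variants of the same problem by swapping out names and numbers; the template, the names, and the value ranges are purely illustrative and are not taken from Apple’s benchmark itself.

```python
import random

# Illustrative sketch of a symbolic question template: the wording stays fixed
# while names and numeric values are swapped out, so a model that truly reasons
# should answer every variant correctly.
# (Template, names, and value ranges are hypothetical, not from GSM-Symbolic.)

TEMPLATE = (
    "{name} picked {fri} kiwis on Friday and {sat} kiwis on Saturday. "
    "On Sunday, {name} picked twice as many kiwis as on Friday. "
    "How many kiwis did {name} pick in total?"
)

NAMES = ["Oliver", "Mia", "Noah", "Ava"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return one question variant and its ground-truth answer."""
    fri = rng.randint(20, 60)
    sat = rng.randint(20, 60)
    question = TEMPLATE.format(name=rng.choice(NAMES), fri=fri, sat=sat)
    answer = fri + sat + 2 * fri  # Sunday's count is twice Friday's
    return question, answer

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        q, a = make_variant(rng)
        print(q, "->", a)
```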

The research found that contextual details added to a question should not affect the underlying mathematical solution, yet the AI models took this extraneous information into account anyway. For example, adding a single irrelevant sentence to a question was found to reduce the likelihood of a correct answer by as much as 65%. Apple’s researchers noted that such fragility seriously undermines the reliability of AI models and makes it difficult to build dependable systems on top of them.
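As a rough illustration of how such a drop can be quantified, the sketch below compares exact-match accuracy on the original questions and on the same questions with an irrelevant detail added; the grading logic and the sample answers are hypothetical, not data from the study.

```python
# Hypothetical sketch: measure the accuracy drop caused by adding irrelevant
# details, assuming we already have the model's answers for both versions.

def accuracy(predictions: list[int], gold: list[int]) -> float:
    """Fraction of exact-match answers."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

gold          = [190, 73, 41, 120, 88]   # ground-truth answers (made up)
answers_clean = [190, 73, 41, 120, 88]   # model answers, original questions
answers_noisy = [185, 73, 38, 115, 88]   # model answers after adding distractors

drop = accuracy(answers_clean, gold) - accuracy(answers_noisy, gold)
print(f"Accuracy drop from the irrelevant details: {drop:.0%}")
```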

One notable example from the study involved a math problem requiring only basic arithmetic. The model was given the following information: “On Friday, Oliver picked 44 kiwis. On Saturday, he picked 58 kiwis. On Sunday, he picked twice as many kiwis as he did on Friday.” Up to this point the answer was clear, but adding the phrase “five of the kiwis he picked on Sunday were smaller than average” caused the AI to make an error.

This small detail should not have affected the answer, yet both OpenAI’s model and Meta’s Llama3-8b subtracted those five smaller kiwis from the total and reached the wrong conclusion. The example clearly demonstrates the shortcomings of language models’ reasoning abilities.
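For readers who want the arithmetic spelled out, the snippet below shows the correct total of 190 kiwis alongside the 185 the models arrived at after subtracting the five smaller fruit.

```python
# The kiwi arithmetic spelled out, to show why the extra sentence is irrelevant.
friday   = 44
saturday = 58
sunday   = 2 * friday               # "twice as many as Friday" = 88

correct_total = friday + saturday + sunday   # 190
wrong_total   = correct_total - 5            # 185: what the models computed after
                                             # subtracting the five "smaller" kiwis

print(correct_total, wrong_total)   # 190 185
```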

These results suggest that language models act more like advanced pattern recognition systems, generating answers based on patterns they have seen rather than on logical reasoning. Apple’s study showed how sensitive these models are: even trivial changes, such as swapping the names in a problem, can alter the results. This fragility confirms that language models are not yet able to process anything close to human logic.

These fundamental logical flaws in AI models are supported not only by Apple’s study but also by earlier research from 2019, which showed how extraneous pieces of information added to questions about Super Bowl games could mislead AI systems. Taken together, the research highlights that AI still has a long way to go before it can think like humans.
