
Apple study reveals major flaws in billion-dollar AI models

AI models still struggle with basic logical tasks


A new research paper from Apple has exposed serious shortcomings in the reasoning abilities of some of today’s most advanced artificial intelligence (AI) systems. Despite being marketed as powerful tools capable of solving complex problems, the study shows that these models still struggle with basic logical tasks, raising questions about the real capabilities of large language and reasoning models.

AI models fail child-level logic tests

Apple researchers evaluated several prominent generative AI systems, including ChatGPT, Claude, and DeepSeek, using classic problem-solving tasks. One of the tests was the well-known Tower of Hanoi puzzle, which requires moving discs across pegs while following specific rules.


While the puzzle is simple enough for a bright child to solve, most AI models failed when asked to handle scenarios involving more than seven discs. Accuracy fell below 80% with seven discs, and performance dropped even further with eight. According to co-lead author Iman Mirzadeh, the issue wasn't just solving the puzzle — it was that the models couldn’t follow a logical thought process even when given the solution algorithm.

“They fail to reason in a step-by-step, structured way,” he said, noting that the models’ approach was neither logical nor intelligent.
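For readers unfamiliar with the puzzle, the Tower of Hanoi has a well-known recursive solution, and this is the kind of step-by-step algorithm the researchers say the models failed to follow even when handed it. The sketch below is a minimal, textbook version (the peg names and function name are illustrative, not from the study):

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of moves that solves an n-disc Tower of Hanoi.

    Classic recursion: move the top n-1 discs onto the spare peg,
    move the largest disc to the target, then move the n-1 discs
    from the spare peg onto the target.
    """
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi(n - 1, spare, target, source))

# Seven discs take 2**7 - 1 = 127 moves
print(len(hanoi(7)))
```

Each move is fully determined by the recursion, which is why the puzzle is considered a test of structured reasoning rather than creativity.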

The myth of scaling exposed

The results challenge one of the AI industry’s most commonly held beliefs: that simply scaling models — making them larger and feeding them more data — will lead to better performance. Apple’s research provides strong evidence that this is not always true.

Gary Marcus, a well-known AI researcher and commentator, called the findings a reality check. Venture capitalist Josh Wolfe even coined a new verb, “to GaryMarcus”, meaning to critically debunk exaggerated claims about AI. The Apple study, Wolfe argued, had done exactly that by revealing the real limits of model reasoning.

Marcus has long argued that AI systems, particularly those based on neural networks, can only generalise within the data they’ve seen before. Once asked to work beyond that training distribution, they often break down — a pattern clearly confirmed in Apple’s tests.

AI is not yet a substitute for human logic

To be clear, even humans make errors on the more complex versions of the Tower of Hanoi. However, AI systems were supposed to improve on this, not replicate human flaws. As Marcus points out, artificial general intelligence (AGI) should combine human creativity with machine-level precision. But instead of outperforming people in logic and reliability, today’s large models still make basic errors.


Apple’s results also support concerns raised by Arizona State University’s Subbarao Kambhampati, who has cautioned against assuming AI models reason like humans. In reality, they often skip steps or fail to understand the underlying principles of a problem, despite producing convincing-sounding answers.

Caution urged for businesses and society

The implications are significant for businesses looking to integrate AI into their operations. While models such as GPT-4, Claude, and others perform well in areas like writing, coding, and brainstorming, they remain unreliable for high-stakes decision-making. As Marcus points out, these systems can’t yet outperform classical algorithms in areas like database management, protein folding, or strategic games like chess.

This unpredictability limits how much society can rely on generative AI. While the technology will continue to be useful in supporting human tasks, it is far from being a replacement for human judgement or traditional rule-based systems in critical contexts.

The illusion of intelligence

Perhaps most concerning is how easily these models can appear more capable than they are. If an AI performs well on an easy test, users may assume it can handle more complex problems too. But Apple’s study shows this confidence can be misplaced. The same model that solves a four-disc puzzle may completely fail when asked to solve one with eight.

This illusion of intelligence could lead to overtrust in AI systems — something experts warn must be avoided if the technology is to be used responsibly.

Rethinking the future of AI

Despite the findings, Marcus remains optimistic about AI’s future, just not in its current form. He believes that hybrid approaches, combining classical logic with modern computing power, could eventually produce more reliable systems. But he is sceptical that current LLM-based systems are the answer.

The Apple paper shows that hype around generative AI has outpaced its real-world abilities. Until AI can reason in a consistent, logical manner — not just produce convincing text — it will remain limited in scope.

As researchers and developers reflect on these findings, one thing is clear: the path to truly intelligent machines will require more than just scaling up. It will demand smarter, better-designed models that prioritise reliability over illusion.
