New ARC-AGI-2 Test Reveals AI Struggles with Human-Level Intelligence in Puzzle Challenges

March 25, 2025

The ARC Prize Foundation has launched ARC-AGI-2, a new benchmark designed to compare the performance of AI models with human intelligence.
The test consists of puzzle-like problems in which a model must identify a visual pattern and generate the correct answer grid, which measures how well it adapts to problems it has not seen before.
ARC-AGI-2 arrives amid industry calls for new benchmarks, particularly ones that evaluate AI's creative capabilities.
Unlike previous assessments, it also emphasizes efficiency: models must interpret each pattern on the fly rather than rely on memorization.
Like the rest of the ARC-AGI series, it targets general intelligence by using visual puzzles to assess reasoning and pattern recognition.
The low performance of AI models on ARC-AGI-2 indicates a significant gap in nuanced understanding and contextual awareness compared to humans.
Human test-takers averaged 60% on ARC-AGI-2, while top AI models such as OpenAI's o3-low scored only 4%.
The same o3-low model had scored 75.7% on the first edition of the ARC-AGI test, which shows how much harder the new edition is.
Alongside the new test, the ARC Prize Foundation has announced the ARC Prize 2025 contest, which challenges developers to reach 85% accuracy on ARC-AGI-2 while spending no more than $0.42 per task.
The benchmark therefore evaluates not only whether a model solves a problem but also how cheaply it does so, a significant shift in evaluation standards.
Some researchers argue that tests of this kind measure task-specific performance and may not accurately reflect general intelligence.
Experts also remain divided on the timeline for artificial general intelligence (AGI), with widely varying views on how close the technology is.
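
To make the contest numbers above concrete, here is a minimal Python sketch of how a result could be checked against the two thresholds quoted in this summary (85% accuracy and $0.42 per task). It is an illustration, not the ARC Prize Foundation's scoring code: the function names, the representation of a grid as a list of integer rows, and the rule that an answer counts only when every cell matches are all assumptions made for this example.

```python
from typing import List

Grid = List[List[int]]  # assumed representation: rows of integer cell values

# Thresholds quoted in the summary above.
TARGET_ACCURACY = 0.85     # contest accuracy goal
MAX_COST_PER_TASK = 0.42   # US dollars per task


def grid_matches(predicted: Grid, expected: Grid) -> bool:
    """Assumed rule: an answer counts only if every cell matches exactly."""
    return predicted == expected


def accuracy(predictions: List[Grid], answers: List[Grid]) -> float:
    """Fraction of tasks whose predicted grid equals the expected grid."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    solved = sum(grid_matches(p, a) for p, a in zip(predictions, answers))
    return solved / len(answers)


def meets_contest_bar(acc: float, total_cost: float, num_tasks: int) -> bool:
    """True only if both the accuracy and the per-task cost limits are met."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return acc >= TARGET_ACCURACY and total_cost / num_tasks <= MAX_COST_PER_TASK
```

Under these assumptions, a model that scores 4% fails the check at any cost, and a model that reaches 85% still fails if it averages more than $0.42 per task.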

Summary based on 5 sources

Sources

TechCrunch • Mar 25, 2025

A new, challenging AGI test stumps most AI models

Mashable • Mar 25, 2025

A new AI test is outwitting OpenAI, Google models, among others

New Scientist • Mar 25, 2025

Leading AI models fail new test of artificial general intelligence
