New ARC-AGI-2 Test Reveals AI Struggles with Human-Level Intelligence in Puzzle Challenges

March 25, 2025
  • The ARC Prize Foundation has launched a new evaluation test called ARC-AGI-2, designed to measure how AI models perform compared with humans.

  • The test consists of puzzle-like problems in which a model must identify visual patterns and produce the correct answer grid, measuring how well it adapts to problems it has not seen before.

  • ARC-AGI-2 responds to industry calls for new benchmarks, particularly ones that evaluate AI's creative capabilities.

  • Unlike previous assessments, this benchmark emphasizes efficiency, requiring AI to interpret patterns on the fly rather than relying solely on memorization.

  • The ARC-AGI benchmark measures general intelligence through visual puzzles that test reasoning and pattern recognition.

  • AI models' low scores on ARC-AGI-2 point to a significant gap between AI and humans in nuanced understanding and contextual awareness.

  • Humans achieved an average score of 60% on the ARC-AGI-2 test, while top AI models like OpenAI's o3-low scored only 4%, highlighting the increased difficulty of this edition.

  • The same model, o3-low, had scored 75.7% on the first edition, ARC-AGI-1, underscoring how much harder ARC-AGI-2 is.

  • Alongside the new test, the ARC Prize Foundation has announced the ARC Prize 2025 contest, which challenges developers to reach 85% accuracy on ARC-AGI-2 while spending no more than $0.42 per task.

  • ARC-AGI-2 therefore evaluates not only whether a model can solve a problem but also how cost-effectively it does so, a significant shift in evaluation standards.

  • Despite advances in AI, some researchers argue that these tests may not accurately reflect general intelligence and instead measure task-specific performance.

  • Experts remain divided on when artificial general intelligence (AGI) will be achieved, with estimates of how close the field is varying widely.

Summary based on 5 sources
