OpenAI's o3 Model Scores Genius-Level IQ, Outshining Multimodal AI in Text-Based Reasoning

June 10, 2025
  • The Mensa Norway IQ test is normally used to assess human intelligence. Average human IQ falls between 90 and 110, which makes the AI scores reported here particularly impressive.

  • The paper critiques traditional benchmarks for AI models, suggesting they are flawed due to data contamination and do not accurately reflect the models' reasoning abilities.

  • These findings indicate that AI models are not merely mimicking human intelligence; they are outperforming humans in specific cognitive tasks, especially in text-based reasoning.

  • OpenAI's o3 model scored 135 on the Mensa Norway IQ test, ranking it as the most cognitively capable of the AI systems tested and placing it in the 'genius' category.

  • An analysis of 24 leading AI models revealed that text-only models significantly outperformed their vision-enabled counterparts.

  • These test outcomes raise important questions about AI model architecture and training, particularly regarding the distinction between general intelligence and domain-specific strengths.

  • In a related paper titled 'The Illusion of Thinking,' researchers contend that leading AI models simulate reasoning without genuine understanding, challenging the notion of their advanced reasoning capabilities.

  • Other notable AI models included Anthropic's Claude-4 Sonnet, which scored 127, and Google's Gemini 2.0 Flash, which scored 126, both indicating high intelligence levels.

  • Newer models such as Gemini 2.5 Pro and OpenAI o4 mini also scored above 120, showing that many leading AI models exceed the average human IQ range.

  • All of the top 10 AI models in the analysis are text-only, underscoring that these systems reason better through language than through visual input.

  • Despite the advancements in language-based reasoning, multimodal systems still face challenges with abstract problem-solving tasks, indicating a significant area for future development.

  • For example, while the text-only o3 model scored 135, multimodal models like GPT-4o with vision and Grok-3 Think (Vision) scored only 63 and 60, respectively, highlighting a notable performance gap.

Summary based on 2 sources


Sources

OpenAI’s o3 is Genius, Scores 135 in Toughest IQ test

Analytics India Magazine • Jun 10, 2025

Ranked: The Smartest AI Models, by IQ

Visual Capitalist • Jun 9, 2025
