OpenAI's o3 Model Scores Genius-Level IQ, Outshining Multimodal AI in Text-Based Reasoning

June 10, 2025

AI Research

The Mensa Norway IQ test is designed to assess human intelligence, and the average human score falls between 90 and 110, which makes the AI scores reported here notable.
The paper critiques traditional benchmarks for AI models, suggesting they are flawed due to data contamination and do not accurately reflect the models' reasoning abilities.
The scores suggest that AI models can outperform the average human on specific cognitive tasks, especially text-based reasoning.
OpenAI's o3 model scored 135 on the Mensa Norway IQ test, the highest of the AI systems analyzed, placing it in the 'genius' category.
An analysis of 24 leading AI models revealed that text-only models significantly outperformed their vision-enabled counterparts.
These test outcomes raise important questions about AI model architecture and training, particularly regarding the distinction between general intelligence and domain-specific strengths.
In a related paper titled 'The Illusion of Thinking,' researchers contend that leading AI models simulate reasoning without genuine understanding, challenging the notion of their advanced reasoning capabilities.
Other high scorers included Anthropic's Claude-4 Sonnet at 127 and Google's Gemini 2.0 Flash at 126, both well above the average human range.
Newer models such as Gemini 2.5 Pro and OpenAI's o4-mini also scored above 120, showing that many leading AI models exceed the average human IQ range.
The top 10 AI models in the analysis are all text-only, which underscores that these systems reason better through language than through visual input.
Despite advances in language-based reasoning, multimodal systems still struggle with abstract problem-solving tasks, a significant area for future development.
For example, while the text-only o3 model scored 135, multimodal models like GPT-4o with vision and Grok-3 Think (Vision) scored only 63 and 60, respectively, highlighting a notable performance gap.

Summary based on 2 sources

Sources

Analytics India Magazine • Jun 10, 2025

OpenAI’s o3 is Genius, Scores 135 in Toughest IQ test

Visual Capitalist • Jun 9, 2025

Ranked: The Smartest AI Models, by IQ