OpenAI's o3 Model Scores Genius-Level IQ, Outshining Multimodal AI in Text-Based Reasoning
June 10, 2025
OpenAI's o3 model scored 135 on the Mensa Norway IQ test, the highest score among the AI systems tested, placing it in the 'genius' category.
The Mensa Norway test was designed to assess human intelligence, and the average human IQ falls between 90 and 110, which makes this score particularly impressive.
Other high scorers included Anthropic's Claude-4 Sonnet at 127 and Google's Gemini 2.0 Flash at 126, both well above the human average.
Newer iterations such as Gemini 2.5 Pro and OpenAI's o4-mini also scored above 120, showing that many leading AI models exceed the average human IQ range.
An analysis of 24 leading AI models found that text-only models significantly outperformed their vision-enabled counterparts.
All of the top 10 models in the analysis are text-only, underscoring their stronger performance in language-based reasoning compared with visual processing.
Despite these advances in language-based reasoning, multimodal systems still struggle with abstract problem-solving, marking a significant area for future development.
For example, while the text-only o3 scored 135, multimodal models such as GPT-4o with vision and Grok-3 Think (Vision) scored only 63 and 60, respectively.
These findings suggest that AI models are not merely mimicking human intelligence but are outperforming humans on specific cognitive tasks, especially text-based reasoning.
The results also raise important questions about AI model architecture and training, particularly the distinction between general intelligence and domain-specific strengths.
In a related paper titled 'The Illusion of Thinking,' researchers contend that leading AI models simulate reasoning without genuine understanding, challenging claims about their advanced reasoning capabilities.
The paper also critiques traditional AI benchmarks as flawed, arguing that data contamination prevents them from accurately reflecting the models' reasoning abilities.
Summary based on 2 sources
Sources

Analytics India Magazine • Jun 10, 2025
OpenAI’s o3 is Genius, Scores 135 in Toughest IQ test
Visual Capitalist • Jun 9, 2025
Ranked: The Smartest AI Models, by IQ