OpenAI Unveils FrontierScience to Test AI's PhD-Level Scientific Reasoning Across Disciplines
December 16, 2025
Skepticism remains about LLMs generating genuine discoveries, with experts warning of unreliable outputs and calling for rigorous, expert-informed evaluation.
OpenAI introduced FrontierScience, a new benchmark designed to test PhD-level scientific reasoning in AI across physics, chemistry, and biology, featuring expert-written Olympiad-style and research-style questions.
GPT-5.2 leads the benchmark, scoring 77.1% in the Olympiad tier and 25.3% in the Research tier, with notable gains over GPT-5 in the latter.
The benchmark is meant to reveal where AI excels or struggles in advanced scientific reasoning and to guide researchers and enterprises integrating AI into scientific workflows.
Industry and academia acknowledge substantial progress but warn that models still struggle with exploratory, hypothesis-driven workflows and can hallucinate on edge cases, necessitating human oversight.
FrontierScience carries implications for the licensing, subscriptions, and monetization of frontier scientific models, with firms like Deloitte citing potential R&D cost reductions.
Analysts project AI could generate significant economic value by 2030, with industry reports forecasting growth in AI for science and reductions in R&D costs through AI.
Key challenges include data privacy, the need for federated learning, and regulatory considerations such as transparency mandates for high-risk AI systems under the EU AI Act. Ethical concerns also emphasize diverse training data to avoid biases.
Real-world lab adoption is underway, with pharma and research collaborators using AI for triage and rapid hypothesis validation to accelerate discovery timelines.
Potential solutions point to hybrid approaches that combine large language models with specialized scientific tools and symbolic reasoning.
Future outlook envisions increasing AI independence in hypothesis generation by around 2027, with improving task success but persistent gaps in interdisciplinary reasoning, favoring hybrid human-AI workflows.
Summary based on 6 sources
Sources

Time • Dec 16, 2025
AI Is Getting Better at Science. OpenAI Is Testing How Far It Can Go
OpenAI • Dec 16, 2025
Evaluating AI’s ability to perform scientific research tasks
WebProNews • Dec 16, 2025
OpenAI’s FrontierScience Benchmark Ushers in Era of PhD-Level AI Reasoning
Happy Mag • Dec 16, 2025
OpenAI introduces FrontierScience; a gauge for AI’s readiness for Scientific Research