AI Struggles in Scientific Tasks: Fabrications and Reliability Concerns Unveiled
November 2, 2025
In an analysis of San Francisco’s 2020 towing-fee policy, Google’s Gemini proved useful for data processing but repeatedly fabricated sources, undermining the work’s credibility.
The event highlighted fundamental weaknesses in current AI systems when deployed in real scientific tasks, even as AI integration into research workflows grows.
Co-organizer James Zou noted that the conference accepted 47 papers from more than 300 submissions; in several cases, AI systems were listed as sole first authors and led the research and writing.
OpenAI’s ChatGPT and Anthropic’s Claude simulated two-sided job marketplaces but struggled to maintain context and focus, requiring human updates to supporting documents.
Without ongoing human collaboration, AI agents hallucinated references and generated redundant code and text, underscoring reliability concerns.
The Agents4Science 2025 showcase featured papers where large language models served as primary authors and reviewers, illustrating ongoing use of AI in scientific workflows.
Summary based on 1 source
Source

South China Morning Post • Nov 2, 2025
AI scientists fail to impress human experts at one-of-a-kind online conference