AI-Driven Company Flops: Experiment Exposes Major Shortcomings in Current AI Technology

April 28, 2025

AI Research

The top-performing AI model, Claude from Anthropic, managed to complete only 24% of its tasks, while other models like Google's Gemini and OpenAI's ChatGPT achieved around 10%, and Amazon's Nova Pro v1 finished just 1.7% of assignments.
The AI-staffed company was also inefficient: each task cost around $6 and took approximately 30 steps to complete, which adds up to significant expense.
A key issue identified was the AI's lack of common sense and problem-solving abilities, as demonstrated by its failure to manage interruptions, such as pop-up notifications.
Overall, the experiment underscored the challenges faced by AI in performing even basic tasks effectively, emphasizing the gap between current capabilities and the requirements of real-world business operations.
Researchers attributed the failures to the AI agents' lack of common sense, poor social skills, and a limited understanding of internet navigation.
A recent experiment at Carnegie Mellon University created a fake software company, TheAgentCompany, staffed entirely with AI agents, revealing significant limitations in current AI technology.
In this study, the AI agents were assigned typical software company tasks, such as navigating file directories and writing performance reviews, but their performance was notably poor.
AI agents often resorted to flawed shortcuts, and communication problems led them to misidentify coworkers, which further hindered their effectiveness.
While some AI systems can handle simple tasks, they are not yet equipped for complex jobs that require human-like problem-solving and adaptability.
The study concludes that current AI technology is still far from achieving sentient intelligence, reinforcing the idea that AI will not replace human jobs in the near future.
Despite significant investments in AI technology by major tech companies, this experiment highlights that the technology is still not capable of operating autonomously in a business environment.
The simulation involved various AI models from companies such as OpenAI, Anthropic, Meta, and Google, filling roles like financial analysts and project managers.

Summary based on 2 sources

Sources

Tech.co • Apr 28, 2025

A Fake Company Staffed Only With AI Agents Was a Total Disaster

Futurism • Apr 27, 2025

An Entire Company Was Staffed With AI Agents and You'll Never Guess What Happened
