Google DeepMind's SIMA 2 AI Revolutionizes 3D Game Interaction with High-Level Goal Understanding

November 13, 2025
  • SIMA 2, a new AI agent from Google DeepMind built on the Gemini model, learns and acts in 3D video games with high-level goal understanding, user interaction, and self-driven skill acquisition.

  • The project aims to ground natural language and goals in visual environments, decomposing tasks into executable subtasks to enable joint language-driven reasoning and embodied control.

  • SIMA 2 can adapt to unseen, procedurally generated 3D worlds (including Genie 3-created environments) and complete tasks within them.

  • One source, StartupNews.fyi, appends an ethical disclaimer asserting impartiality and accuracy in its reporting.

  • Although the work could contribute to progress on real-world robots, researchers remain skeptical that skills trained in games will transfer directly to physical robotics.

  • Draft-like material and placeholders on the page suggest that it is not a finished article.

  • Reported capabilities include autonomous problem-solving, user interaction, and self-improvement through practice.

  • The broader goal is to develop next-generation agents capable of following open-ended instructions in complex environments, informing future robotic collaborators.

  • SIMA 2 operates as an embodied agent with a body and sensors: it observes inputs, reasons about them, and acts in 3D worlds, rather than only handling tasks such as calendar management or code execution.

  • Experts acknowledge the achievement but warn that skills learned from visual input alone may not transfer readily to real-world robotics because of differences in perception, control, and environment complexity.

  • SIMA 2 can respond to emoji instructions to perform high-level actions, illustrating how high-level intent can be conveyed with minimal symbols.

  • The core innovation ties language instructions to visual context and breaks tasks into subtasks, enabling integrated language-driven reasoning and embodied action.
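
The idea of breaking a goal into subtasks and then running an observe, reason, act loop can be illustrated with a toy sketch. This is not SIMA 2's implementation, which is not described in the summary above. The `PLANS` table, the `ToyWorld` class, the function names, and the "build a campfire" goal are all hypothetical stand-ins, and a fixed lookup table takes the place of the language model that would do the planning.

```python
from dataclasses import dataclass, field

# Hypothetical planner table: a stand-in for the language model that would
# map a high-level goal to ordered subtasks. Not SIMA 2's actual method.
PLANS = {
    "build a campfire": ["find wood", "collect wood", "place wood", "light fire"],
}


@dataclass
class ToyWorld:
    """A trivial environment that records which subtasks have been completed."""
    done: list = field(default_factory=list)

    def observe(self):
        # The observation is simply a copy of the completed-subtask list.
        return list(self.done)

    def act(self, subtask):
        # Every action succeeds in this toy world.
        self.done.append(subtask)


def decompose(goal):
    """Return ordered subtasks for a goal, or a one-step plan if it is unknown."""
    return PLANS.get(goal.lower().strip(), [goal])


def run_agent(goal, world):
    """Observe, choose the next unfinished subtask, act; repeat until done."""
    plan = decompose(goal)
    while True:
        seen = world.observe()
        remaining = [step for step in plan if step not in seen]
        if not remaining:
            return seen
        world.act(remaining[0])
```

Calling `run_agent("Build a campfire", ToyWorld())` walks through the four hypothetical subtasks in order. A real agent would replace the lookup table with a learned planner and the list-based observation with visual input.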

Summary based on 10 sources


