Google's SIMA 2: Revolutionizing AI with Advanced 3D World Interaction and Self-Improvement

November 13, 2025
  • Despite these advances, SIMA 2 still struggles with very long tasks, retains only a limited window of memory, and can misinterpret what it sees.

  • The potential real-world impact hinges on translating virtual-world capabilities into physical robotics through strong world understanding and reasoning.

  • Future implications include AI co-players in open-world games, smarter NPCs, and training grounds for real-world robotics, plus universal language-to-action assistants for digital environments.

  • SIMA 2 is Google's upgraded AI agent that can act, learn, and reason inside 3D virtual worlds, functioning as a co-player that understands prompts, discusses plans, and improves over time.

  • The system benefits from large-scale demonstrations from partners and can start new tasks in unfamiliar environments, like MineDojo, by leveraging its learned planning and communication.

  • Self-play data helps train later versions, with ongoing work to tackle longer tasks, memory, precise control, and complex 3D scene understanding.

  • In related news, Marble, a generative world model, creates 3D worlds from text, images, or video and supports interactive editing.

  • The work illustrates progress toward more general AI systems capable of adapting to new tasks and environments, with implications for future robotic agents that learn with minimal human intervention.

  • DeepMind identifies gaps such as a limited memory window, difficulty with very long multi-step tasks, and challenges in 3D visual interpretation.

  • It handles long, multi-step tasks and multimodal prompts (including sketches, multiple languages, and emojis) and can transfer concepts across games, for example applying what it learned about mining in one game to harvesting in another.

  • SIMA 2 serves as a testbed for robotics-relevant skills and navigation, signaling a strong research path toward practical robotics applications and broader AGI potential.

  • Experts emphasize the broader significance: the approach centers on self-guided learning, continuous improvement, and the potential to transfer to real-world robotics through better navigation, tool use, and teamwork.

  • Limitations include struggles with long-horizon planning, precise low-level actions, robust visual understanding, and a constrained memory window.

  • DeepMind envisions real-world robotics use, emphasizing that high-level understanding plus low-level action control are needed for autonomous operation.

  • The Gemini model underpins SIMA 2, enabling interpretation of goals, task reasoning, action explanations, and self-assessment, with training boosted by human demonstrations and Gemini-generated labels.

  • Core architecture centers on Gemini for instruction interpretation, goal understanding, and action planning.

  • Access is restricted to a research preview for a small group, with oversight of self-improvement and requests for interdisciplinary feedback to ensure responsible development.

  • The broader goal is progress toward artificial general intelligence, with challenges in understanding user intent, planning effectively, and applying common-sense reasoning.

  • SIMA 2 is being released as a limited research preview to select academics and game developers, stressing responsible development and collaboration.

  • For now, SIMA 2 is a research preview rather than a consumer tool. It is being tested with academics and game developers under controlled conditions because of its self-learning capabilities.

  • It can interpret images or shapes supplied by the user and operate in AI-generated 3D worlds created by Genie 3, adapting to entirely new environments.

  • Researchers involved include Joe Marino, Jane Wang, and Frederic Besse. The project builds on DeepMind's prior work such as AlphaFold and SIMA 1, the latter unveiled in 2024.

  • Compared to the original SIMA, SIMA 2 reasons about commands instead of only executing them and is more flexible, expanding its skill set beyond the original's 600 defined actions.

  • In fresh environments, SIMA 2 quickly orients, analyzes surroundings and goals, and applies prior concepts to new tasks like transferring mining ideas to harvesting.

  • SIMA 2 achieves a higher task completion rate than SIMA 1 and shows stronger generalization in Genie 3-generated 3D environments.

  • Specifically, SIMA 2 reaches around two-thirds task completion in tests, significantly outperforming SIMA 1 and approaching human-level performance.

  • A key innovation is self-improvement: SIMA 2 uses Gemini to generate new tasks and a reward model to evaluate attempts, reducing reliance on human data.

  • SIMA 2 demonstrates stronger generalization across virtual environments, tackles longer tasks, and can interpret high-level goals, explain steps, and collaborate with humans or other agents.

  • Embodied intelligence applications are a focus, with skills in navigation, tool use, and collaborative task execution supporting broader embodied AI research.

  • Pairing SIMA 2 with Genie 3 allows generation and navigation of real-time 3D environments from images or text, enabling goal-directed actions in unseen worlds.

  • Reports note that SIMA 2 bridges games and real-world applications by handling complex interactions and pursuing goals, which suggests that what it learns could extend to real objects and environments.

  • When combined with Genie 3, SIMA 2 can navigate automatically generated 3D worlds from text or images, following user prompts without prior exposure.

  • Self-improvement and iterative learning enable SIMA 2 to generate experience data autonomously after initial demonstrations, aiding future versions.

  • In sum, SIMA 2 expands the range of skills beyond SIMA by adding reasoning and flexible world interaction.

  • Tests show improved generalization across unfamiliar games like ASKA and MineDojo, transferring concepts such as mining between environments.
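
The self-improvement loop described in the bullets above (a task generator proposes new tasks, a reward model scores the agent's attempts, and the resulting experience helps train later versions) can be illustrated with a toy sketch. This is not DeepMind's implementation: every name below (`propose_task`, `attempt_task`, `reward_model`, `train_next_version`) is hypothetical, the "agent" is reduced to a single success probability, and the update rule is an arbitrary stand-in for real training.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    task: str
    actions: List[str]
    score: float


def propose_task(rng: random.Random) -> str:
    """Hypothetical stand-in for the Gemini-based task generator."""
    verb = rng.choice(["mine", "harvest", "build", "find"])
    obj = rng.choice(["wood", "stone", "shelter", "water"])
    return f"{verb} {obj}"


def attempt_task(task: str, skill: float, rng: random.Random) -> List[str]:
    """Hypothetical rollout: the agent succeeds with probability `skill`."""
    actions = ["look around", f"plan: {task}"]
    actions.append(f"done: {task}" if rng.random() < skill else "gave up")
    return actions


def reward_model(task: str, actions: List[str]) -> float:
    """Hypothetical reward model: 1.0 if the attempt ended in success."""
    return 1.0 if actions and actions[-1] == f"done: {task}" else 0.0


def train_next_version(skill: float, kept: List[Episode], attempts: int) -> float:
    """Toy update (an assumption, not real training): skill rises with
    the fraction of attempts the reward model accepted."""
    success_rate = len(kept) / attempts
    return min(1.0, skill + 0.5 * success_rate * (1.0 - skill))


def self_improve(
    generations: int = 3, tasks_per_generation: int = 50, seed: int = 0
) -> Tuple[List[float], List[Episode]]:
    rng = random.Random(seed)
    skill = 0.2  # arbitrary starting point after "initial demonstrations"
    history = [skill]
    buffer: List[Episode] = []
    for _ in range(generations):
        kept = []
        for _ in range(tasks_per_generation):
            task = propose_task(rng)
            actions = attempt_task(task, skill, rng)
            score = reward_model(task, actions)
            if score >= 0.5:  # keep only attempts the reward model accepts
                kept.append(Episode(task, actions, score))
        buffer.extend(kept)  # self-generated experience, no human labels
        skill = train_next_version(skill, kept, tasks_per_generation)
        history.append(skill)
    return history, buffer


if __name__ == "__main__":
    history, buffer = self_improve()
    print("skill per generation:", [round(s, 3) for s in history])
    print("episodes kept for training:", len(buffer))
```

The point of the sketch is the data flow rather than the numbers: after the starting skill level is set, no human-provided data enters the loop, which mirrors the reduced reliance on human demonstrations described above.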

Summary based on 8 sources

