Perplexity Unveils Breakthrough for Trillion-Parameter AI Models on AWS, Challenging NVIDIA's Dominance

November 4, 2025
  • Perplexity announces a breakthrough that lets trillion-parameter AI models run on widely available cloud platforms like AWS, expanding access beyond specialized labs and challenging NVIDIA-dominated infrastructure.

  • The project open-sources a portable, high-performance system and optimized kernels that enable running trillion-parameter models on standard AWS cloud infrastructure for the first time.

  • The approach uses a hybrid CPU-GPU architecture in which TransferEngine and a host CPU proxy thread manage network transfers, working around constraints of AWS's Elastic Fabric Adapter (EFA).

  • The system unifies AWS EFA networking with other backends, achieving state-of-the-art performance on a 64-GPU cluster and showing sub-millisecond decode latency in benchmarks.

  • Kimi-K2 is too large to fit on a single 8x H200 node and must be served across multiple nodes, underscoring why efficient multi-node serving on standard cloud infrastructure matters.

  • Perplexity is also pursuing licensing with Getty Images that includes attribution, signaling a mixed openness-and-monetization approach while advocating responsible use of data in AI.

  • The core innovation is TransferEngine, a portable RDMA-based library that enables hardware-agnostic deployment of MoE models by decoupling from NVIDIA GPUDirect Async (a rough sketch of this host-proxy pattern appears after this list).

  • The development could reshape large-scale AI deployment economics by offering a hardware-agnostic alternative to NVIDIA-dominated ecosystems on major cloud platforms.

  • New inter-machine kernels optimize communication, delivering practical deployment speeds for trillion-parameter models like Kimi-K2 without requiring NVIDIA-specific networking hardware.

  • A key challenge overcome is achieving viable performance on AWS's Elastic Fabric Adapter (EFA), extending trillion-parameter work beyond specialized, restricted environments.

  • Benchmarks show strong performance, with competitive results on NVIDIA ConnectX-7 and practical deployment of the 671B-parameter DeepSeek-V3 and the 1T-parameter Kimi-K2 on AWS p5en instances with H200 GPUs.

  • Perplexity continues collaborating with AWS to optimize performance and portability across cloud platforms, aiming for broader, scalable deployment and reduced latency.
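The sources name TransferEngine and describe the host CPU proxy design, but they do not show its API. The sketch below is a hypothetical illustration of that pattern only: a CPU proxy thread drains transfer requests from a queue and hands them to whichever RDMA backend is present. Every identifier here (TransferRequest, RdmaBackend, EfaBackend, ConnectX7Backend, submit, post_write) is illustrative and not taken from the actual library.

```python
# Hypothetical sketch of a host-proxy transfer engine; names are illustrative,
# not Perplexity's actual API.
import queue
import threading
from dataclasses import dataclass


@dataclass
class TransferRequest:
    """One RDMA-style write: copy `nbytes` from a local buffer to a peer."""
    peer_rank: int
    local_addr: int
    remote_addr: int
    nbytes: int


class RdmaBackend:
    """Interface the proxy thread programs; concrete backends would wrap EFA or ConnectX."""
    def post_write(self, req: TransferRequest) -> None:
        raise NotImplementedError


class EfaBackend(RdmaBackend):
    def post_write(self, req: TransferRequest) -> None:
        # Real code would post a one-sided write via libfabric/EFA here.
        print(f"[EFA] write {req.nbytes} B to rank {req.peer_rank}")


class ConnectX7Backend(RdmaBackend):
    def post_write(self, req: TransferRequest) -> None:
        # Real code would post an RDMA write via ibverbs here.
        print(f"[CX7] write {req.nbytes} B to rank {req.peer_rank}")


class TransferEngine:
    """Host CPU proxy: producers enqueue requests; a CPU thread issues them to the NIC."""
    def __init__(self, backend: RdmaBackend):
        self._backend = backend
        self._queue: "queue.Queue[TransferRequest | None]" = queue.Queue()
        self._proxy = threading.Thread(target=self._run, daemon=True)
        self._proxy.start()

    def submit(self, req: TransferRequest) -> None:
        self._queue.put(req)  # in practice this would be fed by GPU-side kernels/streams

    def shutdown(self) -> None:
        self._queue.put(None)
        self._proxy.join()

    def _run(self) -> None:
        while (req := self._queue.get()) is not None:
            self._backend.post_write(req)  # network transfer is driven from the host


if __name__ == "__main__":
    engine = TransferEngine(EfaBackend())  # the same path would accept ConnectX7Backend()
    engine.submit(TransferRequest(peer_rank=1, local_addr=0x1000, remote_addr=0x2000, nbytes=4096))
    engine.shutdown()
```

The property this pattern captures is that device code never programs the NIC directly; the host proxy does, which is what allows the same serving code to run over EFA on AWS or over ConnectX-class adapters, consistent with the hardware-agnostic claim above.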

Summary based on 3 sources

