Perplexity Unveils Breakthrough for Trillion-Parameter AI Models on AWS, Challenging NVIDIA's Dominance
November 4, 2025
Perplexity announces a breakthrough that lets trillion-parameter AI models run on widely available cloud platforms like AWS, expanding access beyond specialized labs and challenging NVIDIA-dominated infrastructure.
Perplexity has open-sourced a portable, high-performance system and optimized kernels that, for the first time, make it possible to run trillion-parameter models on standard AWS cloud infrastructure.
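The approach uses a hybrid CPU-GPU architecture that pairs a TransferEngine with a proxy thread on the host CPU. The proxy thread manages network transfers on the GPU's behalf, working around constraints of AWS Elastic Fabric Adapter (EFA), which does not support GPU-initiated transfers via NVIDIA GPUDirect Async.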
The system puts AWS EFA networking and other backends, such as NVIDIA ConnectX-7, behind a single interface, achieving state-of-the-art performance on a 64-GPU cluster and showing sub-millisecond decode latency in benchmarks.
Kimi-K2 is too large to fit on a single 8x H200 node and requires multi-node deployment, which is why efficient communication between machines is central to serving it in the cloud.
Separately, Perplexity is pursuing a licensing arrangement with Getty Images that covers attribution, signaling an approach that mixes openness with monetization while advocating responsible use of data in AI.
The core innovation is the TransferEngine, a portable RDMA-based library that enables hardware-agnostic deployment of Mixture-of-Experts (MoE) models by removing the dependence on NVIDIA GPUDirect Async.
The development could reshape large-scale AI deployment economics by offering a hardware-agnostic alternative to NVIDIA-dominated ecosystems on major cloud platforms.
New inter-machine kernels optimize communication, delivering practical deployment speeds for trillion-parameter models like Kimi-K2 without requiring specialized NVIDIA networking hardware.
A key challenge overcome is achieving viable performance on AWS Elastic Fabric Adapter, which extends trillion-parameter work beyond clusters built around specialized networking.
Benchmarks show strong performance, with competitive results on NVIDIA ConnectX-7 and practical deployment of the 671B-parameter DeepSeek-V3 and the 1T-parameter Kimi-K2 on AWS p5en instances with H200 GPUs.
Perplexity continues collaborating with AWS to optimize performance and portability across cloud platforms, aiming for broader, scalable deployment and reduced latency.
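To make the single-node sizing point concrete, here is a rough back-of-envelope estimate in Python. It is a sketch under stated assumptions, not a figure from Perplexity: it takes Kimi-K2 at 1 trillion parameters as reported above, uses NVIDIA's published 141 GB of memory per H200, and assumes weights are stored at either 1 or 2 bytes per parameter. The function names are illustrative, and the estimate counts weights only, ignoring the KV cache, activations, and communication buffers.

```python
GB = 10**9  # decimal gigabyte, as used in GPU memory specifications


def weight_footprint_gb(num_params: int, bytes_per_param: int) -> int:
    """Memory needed for the model weights alone, in GB."""
    return num_params * bytes_per_param // GB


def node_capacity_gb(num_gpus: int, gb_per_gpu: int) -> int:
    """Total GPU memory available on one node, in GB."""
    return num_gpus * gb_per_gpu


def headroom_gb(num_params: int, bytes_per_param: int,
                num_gpus: int = 8, gb_per_gpu: int = 141) -> int:
    """Memory left on one node after loading weights (negative = does not fit)."""
    capacity = node_capacity_gb(num_gpus, gb_per_gpu)
    return capacity - weight_footprint_gb(num_params, bytes_per_param)


KIMI_K2_PARAMS = 10**12  # 1T parameters, as reported in the summary

# Assumed storage widths: 1 byte (8-bit) and 2 bytes (16-bit) per parameter.
for label, width in (("8-bit", 1), ("16-bit", 2)):
    left = headroom_gb(KIMI_K2_PARAMS, width)
    print(f"{label} weights: {left} GB headroom on one 8x H200 node")
```

Under these assumptions, 16-bit weights alone exceed the node's 1,128 GB of GPU memory, and 8-bit weights leave only 128 GB across all eight GPUs before any runtime memory is counted, which is consistent with the multi-node deployment described above.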
Summary based on 3 sources
Sources

StartupHub.ai • Nov 4, 2025
Perplexity Cracks Code for Trillion Parameter Models on AWS
Quantum Zeitgeist • Nov 4, 2025
Perplexity Unlocks Trillion-Parameter AI On Cloud Platforms