Tenstorrent Unveils Galaxy Blackhole: Industry-Leading AI Cluster with 23 PFLOPS and Sub-4-Second Token Generation

April 28, 2026
  • Tenstorrent unveils Galaxy Blackhole, a high-performance, Ethernet-based AI cluster designed for real-time video generation and large-language-model inference, delivering 23 PFLOPS of Block FP8 compute and sub-4-second token generation for large prompts.

  • Galaxy Blackhole achieves industry-leading speeds, including real-time 720p video generation in seconds and a Blitz Mode that pushes 350+ tokens per second per user with quick time-to-first-token on a 671B-parameter model.

  • The Galaxy platform centers on Networked AI, unifying compute, memory, and networking in a single system to scale from a single server to thousands without proprietary interconnects.

  • Core hardware specs include 6.2 GB of on-chip SRAM across 32 chips with roughly 2.9 PB/s of aggregate bandwidth, 1 TB of DRAM with 16 TB/s of memory bandwidth, and up to 56 × 800G Ethernet ports per server for scalable expansion.

  • The design emphasizes balanced performance across compute, memory, and networking to sustain large-scale deployments and future model growth.

  • Tenstorrent promotes a flexible, scalable Ethernet-based interconnect and robust software to differentiate at scale, while acknowledging that success hinges on execution and customer adoption.

  • Adoption is expanding among datacenters and providers; customers include Cirrascale, Equinix, and ai& in Japan, with more details forthcoming at TT-Deploy on May Day.

  • Tenstorrent’s CEO argues that specialized, disaggregated hardware is misguided, contending that a general-purpose, networked cluster can deliver fast prefill and decode while cutting token costs and infrastructure complexity via Ethernet.

  • Galaxy’s value proposition emphasizes sustained inference throughput and predictable latency over peak FLOPS, aiming for real-world efficiency in large-scale workloads.

  • The architecture prioritizes data placement, on-chip memory bandwidth, and Ethernet scale-out to enable seamless scaling from a single server to thousands of nodes without vendor-locked interconnects.

  • Tenstorrent positions Networked AI against Nvidia by leveraging Ethernet-based interconnects for scalable multi-system deployments rather than proprietary fabrics.

  • Executive quotes stress simplifying AI infrastructure, allowing enterprises to focus on product differentiation rather than underlying complexity.
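The spec bullets above quote only system-level aggregates for the 32-chip Galaxy Blackhole server. As a back-of-envelope sanity check, the per-chip figures implied by those aggregates can be derived with simple division; this sketch assumes an even split across all 32 chips, which the article does not state explicitly.

```python
# Hedged back-of-envelope: derive per-chip figures from the article's
# aggregate Galaxy Blackhole specs. The 32-way even split is an assumption.

CHIPS = 32

# Aggregate system specs, as quoted in the summary above.
aggregate = {
    "sram_gb": 6.2,          # on-chip SRAM across all chips, GB
    "sram_bw_pbs": 2.9,      # aggregate SRAM bandwidth, PB/s
    "dram_tb": 1.0,          # system DRAM, TB
    "dram_bw_tbs": 16.0,     # aggregate DRAM bandwidth, TB/s
    "compute_pflops": 23.0,  # Block FP8 compute, PFLOPS
}

# Divide each aggregate evenly across the chips (decimal units throughout).
per_chip = {key: value / CHIPS for key, value in aggregate.items()}

print(f"SRAM per chip:    {per_chip['sram_gb'] * 1000:.0f} MB")
print(f"SRAM BW per chip: {per_chip['sram_bw_pbs'] * 1000:.0f} TB/s")
print(f"DRAM per chip:    {per_chip['dram_tb'] * 1000:.1f} GB")
print(f"DRAM BW per chip: {per_chip['dram_bw_tbs']:.2f} TB/s")
print(f"Compute per chip: {per_chip['compute_pflops'] * 1000:.0f} TFLOPS")
```

Under this even-split assumption, each chip lands at roughly 194 MB of SRAM, ~91 TB/s of SRAM bandwidth, 31.25 GB of DRAM at 0.5 TB/s, and ~719 TFLOPS of Block FP8 compute.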

Summary based on 10 sources

