Tenstorrent Unveils Galaxy Blackhole: Industry-Leading AI Cluster with 23 PFLOPS and Sub-4-Second Token Generation

April 28, 2026
  • Tenstorrent unveils Galaxy Blackhole, a high-performance, Ethernet-based AI cluster designed for real-time video generation and large-language-model inference, delivering 23 PFLOPS of Block FP8 compute and sub-4-second token generation for large prompts.

  • Galaxy Blackhole achieves industry-leading speeds, including real-time 720p video generation in seconds and a Blitz Mode that pushes 350+ tokens per second per user with quick time-to-first-token on a 671B-parameter model.

  • The Galaxy platform centers on Networked AI, unifying compute, memory, and networking in a single system to scale from a single server to thousands without proprietary interconnects.

  • Core hardware specs include 6.2 GB of on-chip SRAM across 32 chips with roughly 2.9 PB/s of aggregate bandwidth, 1 TB of DRAM with 16 TB/s of memory bandwidth, and up to 56 × 800G Ethernet ports per server for scalable expansion.

  • The design emphasizes balanced performance across compute, memory, and networking to sustain large-scale deployments and future model growth.

  • Tenstorrent promotes a flexible, scalable Ethernet-based interconnect and robust software to differentiate at scale, while acknowledging that success hinges on execution and customer adoption.

  • Adoption is expanding among datacenters and providers; customers include Cirrascale, Equinix, and ai& in Japan, with more details forthcoming at TT-Deploy on May Day.

  • Tenstorrent’s CEO argues that specialized, disaggregated hardware is misguided, contending that a general-purpose, networked cluster can deliver fast prefill and decode while cutting token costs and infrastructure complexity via Ethernet.

  • Galaxy’s value proposition emphasizes sustained inference throughput and predictable latency over peak FLOPS, aiming for real-world efficiency in large-scale workloads.

  • The architecture prioritizes data placement, on-chip memory bandwidth, and Ethernet scale-out to enable seamless scaling from a single server to thousands of nodes without vendor-locked interconnects.

  • Tenstorrent positions Networked AI against Nvidia by leveraging Ethernet-based interconnects for scalable multi-system deployments rather than proprietary fabrics.

  • Executive quotes stress simplifying AI infrastructure, allowing enterprises to focus on product differentiation rather than underlying complexity.
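The spec bullets above quote only system-level aggregates for the 32-chip Galaxy Blackhole server. As a back-of-envelope sanity check, the per-chip figures implied by those aggregates can be derived with simple division; this sketch assumes an even split across all 32 chips, which the article does not state explicitly.

```python
# Hedged back-of-envelope: derive per-chip figures from the article's
# aggregate Galaxy Blackhole specs. The 32-way even split is an assumption.

CHIPS = 32

# Aggregate system specs, as quoted in the summary above.
aggregate = {
    "sram_gb": 6.2,          # on-chip SRAM across all chips, GB
    "sram_bw_pbs": 2.9,      # aggregate SRAM bandwidth, PB/s
    "dram_tb": 1.0,          # system DRAM, TB
    "dram_bw_tbs": 16.0,     # aggregate DRAM bandwidth, TB/s
    "compute_pflops": 23.0,  # Block FP8 compute, PFLOPS
}

# Divide each aggregate evenly across the chips (decimal units throughout).
per_chip = {key: value / CHIPS for key, value in aggregate.items()}

print(f"SRAM per chip:    {per_chip['sram_gb'] * 1000:.0f} MB")
print(f"SRAM BW per chip: {per_chip['sram_bw_pbs'] * 1000:.0f} TB/s")
print(f"DRAM per chip:    {per_chip['dram_tb'] * 1000:.1f} GB")
print(f"DRAM BW per chip: {per_chip['dram_bw_tbs']:.2f} TB/s")
print(f"Compute per chip: {per_chip['compute_pflops'] * 1000:.0f} TFLOPS")
```

Under this even-split assumption, each chip lands at roughly 194 MB of SRAM, ~91 TB/s of SRAM bandwidth, 31.25 GB of DRAM at 0.5 TB/s, and ~719 TFLOPS of Block FP8 compute.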

Summary based on 10 sources

