Perplexity AI deploys Qwen3 235B model on NVIDIA GB200 racks
Perplexity AI published research on a disaggregated prefill-decode architecture for serving its post-trained Qwen3 235B Mixture-of-Experts model on NVIDIA GB200 NVL72 Blackwell racks. Separate GPU nodes handle the prefill and decode stages, connected by NVLink within nodes and InfiniBand between nodes. The company reports higher inference throughput than the prior Hopper generation at equivalent accuracy and has placed the model into production on the new racks, citing the throughput gains and lower cost.
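To make the prefill/decode split concrete, here is a minimal, illustrative sketch in plain Python. The names (PrefillWorker, DecodeWorker, KVCache, transfer) are hypothetical stand-ins for the GPU-node roles described above, not Perplexity's implementation: one worker runs the prompt in a single pass and emits a KV cache, the cache is handed off (in the real system over NVLink or InfiniBand), and a separate worker generates tokens against it.

```python
# Illustrative sketch of prefill/decode disaggregation with a toy "model".
# All class and function names here are hypothetical, not Perplexity's stack.
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Per-request KV cache produced by prefill and consumed by decode.

    In the real system this lives in GPU memory and moves over NVLink /
    InfiniBand; here it is just a list of per-token states.
    """
    request_id: int
    states: list = field(default_factory=list)


class PrefillWorker:
    """Runs the full prompt in one pass and emits the KV cache."""

    def process(self, request_id: int, prompt_tokens: list) -> KVCache:
        cache = KVCache(request_id=request_id)
        for tok in prompt_tokens:
            # Stand-in for attention layers writing K/V for this token.
            cache.states.append(hash(tok) % 997)
        return cache


class DecodeWorker:
    """Generates tokens one at a time against a transferred KV cache."""

    def generate(self, cache: KVCache, max_new_tokens: int) -> list:
        output = []
        for _ in range(max_new_tokens):
            # Stand-in for one decode step that reads the whole cache.
            next_state = sum(cache.states) % 997
            cache.states.append(next_state)  # decode extends the cache
            output.append(next_state)
        return output


def transfer(cache: KVCache) -> KVCache:
    """Stand-in for moving the KV cache from the prefill node to the decode
    node (NVLink within a node, InfiniBand between nodes)."""
    return KVCache(request_id=cache.request_id, states=list(cache.states))


if __name__ == "__main__":
    prefill, decode = PrefillWorker(), DecodeWorker()
    cache = prefill.process(request_id=1, prompt_tokens=["Explain", "GB200"])
    tokens = decode.generate(transfer(cache), max_new_tokens=4)
    print(f"request 1 generated {len(tokens)} tokens")
```

The point of the split is that prefill is compute-bound and batch-friendly while decode is latency- and memory-bandwidth-bound, so dedicating separate nodes to each lets both be scaled and scheduled independently.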
GB200s change how prefill and decode disaggregation is done when serving large MoEs like Qwen. We've published details of our stack quantifying the throughput benefits compared to serving on Hoppers.
We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.
Perplexity should keep publishing like this. Good move.