- 50% Higher FLOPS vs. previous generation (B200/GB200)
- Enhanced memory capacity (up to 288GB HBM3E) for long-sequence reasoning model training and inference
- Power sloshing: dynamic reallocation of a shared power budget between CPU and GPU (see the toy sketch after this list)
- Major leap in Reasoning Inference: Faster token generation for LLMs
- Economic Impact: Up to 3x reduction in operational costs for AI workloads
- New Supply Chain Model: Greater freedom for hyperscalers to design and build custom boards
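The "power sloshing" item above can be illustrated with a toy allocator that shifts a fixed, shared power budget between CPU and GPU according to which side is busier. This is a minimal sketch of the concept only; the budget, per-device limits, and utilization inputs are hypothetical placeholders, not Nvidia's actual power-management mechanism.

```python
# Toy illustration of "power sloshing": moving a fixed combined power budget
# between CPU and GPU based on instantaneous utilization.
# All wattage values are illustrative placeholders, not published specs.

TOTAL_BUDGET_W = 1700            # assumed combined CPU + GPU budget
CPU_MIN_W, CPU_MAX_W = 100, 500  # assumed per-device limits
GPU_MIN_W, GPU_MAX_W = 700, 1400

def slosh_power(cpu_util: float, gpu_util: float) -> tuple[float, float]:
    """Split TOTAL_BUDGET_W proportionally to utilization, clamped to device limits."""
    total_util = max(cpu_util + gpu_util, 1e-6)
    cpu_w = min(max(TOTAL_BUDGET_W * cpu_util / total_util, CPU_MIN_W), CPU_MAX_W)
    gpu_w = min(max(TOTAL_BUDGET_W - cpu_w, GPU_MIN_W), GPU_MAX_W)
    return cpu_w, gpu_w

# GPU-bound decode phase: most of the budget flows to the GPU.
print(slosh_power(cpu_util=0.10, gpu_util=0.95))   # ~(162 W, 1400 W)
# CPU-heavy preprocessing phase: the budget shifts back toward the CPU.
print(slosh_power(cpu_util=0.90, gpu_util=0.30))   # (500 W, 1200 W)
```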
- Key Performance Highlights
- Memory Upgrades:
- 12-Hi HBM3E stacks for 288 GB of total memory (checked in the sketch below)
- Sustained bandwidth of 8 TB/s
- Increased Power Envelope:
- TDP up to 1.4 kW (GB300) and 1.2 kW (B300 HGX)
- Architectural enhancements targeted at AI workloads
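The 288 GB and 8 TB/s figures can be sanity-checked with the arithmetic below. The per-stack breakdown (8 HBM3E stacks, each 12-Hi with 24 Gb dies) is an assumption consistent with the quoted total, not an official bill of materials.

```python
# Back-of-the-envelope check of the quoted memory figures.
# Assumes 8 HBM3E stacks per GPU, each 12-Hi with 24 Gb (3 GB) dies;
# only the 288 GB and 8 TB/s totals come from the bullets above.

stacks_per_gpu = 8
dies_per_stack = 12      # "12-Hi"
gb_per_die = 3           # 24 Gb die

capacity_gb = stacks_per_gpu * dies_per_stack * gb_per_die
print(capacity_gb)       # 288 -> matches the 288 GB total

# Time to stream the full 288 GB once at the quoted 8 TB/s sustained bandwidth:
bandwidth_tb_s = 8
print(capacity_gb / (bandwidth_tb_s * 1000))   # 0.036 s per full memory sweep
```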
- NVL72 for Scalable AI
- 72-GPU Clusters via NVLink:
- All-to-all switched connectivity for ultra-low latency
- KV cache can be shared across all 72 GPUs, enabling extended reasoning chains (footprint estimated in the sketch below)
- 10x better tokenomics: well suited to long-chain “reasoning” tasks with very large memory footprints
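To see why pooling KV cache across 72 GPUs matters for long reasoning chains, the sketch below estimates the cache footprint of a single long sequence. The model shape (layer count, KV heads, head dimension) is a hypothetical 70B-class configuration chosen for illustration; only the 72-GPU count and 288 GB per GPU come from the bullets above.

```python
# Rough KV-cache sizing for one long "reasoning" sequence.
# The model shape is a hypothetical 70B-class configuration (illustrative only);
# the 72 x 288 GB pool size is taken from the NVL72 bullets above.

layers        = 80       # assumed transformer layer count
kv_heads      = 8        # assumed grouped-query KV heads
head_dim      = 128      # assumed per-head dimension
bytes_per_val = 2        # fp16/bf16 cache entries

kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_val   # K and V
seq_len = 128_000        # one long chain-of-thought context

seq_cache_gb = kv_bytes_per_token * seq_len / 1e9
pool_gb = 72 * 288       # NVL72 pooled HBM capacity

print(f"{kv_bytes_per_token / 1e3:.0f} KB of KV cache per token")                # ~328 KB
print(f"{seq_cache_gb:.1f} GB of KV cache for one {seq_len:,}-token sequence")   # ~41.9 GB
print(f"pooled HBM holds ~{pool_gb / seq_cache_gb:.0f} such caches (ignoring weights)")
```
Under these assumptions a single GPU's 288 GB holds only a handful of such caches alongside model weights, which is why all-to-all switched NVLink and cross-GPU KV-cache sharing are emphasized for long-context serving.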
- Supply Chain Shake-Up
- SXM Puck Module: Nvidia now supplies the B300 GPU as an SXM Puck module and the Grace CPU as a standalone package, rather than fully built boards
- VRM Procurement Shifts: voltage regulator module (VRM) content is no longer fully controlled by Nvidia, opening the ecosystem to more suppliers
- New Opportunities for OEMs/ODMs: Less locked-in design, more customization potential
- Hyperscaler Reactions
- Amazon: Transitioning from sub-optimal PCIe-based NVL36 to water-cooled NVL72 designs with a custom NIC solution
- Meta, Google, Microsoft: Mixed approaches, with some continuing GB200 in Q4, others jumping to B300 platforms
- Margin & Cost Implications: Customers can optimize component choices, altering Nvidia’s margin structure
Source(s):