The Google TPU v4 is a high-performance training accelerator that powered many of Google's foundational AI models. Each chip provides 32 GB of HBM and 275 TFLOPS of peak BF16 compute, and pods scale to 4096 chips linked by optical circuit switches (OCS), which let the interconnect be reconfigured into different 3D-torus topologies for efficient distributed training at massive scale.
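The aggregate numbers implied by these per-chip figures can be checked with quick arithmetic. This is a back-of-the-envelope sketch using the peak specs quoted above; real workloads achieve only a fraction of peak throughput:

```python
# Per-chip figures quoted in the text above (peak, not sustained)
CHIPS_PER_POD = 4096
TFLOPS_BF16_PER_CHIP = 275
HBM_GB_PER_CHIP = 32

# Aggregate pod-level peak compute and memory
pod_pflops = CHIPS_PER_POD * TFLOPS_BF16_PER_CHIP / 1000  # petaFLOPS
pod_hbm_tib = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1024      # TiB (treating GB as GiB)

print(f"Pod peak BF16 compute: {pod_pflops:.1f} PFLOPS (~{pod_pflops / 1000:.2f} EFLOPS)")
print(f"Pod aggregate HBM:     {pod_hbm_tib:.0f} TiB")
# Pod peak BF16 compute: 1126.4 PFLOPS (~1.13 EFLOPS)
# Pod aggregate HBM:     128 TiB
```

So a full 4096-chip pod offers on the order of an exaFLOP of peak BF16 compute and over a hundred terabytes of high-bandwidth memory, which is what makes training large models across the whole pod feasible.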