AWS Inferentia2 is Amazon's second-generation machine learning inference accelerator. Compared with the original Inferentia, it delivers up to 3x the compute performance and up to 4x the throughput, and each accelerator carries 32 GB of HBM and up to 190 TFLOPS of FP16 performance, enough to serve large language models and other generative AI workloads. Inferentia2 powers Amazon EC2 Inf2 instances, which AWS positions as its lowest-cost-per-inference option for generative AI on EC2.
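
The 32 GB-per-accelerator figure suggests a quick back-of-the-envelope sizing exercise. The sketch below is illustrative only: the 32 GB HBM number comes from the text above, the 20% overhead reserve is an assumption, and the example chip counts (1 chip for a small Inf2 instance, 12 for the largest) are based on public Inf2 instance descriptions and should be checked against current AWS documentation.

```python
# Rough sizing sketch: how many model parameters fit in Inferentia2 HBM,
# assuming FP16/BF16 weights (2 bytes per parameter). Not an AWS tool.

HBM_PER_CHIP_GB = 32   # per-accelerator HBM, from the text above
BYTES_PER_PARAM = 2    # FP16/BF16 weight storage

def max_params_billion(chips: int, overhead_frac: float = 0.2) -> float:
    """Upper-bound estimate of model parameters (billions) that fit in
    the accelerators' combined HBM, reserving overhead_frac for KV cache,
    activations, and runtime buffers (the 0.2 default is an assumption)."""
    usable_bytes = chips * HBM_PER_CHIP_GB * 1e9 * (1 - overhead_frac)
    return usable_bytes / BYTES_PER_PARAM / 1e9

# One accelerator vs. a hypothetical 12-accelerator instance
print(f" 1 chip : ~{max_params_billion(1):.1f}B params")
print(f"12 chips: ~{max_params_billion(12):.1f}B params")
```

By this estimate a single accelerator can hold a model on the order of 13B FP16 parameters, which is why multi-chip Inf2 instances matter for larger LLMs.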