AWS Inferentia is Amazon's first-generation custom inference chip, designed to deliver high throughput and low cost for machine learning inference workloads. With four NeuronCores delivering up to 128 TOPS of INT8 performance per chip, it powers Amazon EC2 Inf1 instances and offers up to 70% lower cost per inference than comparable GPU-based EC2 instances for supported models.
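
To give a concrete sense of how a model gets onto those NeuronCores, here is a minimal sketch of ahead-of-time compilation with the AWS Neuron SDK's `torch_neuron` plugin for PyTorch (the package used for first-generation Inferentia / Inf1). The choice of a torchvision ResNet-50 and the output filename are illustrative assumptions, not anything prescribed by the platform.

```python
import torch
import torch_neuron  # AWS Neuron SDK plugin for PyTorch on Inf1 (first-gen Inferentia)
from torchvision import models

# Load a stand-in model (assumed here for illustration) and switch to inference mode.
model = models.resnet50(pretrained=True)
model.eval()

# Example input with the shape the compiled graph will be specialized for.
example_input = torch.zeros(1, 3, 224, 224)

# Ahead-of-time compile the model for NeuronCores; operators the compiler
# does not support are partitioned to run on the host CPU instead.
model_neuron = torch.neuron.trace(model, example_inputs=[example_input])

# Save the compiled artifact. On an Inf1 instance, loading it back with
# torch.jit.load() runs the compiled subgraphs on the Inferentia device.
model_neuron.save("resnet50_neuron.pt")
```

Compilation happens once, off the critical path; at serving time the saved artifact is simply loaded and invoked like any TorchScript module, which is what makes the per-inference cost comparison above meaningful.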