AWS Inferentia2 is Amazon's second-generation machine learning inference accelerator. Compared with the original Inferentia, it delivers up to 3x the compute performance and up to 4x the throughput, and each accelerator carries 32 GB of HBM and up to 190 TFLOPS of FP16 performance, enough to serve large language models and other generative AI workloads. Inferentia2 powers Amazon EC2 Inf2 instances, which AWS positions as its lowest-cost-per-inference option for generative AI on EC2.
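
The 32 GB-per-accelerator figure suggests a quick back-of-the-envelope sizing exercise. The sketch below is illustrative only: the 32 GB HBM number comes from the text above, the 20% overhead reserve is an assumption, and the example chip counts (1 chip for a small Inf2 instance, 12 for the largest) are based on public Inf2 instance descriptions and should be checked against current AWS documentation.

```python
# Rough sizing sketch: how many model parameters fit in Inferentia2 HBM,
# assuming FP16/BF16 weights (2 bytes per parameter). Not an AWS tool.

HBM_PER_CHIP_GB = 32   # per-accelerator HBM, from the text above
BYTES_PER_PARAM = 2    # FP16/BF16 weight storage

def max_params_billion(chips: int, overhead_frac: float = 0.2) -> float:
    """Upper-bound estimate of model parameters (billions) that fit in
    the accelerators' combined HBM, reserving overhead_frac for KV cache,
    activations, and runtime buffers (the 0.2 default is an assumption)."""
    usable_bytes = chips * HBM_PER_CHIP_GB * 1e9 * (1 - overhead_frac)
    return usable_bytes / BYTES_PER_PARAM / 1e9

# One accelerator vs. a hypothetical 12-accelerator instance
print(f" 1 chip : ~{max_params_billion(1):.1f}B params")
print(f"12 chips: ~{max_params_billion(12):.1f}B params")
```

By this estimate a single accelerator can hold a model on the order of 13B FP16 parameters, which is why multi-chip Inf2 instances matter for larger LLMs.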