AWS Inferentia

Inferentia · Inferentia 1st Gen Architecture

AWS Inferentia is Amazon's first-generation custom inference chip, designed to deliver high throughput and low cost for machine learning inference workloads. With 4 NeuronCores and 128 TOPS of INT8 performance, it powers EC2 Inf1 instances and delivers up to 70% lower cost per inference compared to GPU-based instances for supported models.
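The headline numbers above can be sanity-checked with simple arithmetic. This is a hedged sketch: the even per-core split and the baseline GPU cost are illustrative assumptions for the worked example, not figures published by AWS.

```python
# Rough arithmetic behind the AWS Inferentia headline specs.
# The chip exposes 4 NeuronCores and 128 TOPS of INT8 compute in total;
# an equal per-core split is assumed here (AWS does not publish a
# per-core breakdown).
NEURON_CORES = 4
TOTAL_INT8_TOPS = 128
tops_per_core = TOTAL_INT8_TOPS / NEURON_CORES  # 32 TOPS per NeuronCore

# "Up to 70% lower cost per inference" relative to a GPU baseline:
# with a hypothetical GPU cost of $1.00 per million inferences,
# Inf1 would land at $0.30 per million for supported models.
gpu_cost_per_m = 1.00  # illustrative baseline, not a real AWS price
inf1_cost_per_m = gpu_cost_per_m * (1 - 0.70)

print(f"{tops_per_core:.0f} TOPS per NeuronCore")
print(f"${inf1_cost_per_m:.2f} vs ${gpu_cost_per_m:.2f} per million inferences")
```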

Key Features

4 NeuronCores per chip
128 TOPS INT8
Lowest cost inference on AWS
EC2 Inf1 instances
AWS Neuron SDK support

Full Specifications

Compute

Architecture Inferentia
NeuronCores 4
INT8 Performance 128 TOPS

Memory

Memory Size 8 GB
Memory Type DDR4

Power & Physical

Form Factor Custom ASIC

Features & Connectivity

NVLink Support No
Multi-chip Support Yes

Availability

MSRP (USD) Contact for pricing
Release Date Dec 2019
Status Available

Use Cases

ML Inference
Image Recognition
Natural Language Processing
Recommendation Engines
Search Ranking


Related GPUs

AWS Trainium2 · Available
AWS Trainium · 512 GB HBM (per node) · 190 TFLOPS FP16 · Available
AWS Inferentia2 · 32 GB HBM · 190 TFLOPS FP16 · Available