Scaling AI Experiences with Meta's MTIA Chips
Every day, billions of people across Meta's platforms interact with AI-powered features, from personalized content recommendations to advanced AI assistants. The underlying challenge for Meta, and for the industry at large, is deploying and continuously improving these sophisticated AI models at global scale while maintaining cost-efficiency. Meta meets this demanding infrastructure task through strategic investment in flexible, continuously evolving solutions, central to which is its family of custom-designed AI chips: the Meta Training and Inference Accelerator (MTIA).
While Meta remains committed to a diverse silicon portfolio that leverages both internal and external solutions, the MTIA chips, developed in close partnership with Broadcom, are an indispensable component of its AI infrastructure strategy. These homegrown accelerators cost-effectively power AI experiences that reach billions of people and are continually adapted to the rapidly evolving landscape of AI models.
The Iterative Evolution of Meta's MTIA Chips
The AI model landscape is in a state of perpetual flux, evolving at a pace that often outstrips traditional chip development cycles. Recognizing that chip designs based on projected workloads can become outdated by the time hardware reaches production, Meta has embraced an innovative "velocity strategy" for MTIA. Instead of long, speculative development periods, Meta adopts an iterative approach where each MTIA generation builds upon the last. This involves using modular chiplets, incorporating the latest AI workload insights, and deploying new hardware technologies on a significantly shorter cadence. This tighter feedback loop ensures Meta's custom silicon remains closely aligned with the dynamic demands of AI models, fostering faster adoption of new advancements.
Meta has already detailed the first two generations, MTIA 100 and MTIA 200, in academic papers. Building on this foundation, Meta has accelerated development to introduce four new successive generations: MTIA 300, 400, 450, and 500. These chips are either already in production or slated for mass deployment in 2026 and 2027. This rapid succession has allowed Meta to expand MTIA's workload coverage significantly, moving from initial ranking and recommendation (R&R) inference to R&R training, general Generative AI (GenAI) workloads, and highly optimized GenAI inference.
MTIA 300: Laying the Foundation for AI Workloads
The MTIA 300 marked a pivotal step in Meta's custom silicon journey. Initially optimized for R&R models, which were Meta's dominant workloads before the GenAI boom, its architectural building blocks established a robust foundation for subsequent chips. Key distinguishing features of MTIA 300 include integrated NIC chiplets, dedicated message engines for offloading communication collectives, and near-memory compute capabilities designed for reduction-based collectives. These low-latency, high-bandwidth communication components proved instrumental in enabling efficient GenAI inference and training in the generations that followed.
The MTIA 300 comprises one compute chiplet, two network chiplets, and several High-Bandwidth Memory (HBM) stacks. Each compute chiplet features a grid of processing elements (PEs), strategically designed with redundant PEs to enhance yield. Each PE is a sophisticated unit containing two RISC-V vector cores, a Dot Product Engine for matrix multiplication, a Special Function Unit for activations and elementwise operations, a Reduction Engine for accumulation and inter-PE communication, and a DMA engine for efficient data movement within local scratch memory. This intricate design underscored Meta's commitment to creating a highly efficient and cost-effective solution for its core AI tasks.
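For readers who find a structural sketch easier to parse than prose, the Python dataclasses below model the hierarchy just described. Only the items stated above (two RISC-V vector cores per PE, the list of engines, one compute chiplet, two network chiplets) and the HBM stack count from the spec table come from this article; the grid dimensions and spare-PE count are illustrative assumptions, not published figures.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingElement:
    """One PE in the compute chiplet's grid, per the description above."""
    riscv_vector_cores: int = 2          # two RISC-V vector cores
    dot_product_engine: bool = True      # matrix multiplication
    special_function_unit: bool = True   # activations and elementwise ops
    reduction_engine: bool = True        # accumulation and inter-PE communication
    dma_engine: bool = True              # moves data within local scratch memory

@dataclass
class ComputeChiplet:
    # The article only says "a grid of PEs" plus redundant PEs for yield;
    # the 8x8 grid and 2 spares below are illustrative assumptions.
    grid_rows: int = 8
    grid_cols: int = 8
    redundant_pes: int = 2

    def total_pes(self) -> int:
        return self.grid_rows * self.grid_cols + self.redundant_pes

@dataclass
class Mtia300:
    """MTIA 300 package: one compute chiplet, two network chiplets, HBM stacks."""
    compute_chiplets: list = field(default_factory=lambda: [ComputeChiplet()])
    network_chiplets: int = 2   # integrated NIC chiplets
    hbm_stacks: int = 4         # matches the spec table later in this article

chip = Mtia300()
print(f"PEs per compute chiplet (incl. spares): {chip.compute_chiplets[0].total_pes()}")
```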
MTIA 400: Achieving Competitive GenAI Performance
With the unprecedented surge in Generative AI, Meta rapidly evolved the MTIA 300 into the MTIA 400 to provide robust support for GenAI workloads alongside its existing R&R capabilities. The MTIA 400 represents a significant leap, offering 4x the FP8 FLOPS and a 51% increase in HBM bandwidth compared to its predecessor. While MTIA 300 focused on cost-effectiveness, MTIA 400 was designed to deliver raw performance competitive with leading commercial AI accelerators.
It achieves this by combining two compute chiplets to effectively double compute density and by supporting enhanced versions of MX8 and MX4, low-precision formats that are crucial for efficient GenAI inference. A single rack equipped with 72 MTIA 400 devices, interconnected via a switched backplane, forms a powerful scale-up domain. These racks use advanced air-assisted liquid cooling (AALC), enabling rapid deployment even in legacy data centers and reflecting Meta's practical approach to scaling its AI infrastructure globally.
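MX8 and MX4 are block-scaled ("microscaling") formats: elements are stored at very low precision, and each small block of elements shares one scale factor. The snippet below is a simplified numpy illustration of that idea, assuming a 32-element block and a symmetric 4-bit integer element grid; the actual element encodings, block sizes, and any enhancements in MTIA's MX8/MX4 implementations are not described in this article.

```python
import numpy as np

BLOCK = 32  # assumed block size; OCP MX formats commonly use 32-element blocks

def mx4_quantize(x: np.ndarray):
    """Block-scaled 4-bit quantization (simplified illustration, not the real MX encoding).

    Each block of 32 values shares one power-of-two scale; each element is then
    stored as a 4-bit signed integer in [-7, 7].
    """
    x = x.reshape(-1, BLOCK)
    # One shared power-of-two scale per block, chosen so the block max fits the grid.
    max_abs = np.abs(x).max(axis=1, keepdims=True)
    max_abs = np.where(max_abs == 0, 1.0, max_abs)
    scale = 2.0 ** np.ceil(np.log2(max_abs / 7.0))
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def mx4_dequantize(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

vals = np.random.randn(4 * BLOCK).astype(np.float32)
q, s = mx4_quantize(vals)
err = np.abs(vals - mx4_dequantize(q, s)).mean()
print(f"mean abs quantization error: {err:.4f}")
```

The appeal of block-scaled formats for GenAI inference is that weights and activations occupy roughly a quarter of the HBM footprint of BF16 while each block retains its own dynamic range, which is why MX4 throughput features so prominently in the later MTIA generations.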
MTIA 450 and 500: Specialized for GenAI Inference
Anticipating the continued exponential growth in GenAI inference demand, Meta further refined the MTIA 400, leading to the development of MTIA 450 and subsequently MTIA 500. These generations are specifically optimized for the unique challenges of GenAI inference, focusing on critical advancements in memory and compute.
MTIA 450 made significant strides by:
- Doubling HBM bandwidth relative to MTIA 400, which is crucial for accelerating the bandwidth-bound decode phase of GenAI models (a rough estimate of why is sketched after this list).
- Increasing MX4 FLOPS by 75%, speeding up mixture-of-experts (MoE) feed-forward network (FFN) computations common in large language models.
- Introducing hardware acceleration to make attention and FFN computations more efficient, alleviating bottlenecks associated with Softmax and FlashAttention.
- Innovating in low-precision data types: moving beyond FP8/MX8, custom formats deliver 6x the FLOPS of FP16/BF16 at MX4 precision while preserving model quality and adding minimal chip area.
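Why bandwidth dominates decode: in autoregressive decoding, every generated token must stream the active model weights (and KV cache) from HBM, so a memory-bound upper estimate of decode throughput is roughly bandwidth divided by bytes read per token. The sketch below is purely illustrative: the 16B-parameter model and KV-cache traffic are hypothetical, and the bandwidth figures are the per-device values from the spec table further down.

```python
# Rough memory-bound estimate of single-device decode throughput:
# tokens/sec ~= HBM bandwidth / bytes that must be read per generated token.
# Model size and bytes-per-parameter are hypothetical; bandwidths are taken
# from the spec table below (GB/s, as printed there).

hbm_bandwidth_gbps = {"MTIA 400": 151, "MTIA 450": 302, "MTIA 500": 453}

params = 16e9            # hypothetical 16B-parameter dense model
bytes_per_param = 0.5    # ~4-bit (MX4-style) weights
kv_cache_bytes = 2e9     # hypothetical KV-cache traffic per token

bytes_per_token = params * bytes_per_param + kv_cache_bytes

for chip, bw in hbm_bandwidth_gbps.items():
    tokens_per_sec = (bw * 1e9) / bytes_per_token
    print(f"{chip}: ~{tokens_per_sec:.0f} tokens/s upper bound (memory-bound)")
```

Under this simple model, doubling HBM bandwidth (MTIA 400 to 450) roughly doubles the attainable decode rate, which is consistent with the emphasis these generations place on memory bandwidth.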
MTIA 500, building on the 450's success, further increased HBM bandwidth by an additional 50% and introduced more innovations in low-precision data types, reinforcing Meta's commitment to pushing the boundaries of GenAI inference performance. This relentless drive for improvement ensures that Meta's AI experiences remain at the cutting edge.
The cumulative advancements across these generations are substantial. From MTIA 300 to MTIA 500, HBM bandwidth has increased by 4.5x, while compute FLOPS have increased by 5x (from MTIA 300's MX8 to MTIA 500's MX4). This rapid progression within two years is a testament to Meta's velocity strategy and its ability to continually enhance its custom silicon. This evolution is central to operationalizing agentic AI and other complex models at scale.
Here's a breakdown of the key specifications across the MTIA family:
| Feature | MTIA 300 | MTIA 400 | MTIA 450 | MTIA 500 |
|---|---|---|---|---|
| Compute Die | 1 | 2 | 2 | 2 |
| HBM Stacks | 4 | 4 | 8 | 8 |
| HBM Bandwidth (GB/s)* | 100 | 151 | 302 | 453 |
| MX8 FLOPS (TFLOPS) | 100 | 400 | 400 | 400 |
| MX4 FLOPS (TFLOPS) | N/A | 200 | 350 | 500 |
| Scale-up Domain Size | 18 devices** | 72 devices | 72 devices | 72 devices |
| Key Optimization | R&R training, low-latency communication | General GenAI, competitive raw perf. | GenAI inference, HBM, custom low-prec. | GenAI inference, HBM, custom low-prec. |
*Some vendors report bidirectional bandwidth. Multiply the value in the table by two to obtain the corresponding bidirectional bandwidth.
**MTIA 300 is configured with a higher-bandwidth (200 GB/s) scale-out network due to its relatively small scale-up domain size and its target R&R workloads.
These specifications highlight the dramatic improvements in memory bandwidth and compute power, demonstrating how each MTIA generation is meticulously engineered to address the most pressing demands of current and future AI applications, particularly the resource-intensive GenAI models.
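As a quick cross-check of the cumulative figures quoted earlier (4.5x HBM bandwidth, 5x compute from MTIA 300's MX8 to MTIA 500's MX4), the ratios can be read straight off the table. This is a minimal Python sketch; the values are copied verbatim from the table above.

```python
# Generation-over-generation ratios computed from the spec table above.
specs = {
    # chip: (HBM bandwidth GB/s, MX8 TFLOPS, MX4 TFLOPS)
    "MTIA 300": (100, 100, None),
    "MTIA 400": (151, 400, 200),
    "MTIA 450": (302, 400, 350),
    "MTIA 500": (453, 400, 500),
}

base_bw, base_mx8, _ = specs["MTIA 300"]
for chip, (bw, mx8, mx4) in specs.items():
    mx4_str = f"{mx4} TFLOPS MX4" if mx4 else "no MX4"
    print(f"{chip}: {bw / base_bw:.2f}x HBM bandwidth, {mx4_str}")

# Cumulative compute jump quoted in the text: MTIA 300 MX8 -> MTIA 500 MX4.
print(f"MX compute: {specs['MTIA 500'][2] / base_mx8:.1f}x")
```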
Meta's relentless pursuit of custom silicon solutions via the MTIA family underscores its commitment to delivering cutting-edge AI experiences to billions of users worldwide. By combining internal innovation with strategic partnerships, Meta continues to redefine the possibilities of scalable and cost-effective AI infrastructure.
