Meta MTIA Chips Scale AI for Billions

[Image: Meta MTIA AI chips on a server board in a data center rack]

Scaling AI Experiences with Meta's MTIA Chips

Every day, billions of people across Meta's platforms interact with a wide range of AI-powered features, from personalized content recommendations to advanced AI assistants. The underlying challenge for Meta, and for the industry, lies in deploying and continuously improving these sophisticated AI models at global scale while maintaining cost-efficiency. Meta meets this infrastructure demand through strategic investment in flexible, continuously evolving solutions, central to which is its family of custom-designed AI chips: the Meta Training and Inference Accelerator (MTIA).

While Meta remains committed to a diverse silicon portfolio that leverages both internal and external solutions, its MTIA chips, developed in close partnership with Broadcom, are an indispensable component of its AI infrastructure strategy. These homegrown accelerators cost-effectively power AI experiences that reach billions of users while constantly adapting to the rapidly evolving landscape of AI models.

The Iterative Evolution of Meta's MTIA Chips

The AI model landscape is in a state of perpetual flux, evolving at a pace that often outstrips traditional chip development cycles. Recognizing that chip designs based on projected workloads can become outdated by the time hardware reaches production, Meta has embraced an innovative "velocity strategy" for MTIA. Instead of long, speculative development periods, Meta adopts an iterative approach where each MTIA generation builds upon the last. This involves using modular chiplets, incorporating the latest AI workload insights, and deploying new hardware technologies on a significantly shorter cadence. This tighter feedback loop ensures Meta's custom silicon remains closely aligned with the dynamic demands of AI models, fostering faster adoption of new advancements.

Meta has already detailed the first two generations, MTIA 100 and MTIA 200, in academic papers. Building on this foundation, Meta has accelerated development to introduce four new successive generations: MTIA 300, 400, 450, and 500. These chips are either already in production or slated for mass deployment in 2026 and 2027. This rapid succession has allowed Meta to expand MTIA's workload coverage significantly, moving from initial ranking and recommendation (R&R) inference to R&R training, general Generative AI (GenAI) workloads, and highly optimized GenAI inference.

MTIA 300: Laying the Foundation for AI Workloads

The MTIA 300 marked a pivotal step in Meta's custom silicon journey. Initially optimized for R&R models, which were Meta's dominant workloads before the GenAI boom, its architectural building blocks established a robust foundation for subsequent chips. Key distinguishing features of MTIA 300 include integrated NIC chiplets, dedicated message engines for offloading communication collectives, and near-memory compute capabilities designed for reduction-based collectives. These low-latency, high-bandwidth communication components proved instrumental in enabling efficient GenAI inference and training in the generations that followed.
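
To make the terminology concrete, the sketch below shows what a reduction-based collective such as all-reduce actually computes: every participant ends up with the elementwise sum of everyone's input. This is a deliberately naive NumPy illustration of the math only, with purely illustrative function names; the point of MTIA 300's message engines and near-memory compute is to perform this kind of reduction in dedicated hardware rather than on the processing elements.

```python
# Illustrative only: the math of a reduction-based collective (all-reduce),
# not Meta's hardware protocol. On MTIA 300, message engines and near-memory
# compute are described as offloading this elementwise reduction work.
import numpy as np

def all_reduce_sum(shards: list[np.ndarray]) -> list[np.ndarray]:
    """Every participant receives the elementwise sum of all inputs."""
    total = np.sum(shards, axis=0)          # the "reduce" part
    return [total.copy() for _ in shards]   # the "broadcast back" part

# Example: four devices each contribute a gradient shard of the same shape.
grads = [np.random.randn(8).astype(np.float32) for _ in range(4)]
reduced = all_reduce_sum(grads)
assert all(np.allclose(r, np.sum(grads, axis=0)) for r in reduced)
```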

The MTIA 300 comprises one compute chiplet, two network chiplets, and several High-Bandwidth Memory (HBM) stacks. Each compute chiplet features a grid of processing elements (PEs), strategically designed with redundant PEs to enhance yield. Each PE is a sophisticated unit containing two RISC-V vector cores, a Dot Product Engine for matrix multiplication, a Special Function Unit for activations and elementwise operations, a Reduction Engine for accumulation and inter-PE communication, and a DMA engine for efficient data movement within local scratch memory. This intricate design underscored Meta's commitment to creating a highly efficient and cost-effective solution for its core AI tasks.
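
As a rough mental model of how these engines divide work, here is a toy NumPy sketch, using my own function names rather than Meta's actual ISA or programming model, of mapping one matrix multiplication across two PEs: the DMA engine stages data into scratch memory, each Dot Product Engine computes a partial product over its slice of the inner dimension, the Reduction Engine sums the partials, and the Special Function Unit applies the activation.

```python
# Toy abstraction of the PE engines described above (not Meta's actual ISA);
# the function names and the K-split mapping are assumptions for illustration.
import numpy as np

def dma_load(hbm_tensor: np.ndarray) -> np.ndarray:
    # DMA engine: stage data from HBM into the PE's local scratch memory.
    return np.array(hbm_tensor, copy=True)

def dot_product_engine(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Dot Product Engine: dense matrix multiplication on local data.
    return x @ w

def reduction_engine(partials: list[np.ndarray]) -> np.ndarray:
    # Reduction Engine: accumulate partial results across PEs.
    return np.sum(partials, axis=0)

def special_function_unit(x: np.ndarray) -> np.ndarray:
    # Special Function Unit: activations and elementwise ops (ReLU here).
    return np.maximum(x, 0.0)

x = np.random.randn(4, 16).astype(np.float32)
w = np.random.randn(16, 8).astype(np.float32)
# Split the inner (K) dimension across two PEs; each computes a partial matmul.
partials = [dot_product_engine(dma_load(x[:, k:k + 8]), w[k:k + 8, :]) for k in (0, 8)]
y = special_function_unit(reduction_engine(partials))
assert np.allclose(y, np.maximum(x @ w, 0.0), atol=1e-4)
```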

MTIA 400: Achieving Competitive GenAI Performance

With the unprecedented surge in Generative AI, Meta rapidly evolved the MTIA 300 into the MTIA 400 to provide robust support for GenAI workloads alongside its existing R&R capabilities. The MTIA 400 represents a significant leap, offering 400% higher FP8 FLOPS and a 51% increase in HBM bandwidth compared to its predecessor. While MTIA 300 focused on cost-effectiveness, MTIA 400 was designed to deliver raw performance competitive with leading commercial AI accelerators.

It achieves this by combining two compute chiplets to effectively double compute density and by supporting enhanced versions of MX8 and MX4, crucial low-precision formats for efficient GenAI inference. A single rack equipped with 72 MTIA 400 devices, interconnected via a switched backplane, forms a powerful scale-up domain. These systems are supported by advanced air-assisted liquid cooling (AALC) racks, facilitating rapid deployment even in legacy data centers, showcasing Meta's practical approach to scaling its AI infrastructure globally.
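
MX formats are block-scaled data types: a small group of elements shares a single scale factor, so narrow 4- or 8-bit elements can still cover a wide dynamic range. The sketch below is a minimal illustration of that idea, assuming 32-element blocks, a shared power-of-two scale, and signed-integer elements as a stand-in; the exact element encodings MTIA uses, and Meta's custom enhancements to MX8 and MX4, are not spelled out here.

```python
# Minimal sketch of block-scaled ("MX-style") quantization. Block size,
# power-of-two scaling, and integer elements are simplifying assumptions;
# this is not the exact MX8/MX4 encoding MTIA implements.
import numpy as np

def mx_quantize(block: np.ndarray, elem_bits: int = 4) -> tuple[float, np.ndarray]:
    """Quantize one block to narrow signed integers that share one scale."""
    max_q = 2 ** (elem_bits - 1) - 1                  # e.g. 7 for 4-bit elements
    amax = float(np.max(np.abs(block))) or 1.0
    scale = 2.0 ** np.ceil(np.log2(amax / max_q))     # shared power-of-two scale
    q = np.clip(np.round(block / scale), -max_q, max_q).astype(np.int8)
    return scale, q

def mx_dequantize(scale: float, q: np.ndarray) -> np.ndarray:
    return scale * q.astype(np.float32)

weights = np.random.randn(32).astype(np.float32)      # one 32-element block
scale, q = mx_quantize(weights, elem_bits=4)
err = np.max(np.abs(weights - mx_dequantize(scale, q)))
print(f"shared scale = {scale:.4g}, max abs quantization error = {err:.4g}")
```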

MTIA 450 and 500: Specialized for GenAI Inference

Anticipating the continued exponential growth in GenAI inference demand, Meta further refined the MTIA 400, leading to the development of MTIA 450 and subsequently MTIA 500. These generations are specifically optimized for the unique challenges of GenAI inference, focusing on critical advancements in memory and compute.

MTIA 450 made significant strides by:

  1. Doubling HBM bandwidth over MTIA 400, which is crucial for accelerating the memory-bound decode phase in GenAI models (see the bandwidth sketch after this list).
  2. Increasing MX4 FLOPS by 75%, speeding up mixture-of-experts (MoE) feed-forward network (FFN) computations common in large language models.
  3. Introducing hardware acceleration to make attention and FFN computations more efficient, alleviating bottlenecks associated with Softmax and FlashAttention.
  4. Innovating in low-precision data types, moving beyond FP8/MX8 to deliver 6x the MX4 FLOPS of FP16/BF16, with custom data-type innovations that preserve model quality and boost FLOPS with minimal chip area impact.
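
A rough way to see why the bandwidth doubling in item 1 matters: during autoregressive decode, each generated token must stream roughly the active model weights (plus KV cache) out of HBM, so achievable tokens per second is often bounded by bandwidth rather than FLOPS. The sketch below applies that simple model to the per-device bandwidth figures from the table further down; the 16B-active-parameter MoE model and 4-bit weights are hypothetical numbers chosen purely for illustration.

```python
# Back-of-the-envelope decode model (a simplification, not Meta's data):
# upper-bound tokens/s ≈ HBM bandwidth / bytes that must be read per token.
def decode_tokens_per_second(hbm_gbps: float,
                             active_params_billion: float,
                             bytes_per_param: float) -> float:
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return hbm_gbps * 1e9 / bytes_per_token

# Hypothetical model: 16B active parameters, 4-bit (0.5-byte) MX4 weights,
# ignoring KV-cache traffic. Bandwidth figures are per device, from the table.
for name, bw_gbps in [("MTIA 400", 151), ("MTIA 450", 302), ("MTIA 500", 453)]:
    tps = decode_tokens_per_second(bw_gbps, 16, 0.5)
    print(f"{name}: ~{tps:.0f} tokens/s bandwidth-bound ceiling")
```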

MTIA 500, building on the 450's success, increased HBM bandwidth by a further 50% and introduced additional low-precision data-type innovations, reinforcing Meta's commitment to pushing the boundaries of GenAI inference performance. This continued drive for improvement keeps Meta's AI experiences at the cutting edge.

The cumulative advancements across these generations are stark. From MTIA 300 to MTIA 500, the HBM bandwidth has increased by an impressive 4.5x, while the compute FLOPS have seen an astonishing 25x increase (from MTIA 300’s MX8 to MTIA 500’s MX4). This rapid acceleration within two years is a testament to Meta's velocity strategy and its ability to continually enhance its custom silicon. This evolution is central to operationalizing agentic AI and other complex models at scale.

Here's a breakdown of the key specifications across the MTIA family:

Feature               | MTIA 300     | MTIA 400   | MTIA 450   | MTIA 500
Compute Die           | 1            | 2          | 2          | 2
HBM Stacks            | 4            | 4          | 8          | 8
HBM Bandwidth (GB/s)* | 100          | 151        | 302        | 453
MX8 FLOPS (TFLOPS)    | 100          | 400        | 400        | 400
MX4 FLOPS (TFLOPS)    | N/A          | 200        | 350        | 500
Scale-up Domain Size  | 18 devices** | 72 devices | 72 devices | 72 devices
Key Optimization      | R&R training, low-latency communication | General GenAI, competitive raw perf. | GenAI inference, HBM, custom low-prec. | GenAI inference, HBM, custom low-prec.

*Some vendors report bidirectional bandwidth; multiply the values in the table by two to obtain the corresponding bidirectional figures.
**MTIA 300 is configured with a higher-bandwidth (200 GB/s) scale-out network due to its relatively small scale-up domain size and its target R&R workloads.

These specifications highlight the dramatic improvements in memory bandwidth and compute power, demonstrating how each MTIA generation is meticulously engineered to address the most pressing demands of current and future AI applications, particularly the resource-intensive GenAI models.

Meta's relentless pursuit of custom silicon solutions via the MTIA family underscores its commitment to delivering cutting-edge AI experiences to billions of users worldwide. By combining internal innovation with strategic partnerships, Meta continues to redefine the possibilities of scalable and cost-effective AI infrastructure.

Frequently Asked Questions

What are Meta MTIA chips and what is their purpose?
Meta Training and Inference Accelerator (MTIA) chips are custom-designed AI accelerators developed by Meta in partnership with Broadcom. Their primary purpose is to power the vast array of AI-driven experiences across Meta's platforms for billions of users. This includes everything from personalized recommendations (R&R) to advanced Generative AI (GenAI) assistants. By developing its own silicon, Meta aims to cost-effectively scale AI workloads, maintain flexibility, and optimize performance for its specific infrastructure needs, ensuring continuous innovation in AI hardware development.
How many generations of MTIA chips has Meta developed in recent years?
Meta has rapidly accelerated MTIA development, introducing four successive generations in under two years: MTIA 300, MTIA 400, MTIA 450, and MTIA 500. These chips have either already been deployed or are scheduled for mass deployment in 2026 or 2027. This rapid iteration showcases Meta's 'velocity strategy,' designed to keep pace with the extraordinarily fast evolution of AI models and ensure their hardware remains aligned with current and future workload demands, expanding beyond initial R&R tasks to encompass general GenAI and specialized GenAI inference.
What is Meta's 'velocity strategy' for AI chip development?
Meta's 'velocity strategy' is an iterative approach to AI chip development that contrasts with traditional, longer chip design cycles. Recognizing that AI models evolve faster than typical hardware development, Meta designs each MTIA generation to build on the last using modular chiplets. This strategy incorporates the latest AI workload insights and hardware technologies, enabling deployment on a shorter cadence. This tighter feedback loop ensures Meta's custom hardware remains closely aligned with evolving AI models, facilitating faster adoption of new technologies and maintaining optimal performance and cost-efficiency.
How do the newer MTIA chips (400, 450, 500) support Generative AI workloads?
As GenAI surged, MTIA chips evolved significantly to support these demanding workloads. MTIA 400 enhanced support for GenAI with 400% higher FP8 FLOPS and increased HBM bandwidth. MTIA 450 specifically optimized for GenAI inference by doubling HBM bandwidth, increasing MX4 FLOPS by 75%, introducing hardware acceleration for attention and FFN computations, and innovating with custom low-precision data types. MTIA 500 further improved on this, increasing HBM bandwidth by an additional 50% and introducing more low-precision innovations, directly addressing the compute and memory demands of complex GenAI models.
What are the key performance advancements from MTIA 300 to MTIA 500?
The MTIA chip family has seen remarkable advancements from the 300 series to the 500 series in less than two years. The HBM bandwidth has increased by 4.5 times, significantly boosting memory access speed crucial for large AI models. The compute FLOPS (Floating Point Operations Per Second) have seen an astounding 25-fold increase, particularly from MTIA 300's MX8 to MTIA 500's MX4 formats. These dramatic improvements underscore Meta's ability to rapidly enhance its custom silicon's raw processing power and data handling capabilities to meet the escalating demands of advanced AI models.
Why is High-Bandwidth Memory (HBM) important for GenAI inference performance?
High-Bandwidth Memory (HBM) is critically important for Generative AI (GenAI) inference performance because GenAI models, especially large language models (LLMs), typically have massive parameter counts and require extensive memory bandwidth to efficiently retrieve and process these parameters during inference. The decoder step in GenAI inference, which generates tokens sequentially, is often bottlenecked by memory access rather than raw compute. Doubling or significantly increasing HBM bandwidth, as seen in MTIA 450 and 500, directly translates to faster token generation, lower latency, and higher throughput, making the AI experiences more responsive and efficient for users.
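
One way to quantify this, using the per-device figures from the specification table above and the standard approximation of roughly two FLOPs per parameter read during single-token decode: decode has an arithmetic intensity of only a few FLOPs per byte, while the accelerator can supply thousands of FLOPs per byte of HBM traffic, so the memory system rather than the compute sets the pace. The sketch below is illustrative arithmetic, not a Meta benchmark.

```python
# Illustrative roofline comparison, not a measurement. Decode reads each
# active weight once per token and does ~2 FLOPs (multiply + add) with it.
def decode_arithmetic_intensity(bytes_per_param: float) -> float:
    return 2.0 / bytes_per_param                 # FLOPs per byte of weight traffic

def machine_balance(tflops: float, hbm_gbps: float) -> float:
    return (tflops * 1e12) / (hbm_gbps * 1e9)    # FLOPs available per HBM byte

print("decode intensity (1-byte MX8 weights):",
      decode_arithmetic_intensity(1.0), "FLOPs/byte")
print("MTIA 400 balance:", round(machine_balance(400, 151)), "FLOPs/byte")
print("MTIA 500 balance:", round(machine_balance(500, 453)), "FLOPs/byte")
```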
