
AWS, NVIDIA Deepen AI Collaboration to Accelerate Production

By AWS and NVIDIA · 5 min read


AI is transforming industries at an unprecedented pace, but the real value lies not in experimentation alone; it comes from successfully deploying and operating AI solutions in production environments. This demands robust, scalable, secure, and compliant systems that deliver tangible business outcomes. Addressing this critical need, AWS and NVIDIA announced a significant expansion of their strategic collaboration at NVIDIA GTC 2026, unveiling new technology integrations designed to meet the escalating demand for AI compute and propel AI solutions into real-world production.

The deepened partnership focuses on accelerating every facet of the AI lifecycle, from infrastructure to model deployment. These integrations span crucial areas including accelerated computing, advanced interconnect technologies, and streamlined model fine-tuning and inference. Key announcements include:

  • The deployment of more than 1 million NVIDIA GPUs across AWS Regions starting in 2026.
  • Amazon EC2 support for NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs, making AWS the first major cloud provider to offer this.
  • Interconnect acceleration for disaggregated Large Language Model (LLM) inference leveraging NVIDIA NIXL on AWS Elastic Fabric Adapter (EFA).
  • 3x faster performance for Apache Spark workloads using Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS) with Amazon EC2 G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs.
  • Expanded NVIDIA Nemotron model support on Amazon Bedrock, including Reinforcement Fine-Tuning and the Nemotron 3 Super model.

Scaling AI Infrastructure with Enhanced NVIDIA GPU Power

The foundation of modern AI lies in powerful compute infrastructure. Starting in 2026, AWS is making a monumental commitment to AI advancement by adding over 1 million NVIDIA GPUs to its global cloud regions. This includes next-generation Blackwell and Rubin GPU architectures, ensuring that customers have access to the most advanced hardware available. AWS already boasts the industry's broadest collection of NVIDIA GPU-based instances, catering to a diverse array of AI/ML workloads, and this expansion further solidifies its leadership.

This long-standing partnership, spanning over 15 years, also extends to crucial infrastructure areas like Spectrum networking. The aim is to provide enterprises, startups, and researchers with the robust infrastructure required to build and scale advanced Agentic AI systems—AI capable of autonomous reasoning, planning, and action across complex workflows.

Introducing New Amazon EC2 Instances and Interconnect Innovations

A highlight of the collaboration is the forthcoming Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs. AWS is proud to be the first major cloud provider to announce support for these powerful GPUs, making them accessible for a wide range of demanding tasks. These instances are ideally suited for data analytics, sophisticated conversational AI, dynamic content generation, advanced recommender systems, high-quality video streaming, and complex graphics workloads.

These new EC2 instances will be built on the robust AWS Nitro System. The Nitro System, with its unique combination of dedicated hardware and a lightweight hypervisor, delivers nearly all of the host hardware's compute and memory resources directly to instances. This design ensures superior resource utilization and performance. Crucially, the Nitro System's specialized hardware, software, and firmware are engineered to enforce stringent restrictions, safeguarding sensitive AI workloads and data from unauthorized access, even from within AWS. Its ability to perform firmware updates and optimizations while operational further enhances the security and stability essential for production-grade AI, analytics, and graphics workloads.

Further enhancing performance, particularly for massive AI models, is the acceleration of interconnects for disaggregated LLM inference. As model sizes continue to grow, communication overhead between GPUs or AWS Trainium instances can become a significant bottleneck. AWS announced support for NVIDIA Inference Xfer Library (NIXL) with AWS Elastic Fabric Adapter (EFA), designed to accelerate disaggregated LLM inference on Amazon EC2, spanning both NVIDIA GPUs and AWS Trainium. This integration is vital for scaling modern AI workloads, enabling efficient overlap of communication and computation, minimizing latency, and maximizing GPU utilization. It facilitates high-throughput, low-latency KV-cache data movement between compute nodes and distributed memory resources. NIXL with EFA integrates natively with popular open-source frameworks such as NVIDIA Dynamo, vLLM, and SGLang, delivering improved inter-token latency and more efficient KV-cache memory utilization.
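The mechanics behind disaggregated inference can be made concrete with a toy sketch. The code below is not the NIXL API (which is not shown in this article); it is a hypothetical, simplified illustration of the prefill/decode split that NIXL-over-EFA accelerates: a prefill worker builds the prompt's KV cache once, the cache is transferred to a decode worker, and generation resumes without reprocessing the prompt.

```python
# Conceptual sketch of disaggregated LLM inference (all names hypothetical,
# not the NIXL API). A prefill worker computes the prompt's KV cache, hands
# it to a decode worker over an interconnect, and decoding continues
# without recomputing the prompt.

from dataclasses import dataclass, field


@dataclass
class KVCache:
    # One (key, value) entry per processed token; real systems hold
    # per-layer GPU tensors, this stand-in just records token ids.
    entries: list = field(default_factory=list)


def prefill(prompt_tokens):
    """Prefill phase: process the whole prompt once, producing the KV cache."""
    cache = KVCache()
    for tok in prompt_tokens:
        cache.entries.append((tok, tok))  # placeholder for attention K/V
    return cache


def transfer(cache):
    """Stand-in for the NIXL-over-EFA hop between nodes.

    In a real deployment this is high-throughput, low-latency movement of
    KV-cache data between compute nodes; here it is just a copy."""
    return KVCache(entries=list(cache.entries))


def decode(cache, steps):
    """Decode phase: generate tokens one at a time, reusing the cache."""
    out = []
    for _ in range(steps):
        nxt = len(cache.entries)  # placeholder next-token rule
        cache.entries.append((nxt, nxt))
        out.append(nxt)
    return out


prompt = [101, 7592, 2088]
remote_cache = transfer(prefill(prompt))  # prefill node -> decode node
generated = decode(remote_cache, steps=3)
print(generated)
```

The point of the split is that prefill (compute-bound) and decode (memory-bandwidth-bound) can run on separately sized pools, which is why fast KV-cache transfer between them becomes the critical path.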

Accelerating Data Analytics with Amazon EMR and GPUs

Data engineers and scientists frequently grapple with lengthy data processing pipelines that can significantly hinder AI/ML model iteration and business intelligence generation. The AWS and NVIDIA collaboration delivers a groundbreaking improvement: 3x faster performance for Apache Spark workloads. This acceleration is achieved by leveraging Amazon EMR on Amazon EKS with G7e instances, powered by NVIDIA's RTX PRO 6000 Blackwell Server Edition GPUs.

This substantial performance gain is a direct result of joint engineering efforts focused on optimizing GPU-accelerated analytics. With Amazon EMR and G7e instances, organizations can dramatically reduce the time needed for AI/ML feature engineering, complex ETL transformations, and real-time analytics at scale. Customers running large-scale data processing pipelines can achieve faster time-to-insight while maintaining full compatibility with their existing Spark applications.
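The article does not publish the exact EMR configuration, but GPU acceleration for Spark SQL is conventionally switched on via NVIDIA's RAPIDS Accelerator plugin. The sketch below uses the real `aws emr-containers start-job-run` CLI shape for EMR on EKS; the cluster id, release label, S3 paths, and resource sizing are placeholders, and the G7e specifics should be taken from AWS documentation once published.

```shell
# Illustrative EMR on EKS job submission with GPU-accelerated Spark.
# The RAPIDS plugin class is real; ids, paths, and sizing are placeholders.
aws emr-containers start-job-run \
  --virtual-cluster-id <your-virtual-cluster-id> \
  --name gpu-etl-job \
  --release-label <emr-release-label> \
  --execution-role-arn <your-job-role-arn> \
  --job-driver '{
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://my-bucket/jobs/etl.py",
      "sparkSubmitParameters": "--conf spark.plugins=com.nvidia.spark.SQLPlugin --conf spark.rapids.sql.enabled=true --conf spark.executor.resource.gpu.amount=1 --conf spark.task.resource.gpu.amount=0.25"
    }
  }'
```

Because the plugin accelerates Spark SQL and DataFrame operations transparently, existing Spark applications typically need configuration changes rather than code changes, which is consistent with the compatibility claim above.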

Expanding NVIDIA Nemotron Model Support on Amazon Bedrock

AWS and NVIDIA are also expanding their collaboration on foundational models, bringing advanced NVIDIA Nemotron models to Amazon Bedrock.

Developers will soon be able to fine-tune NVIDIA Nemotron models directly on Amazon Bedrock using Reinforcement Fine-Tuning (RFT). This is a game-changer for teams that need to tailor model behavior to specific domains, whether in legal, healthcare, finance, or other specialized fields. RFT empowers users to shape how a model reasons and responds, moving beyond mere knowledge acquisition to nuanced behavioral alignment. Crucially, this runs natively on Amazon Bedrock, eliminating infrastructure overhead: users define the task, provide feedback, and Bedrock manages the rest.
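Bedrock's RFT interface has not been detailed publicly, but the "define the task, provide feedback" workflow centers on a grader: a function that scores model outputs so the fine-tuning loop can optimize behavior toward it. The toy grader below is purely illustrative (hypothetical rules for a legal-domain assistant), not a Bedrock API.

```python
# Hypothetical reward function of the kind RFT workflows depend on: score a
# model's answer against domain rules (here, a toy legal-compliance check).
# This illustrates the "provide feedback" step only; Bedrock's actual RFT
# interface is not shown in the source article.

def reward(answer: str) -> float:
    """Return a score in [0, 1]; higher means better-aligned behavior."""
    score = 0.0
    if "I am not a lawyer" in answer:      # required disclaimer present
        score += 0.5
    if len(answer.split()) <= 120:         # concise responses preferred
        score += 0.3
    if "guarantee" not in answer.lower():  # forbidden overclaim absent
        score += 0.2
    return round(score, 2)


good = reward("I am not a lawyer, but the clause likely requires notice.")
bad = reward("We guarantee you will win this case.")
print(good, bad)
```

This is what distinguishes RFT from supervised fine-tuning: instead of imitating labeled answers, the model is optimized against a scoring signal, shaping how it reasons and responds rather than just what it knows.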

Furthermore, NVIDIA Nemotron 3 Super, a hybrid Mixture-of-Experts (MoE) model built for multi-agent workloads and extended reasoning, is also coming soon to Amazon Bedrock. Engineered to help AI agents maintain accuracy across complex, multi-step workflows, Nemotron 3 Super will power diverse use cases spanning finance, cybersecurity, retail, and software development. It promises fast, cost-efficient inference through a fully managed API, simplifying the deployment of sophisticated AI agents.

Here’s a summary of the key announcements:

| Feature/Integration | Description | Primary Benefit | Availability |
| --- | --- | --- | --- |
| GPU Deployment | Over 1 million NVIDIA GPUs (Blackwell, Rubin architectures) across AWS Regions | Massive compute scale for all AI/ML workloads, agentic AI | Starting 2026 |
| Amazon EC2 Instances | Support for NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs on EC2 | First major cloud provider support for versatile AI, graphics, analytics | Coming soon |
| LLM Inference | NVIDIA NIXL on AWS EFA for accelerated disaggregated LLM inference across GPUs and Trainium | Minimized communication latency, maximized GPU utilization for LLMs | Announced |
| Apache Spark Performance | 3x faster Spark workloads on Amazon EMR on EKS with G7e instances (RTX PRO 6000) | Accelerated time-to-insight for data analytics, feature engineering | Announced |
| Nemotron Fine-Tuning | Reinforcement Fine-Tuning (RFT) for Nemotron models directly on Amazon Bedrock | Domain-specific model behavior alignment without infrastructure overhead | Coming soon |
| Nemotron 3 Super | Hybrid MoE model for multi-agent workloads and extended reasoning on Amazon Bedrock | Fast, cost-efficient inference for complex, multi-step AI tasks | Coming soon |

Commitment to Energy Efficiency and Sustainable AI

As AI workloads continue to grow exponentially, the efficiency and sustainability of the underlying infrastructure become paramount. The collaboration also highlights a shared commitment to improving energy efficiency. Performance per watt is no longer just a sustainability metric but a significant competitive advantage in the AI landscape.

At NVIDIA GTC 2026, Amazon CSO Kara Hurst joined other sustainability leaders to discuss how AI is fundamentally transforming enterprise energy and infrastructure at scale. This discussion underscores the focus on developing and deploying AI solutions that are not only powerful but also environmentally responsible, from data centers optimized as active grid participants to broader enterprise AI applications. This forward-thinking approach ensures that the advancements in AI compute are aligned with global sustainability goals.

Frequently Asked Questions

What is the primary goal of the expanded strategic collaboration between AWS and NVIDIA?
The collaboration aims to accelerate the transition of AI solutions from experimental phases to full-scale production environments. This involves integrating new technologies and expanding existing capabilities across accelerated computing, interconnect technologies, model fine-tuning, and inference. The focus is on enabling customers to build and run AI solutions that are reliable, performant at scale, and compliant with enterprise security and regulatory requirements, ultimately driving meaningful business outcomes through production-ready AI systems.
What significant GPU infrastructure expansions are planned by AWS as part of this collaboration?
Starting in 2026, AWS plans to deploy over 1 million NVIDIA GPUs, including the next-generation Blackwell and Rubin architectures, across its global cloud regions. This massive expansion solidifies AWS's position as a leading provider of NVIDIA GPU-based instances, offering the broadest collection for diverse AI/ML workloads. This enhanced capacity is crucial for supporting the surging demand for AI compute, particularly for complex agentic AI systems that require extensive computational power.
How will the new Amazon EC2 instances with NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs benefit users?
AWS is the first major cloud provider to support the NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs on Amazon EC2 instances. These instances are highly versatile, suitable for a broad spectrum of workloads such as data analytics, conversational AI, content generation, recommender systems, video streaming, and advanced graphics rendering. Built on the AWS Nitro System, they offer enhanced resource efficiency, robust security, and stability, delivering superior performance for demanding AI and graphics applications.
How does the integration of NVIDIA NIXL with AWS EFA enhance Large Language Model (LLM) inference?
The integration of NVIDIA Inference Xfer Library (NIXL) with AWS Elastic Fabric Adapter (EFA) is designed to accelerate disaggregated LLM inference on Amazon EC2 across both NVIDIA GPUs and AWS Trainium instances. This is critical for managing the communication overhead in large models, enabling efficient overlap of communication and computation, minimizing latency, and maximizing GPU utilization. It facilitates high-throughput, low-latency KV-cache data movement and integrates natively with popular open-source frameworks like NVIDIA Dynamo, vLLM, and SGLang.
What improvements are being made to Apache Spark performance for data analytics?
AWS and NVIDIA's joint engineering efforts have resulted in a 3x faster performance for Apache Spark workloads. This is achieved by combining Amazon EMR on Amazon EKS with G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. This significant speedup allows data engineers and scientists to accelerate time-to-insight for critical tasks such as AI/ML feature engineering, complex ETL transformations, and real-time analytics, maintaining full compatibility with existing Spark applications.
What expanded NVIDIA Nemotron model support is coming to Amazon Bedrock?
Amazon Bedrock will soon support fine-tuning NVIDIA Nemotron models directly using Reinforcement Fine-Tuning (RFT). This capability allows developers to precisely align model behavior to specific domains like legal, healthcare, or finance without infrastructure overhead. Additionally, NVIDIA Nemotron 3 Super, a hybrid Mixture-of-Experts (MoE) model optimized for multi-agent workloads and extended reasoning, will also be available on Amazon Bedrock, providing fast, cost-efficient inference via a fully managed API for complex, multi-step AI tasks.
How does this collaboration address energy efficiency and sustainability in AI?
The collaboration acknowledges the growing importance of energy efficiency as AI workloads scale. Performance per watt is highlighted not just as a sustainability metric but as a competitive advantage. The article points to an NVIDIA GTC session where sustainability leaders, including Amazon CSO Kara Hurst, discuss how AI is transforming enterprise energy and infrastructure, emphasizing efforts towards more sustainable AI practices from data centers to broader enterprise AI applications.
