What problem do stateful MCP client capabilities solve on Amazon Bedrock AgentCore Runtime?

Stateful Model Context Protocol (MCP) client capabilities on Amazon Bedrock AgentCore Runtime address the critical limitations of previous stateless AI agent implementations. Stateless agents struggled with interactive, multi-turn workflows, as they couldn't pause mid-execution to solicit user input for clarification, request dynamic large language model (LLM)-generated content, or provide real-time progress updates during lengthy operations. Each request was independent, lacking shared context. This new feature fundamentally transforms agent interactions by enabling bidirectional conversations, allowing agents to maintain conversational threads, gather necessary input precisely when needed, generate dynamic content on the fly, and transparently inform users about ongoing processes. This leads to the development of significantly more responsive, intelligent, and user-centric AI applications capable of complex, adaptive workflows.

How does the transition from stateless to stateful mode work on AgentCore Runtime?

The transition to stateful mode within Amazon Bedrock AgentCore Runtime is initiated by a simple configuration adjustment: setting `stateless_http=False` when starting your MCP server. Once enabled, AgentCore Runtime provisions a dedicated microVM for each individual user session. This microVM is designed for persistence throughout the session's duration, which can last up to 8 hours or expire after 15 minutes of inactivity, ensuring isolated CPU, memory, and filesystem resources for each session. Continuity across interactions is maintained through a unique `Mcp-Session-Id` header. This ID is established during the initial handshake and subsequently included by the client in all follow-up requests, ensuring they are accurately routed back to the correct, persistent session, thereby preserving context and enabling complex, interactive dialogues.

What is Elicitation, and how does it enhance AI agent interactions?

Elicitation is a powerful stateful MCP capability that allows an AI agent (acting as the MCP server) to intelligently pause its ongoing execution and request specific, structured input directly from the user via the client. This significantly enhances interactive agent workflows by enabling agents to ask targeted questions at precise, opportune moments within their operational flow. For example, an agent might use elicitation to confirm a decision, gather user preferences, or collect particular data values that are contingent on preceding steps. The feature supports two robust modes: 'Form mode' for direct structured data collection through the MCP client, and 'URL mode' for secure, out-of-band interactions that require directing the user to an external URL (e.g., for OAuth or sensitive credential entry). The user's response – whether accepting, declining, or canceling the request – is then returned to the server, allowing the agent to dynamically adapt its workflow based on real-time human feedback.

How does Sampling capability benefit AI agents without managing LLM credentials?

Sampling equips the MCP server with the ability to request sophisticated large language model (LLM)-generated content directly from the client using the `sampling/createMessage` mechanism. A key benefit is that the MCP server itself does not need to manage its own LLM credentials, API keys, or direct integrations with various LLM providers. Instead, the server simply provides a well-formed prompt and any optional model preferences to the client. The client then acts as an intelligent intermediary, forwarding this request to its connected LLM and returning the generated response back to the server. This abstraction allows AI agents to seamlessly leverage powerful language model capabilities for tasks such as crafting personalized summaries, generating natural-language explanations from complex structured data, or producing context-aware recommendations, all while simplifying the operational overhead and security concerns associated with LLM management on the server side.

Amazon Bedrock：AgentCore 运行时上的有状态 MCP 客户端功能

增强 AI 代理：Amazon Bedrock 上的有状态 MCP 转变

AI 代理正在迅速发展，但其全部潜力常常因无状态实现而受阻，尤其是在需要实时用户交互、动态内容生成或持续进度更新的场景中。开发复杂 AI 代理的开发人员经常面临挑战，即工作流在长时间运行的操作中需要暂停、收集澄清或报告状态。无状态执行的僵化、单向性质限制了真正交互式和响应式 AI 应用程序的开发。

现在，Amazon Bedrock AgentCore Runtime 引入了开创性的有状态模型上下文协议 (MCP) 客户端功能，改变了 AI 代理与用户和大型语言模型 (LLM) 的交互方式。这一关键更新使代理摆脱了无状态通信的束缚，实现了复杂、多轮和高度交互式的工作流。通过集成重要的 MCP 客户端功能——启发（Elicitation）、采样（Sampling）和进度通知（Progress Notifications）——Bedrock AgentCore Runtime 促进了 MCP 服务器和客户端之间的双向对话，为更智能、以用户为中心的 AI 解决方案铺平了道路。

从无状态到有状态：解锁交互式代理工作流

此前，AgentCore 上的 MCP 服务器支持以无状态模式运行，其中每个 HTTP 请求独立运行，没有任何共享上下文。虽然这简化了基本工具服务器的部署，但它严重限制了需要会话连续性、工作流中用户澄清或实时进度报告的场景。服务器无法在离散请求之间维护会话线程，从而阻碍了真正交互式代理的开发。

有状态 MCP 客户端功能的出现从根本上改变了这一范式。通过在服务器启动时设置 stateless_http=False，AgentCore Runtime 会为每个用户会话配置一个专用的微型虚拟机 (microVM)。此 microVM 在会话期间持续存在——最长可达 8 小时，或根据 idleRuntimeSessionTimeout 设置在 15 分钟不活动后超时——确保会话之间的 CPU、内存和文件系统隔离。通过 Mcp-Session-Id 标头保持连续性，该标头由服务器在初始化期间提供，客户端在所有后续请求中包含该标头，以路由回同一会话。这种专用、持久的环境允许代理记住上下文、征求用户输入、生成动态 LLM 内容并提供持续更新。

下表总结了无状态模式和有状态模式之间的主要区别：

	Stateless mode	Stateful mode
`stateless_http` setting	`TRUE`	`FALSE`
Session isolation	Dedicated microVM per session	Dedicated microVM per session
Session lifetime	Up to 8 hours; 15-min idle timeout	Up to 8 hours; 15-min idle timeout
Client capabilities	Not supported	Elicitation, sampling, progress notifications
Recommended for	Simple tool serving	Interactive, multi-turn workflows

当会话过期或服务器重新启动时，带有旧会话 ID 的后续请求将返回 404 错误。届时，客户端必须重新初始化连接以获取新的会话 ID 并启动新会话。启用有状态模式的配置更改是服务器启动中的一个单独标志：

mcp.run( transport="streamable-http", host="0.0.0.0", port=8000, stateless_http=False # Enable stateful mode)

除了这个标志，一旦 MCP 客户端在初始化握手期间声明支持它们，这三个客户端功能就会自动可用。

新客户端功能深入解析：启发、采样和进度

随着向有状态模式的转变，Amazon Bedrock AgentCore Runtime 释放了 MCP 规范中的三项强大客户端功能，每项功能都旨在解决对高级 AI 代理至关重要的不同交互模式。这些功能将曾经僵硬的单向命令执行转变为 MCP 服务器与其连接客户端之间流畅的双向对话。值得注意的是，这些功能是可选的，这意味着客户端在初始化期间声明它们的支持，并且服务器只能使用已连接客户端声明的功能。

启发（Elicitation）：在 AI 代理中实现动态用户输入

**启发（Elicitation）**是交互式 AI 的基石，它允许 MCP 服务器明智地暂停其执行，并通过客户端向用户请求特定的结构化输入。此功能使工具能够在工作流中的适当时刻提出精确的问题，无论是确认决策、收集用户偏好，还是收集源自先前操作的值。服务器通过发送 elicitation/create JSON-RPC 请求来启动此过程，该请求包括一个人类可读的消息和一个可选的 requestedSchema，用于描述预期的响应结构。

MCP 规范为启发提供了两种强大的模式：

**表单模式：**这非常适合通过 MCP 客户端直接收集结构化数据，例如配置参数、用户偏好或不涉及敏感数据的简单确认。
**URL 模式：**对于需要安全、带外处理的交互，例如 OAuth 流程、支付处理或敏感凭证输入，URL 模式将用户引导至外部 URL。这确保敏感信息完全绕过 MCP 客户端，从而增强安全性和合规性。

收到启发请求后，客户端会渲染适当的输入界面。用户的后续操作会触发一个三动作响应模型返回给服务器：accept（用户提供了请求的数据）、decline（用户明确拒绝了请求）或 cancel（用户在未做出选择的情况下取消了提示）。智能服务器旨在优雅地处理这些场景中的每一个，确保稳健且用户友好的体验。例如，如源材料所示，一个 add_expense_interactive 工具可以引导用户通过一系列问题——金额、描述、类别和最终确认——然后才将数据提交到 Amazon DynamoDB 等后端。每个步骤都利用 Pydantic 模型来定义预期的输入，FastMCP 会将其无缝转换为 elicitation/create 请求所需的 JSON Schema。

采样（Sampling）和进度通知：提升 LLM 交互和透明度

除了直接用户交互，**采样（Sampling）**使 MCP 服务器能够通过 sampling/createMessage 直接从客户端请求 LLM 生成的内容。这是一个关键机制，因为它允许服务器上的工具逻辑利用强大的语言模型功能，而无需管理自己的 LLM 凭证或直接 API 集成。服务器只需提供一个提示和可选的模型偏好，客户端作为中间人，将请求转发给其连接的 LLM 并返回生成的响应。这开启了无数实际应用，包括制作个性化摘要、从结构化数据生成自然语言解释，或根据正在进行的对话生成上下文感知的推荐。

对于需要长时间运行的操作，**进度通知（Progress Notifications）**变得极其宝贵。此功能允许 MCP 服务器在长时间运行的任务期间报告增量更新。通过使用 ctx.report_progress(progress, total)，服务器可以发出持续更新，客户端可以将其转换为视觉反馈，例如进度条或状态指示器。无论是搜索大量数据源还是执行复杂的计算任务，透明的进度更新都能确保用户随时了解情况，防止沮丧并增强整体用户体验，而不是让他们盯着空白屏幕，疑惑系统是否仍在运行。

使用 Bedrock AgentCore Runtime 助力 AI 代理开发面向未来

Amazon Bedrock AgentCore Runtime 上有状态 MCP 客户端功能的引入代表着 AI 代理开发领域向前迈出了重要一步。通过将以前的无状态交互转变为动态的双向对话，AWS 赋能开发人员构建更智能、响应更迅速、更用户友好的 AI 应用程序。这些功能——用于引导用户输入的启发（Elicitation）、用于按需 LLM 生成的采样（Sampling）以及用于实时透明度的进度通知（Progress Notifications）——共同开启了交互式代理工作流的新时代。随着 AI 的不断发展，这些基础功能对于创建能够无缝集成到复杂业务流程、适应用户需求并提供卓越价值的复杂可操作代理 AI 将至关重要。