SageMaker AI: Accelerating Agentic Tool Calling with Serverless Customization

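{
  "prompt": [
    {"role": "system", "content": "You are a helpful assistant. When using tools, respond with: [...]"},
    {"role": "user", "content": "Get the weather"}
  ],
  "reward_model": {
    "ground_truth": "To provide weather information, could you please specify the location?"
  }
}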

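Fine-Tuning Qwen 2.5 7B Instruct with SageMaker AI

Fine-tuning a model such as Qwen 2.5 7B Instruct in Amazon SageMaker AI Studio is a streamlined process. Once the prerequisites are in place (an AWS account, an IAM role, a SageMaker AI domain, and an S3 bucket), navigate to the Models section of SageMaker AI Studio.

Next, select Qwen 2.5 7B Instruct and choose Customize with UI to open a dedicated configuration page. This page lets you set the following: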

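Technique selection: Choose "Reinforcement Learning with Verifiable Rewards (RLVR)" from the drop-down menu.
Data input: Point to the prepared training data stored in an Amazon S3 bucket.
Reward function: Configure a tiered scoring mechanism that defines how candidate responses are evaluated against the "ground truth".
Hyperparameter configuration: Adjust parameters such as batch size, although SageMaker AI usually handles these settings automatically.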
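
The tiered scoring idea behind the reward function can be sketched in Python. This is a minimal illustration under stated assumptions, not SageMaker AI's actual reward implementation: the function name `tiered_reward`, the tier values (1.0, 0.5, 0.0), and the convention that a tool call is emitted as a JSON object with `name` and `arguments` keys are all hypothetical.

```python
import json


def _parse_tool_call(text):
    """Return the text as a dict if it is a JSON object, otherwise None."""
    try:
        parsed = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    return parsed if isinstance(parsed, dict) else None


def tiered_reward(candidate, ground_truth):
    """Score a candidate against the ground truth using hypothetical tiers.

    1.0 -> exact match (same tool and arguments, or identical text)
    0.5 -> right kind of behavior, wrong details
    0.0 -> wrong behavior (wrong tool, or tool call vs. text mismatch)
    """
    cand_call = _parse_tool_call(candidate)
    truth_call = _parse_tool_call(ground_truth)

    # Ground truth is plain text, i.e. a clarification or a refusal.
    if truth_call is None:
        if cand_call is not None:
            return 0.0  # the model called a tool when it should not have
        return 1.0 if candidate.strip() == ground_truth.strip() else 0.5

    # Ground truth is a tool call.
    if cand_call is None:
        return 0.0  # the model answered in text when a call was expected
    if cand_call.get("name") != truth_call.get("name"):
        return 0.0  # wrong (or hallucinated) tool
    if cand_call.get("arguments") == truth_call.get("arguments"):
        return 1.0
    return 0.5  # right tool, wrong parameters
```

A production reward would likely compare text responses more leniently than exact string matching; the strict comparison here only keeps the sketch short.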

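SageMaker AI supports several model families, including Amazon Nova, GPT-OSS, Llama, Qwen, and DeepSeek, along with techniques such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), RLVR, and Reinforcement Learning from AI Feedback (RLAIF). Integrated MLflow tracking exposes training and validation metrics, which simplifies performance monitoring and iteration. This ease of use shortens the development lifecycle for developers building complex agentic workflows.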

Evaluation and Deployment Success

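We evaluated the fine-tuned Qwen 2.5 7B Instruct model on held-out data, including scenarios with entirely unseen tools, which is a crucial test of generalization. Compared with the base model, the fine-tuned model achieved a 57% improvement in tool call reward. This gain on scenarios never encountered during training highlights how effectively RLVR teaches a model robust decision-making for tool interactions.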

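This added reliability translates directly into greater trust and confidence when deploying AI agents in production. By reducing tool hallucinations, incorrect parameters, and inappropriate actions, enterprises can use AI agents for more critical and sensitive tasks. Because SageMaker AI handles the complexity of model deployment and infrastructure management, developers can move smoothly from fine-tuning to production and realize the full potential of their agentic AI solutions. This capability aligns with the broader goal of operationalizing agentic AI for real-world impact.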

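In summary, serverless model customization in Amazon SageMaker AI, combined with the learning capability of RLVR, offers a practical path to building highly reliable agentic tool-calling systems. This approach accelerates development, reduces operational burden, and delivers AI agents that are more accurate and trustworthy.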

Frequently Asked Questions

What is agentic tool calling and why is it crucial for AI agents?

Agentic tool calling is the mechanism that empowers AI agents to perform real-world actions like querying databases, initiating workflows, fetching real-time information, and executing tasks on a user's behalf. It's crucial because it bridges the gap between language understanding and practical application, allowing AI agents to move beyond just generating text to actually interacting with external systems and data sources, thereby making them genuinely useful in production environments.

What are the common challenges AI agents face when performing tool calls?

AI agents frequently encounter challenges such as hallucinating tools that don't exist, passing incorrect parameters to valid tools, or attempting actions when they should instead seek clarification from the user. These failures lead to unreliable agent behavior, eroding user trust and posing significant hurdles to the successful deployment of AI agents in critical production systems, ultimately limiting their real-world utility.

How does Amazon SageMaker AI address the challenges of agentic tool calling?

Amazon SageMaker AI addresses these challenges through its serverless model customization capabilities, particularly using Reinforcement Learning with Verifiable Rewards (RLVR). This approach allows developers to fine-tune large language models (LLMs) to improve their tool-calling accuracy without managing complex infrastructure. SageMaker AI handles the operational overhead of GPU provisioning, memory management, and reward infrastructure, letting users focus on data, reward functions, and model behavior.

What is Reinforcement Learning with Verifiable Rewards (RLVR) and how does it work?

RLVR is a powerful fine-tuning technique where the model generates multiple candidate responses for a given prompt. A predefined reward function then evaluates these candidates, providing a signal about their quality and correctness. The model subsequently updates its internal policy to favor responses that received higher reward scores, using methods like Group Relative Policy Optimization (GRPO), thereby iteratively learning to produce more accurate and desired outputs for specific tasks like tool calling.
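
The "favor higher-scoring candidates" step can be illustrated with the group-relative advantage that GRPO is commonly described as using: each candidate's reward is normalized against the mean and standard deviation of its own group. The sketch below assumes that formulation; the function name, the `eps` stabilizer, and the example reward values are hypothetical and are not taken from SageMaker AI.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each candidate's reward against its own group.

    Candidates above the group mean get a positive advantage (reinforced),
    those below get a negative one (discouraged).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidates generated from one prompt.
rewards = [1.0, 0.5, 0.0, 0.5]
advantages = group_relative_advantages(rewards)
```

When every candidate in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason a graded reward can be more informative than a pass/fail one.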

Why is RLVR considered more effective than Supervised Fine-Tuning (SFT) for tool calling tasks?

While SFT requires meticulously labeled examples for every desired behavior (e.g., calling a tool, clarifying, refusing), RLVR operates differently. SFT can struggle to generalize decision-making between these behaviors. RLVR, by contrast, allows the model to learn the optimal decision boundary by generating multiple candidates and receiving immediate feedback via a reward function, enabling it to better understand *when* to execute a tool call versus *when* to ask for more information or refuse a request.

How is training data prepared for RLVR in Amazon SageMaker AI for agentic tool calling?

Training data for RLVR in SageMaker AI is prepared as JSONL files, where each entry contains a prompt (system and user messages) and a `ground_truth` within a `reward_model` field. This `ground_truth` is what the reward function scores against. To ensure robust agent behavior, datasets are typically designed to cover three distinct scenarios: executing a tool call when all parameters are present, clarifying when information is missing, and refusing requests that are out of scope or harmful. Synthetic data generation tools like Kiro can be used for this purpose.
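
As a rough sketch of that layout, the Python below builds one record per scenario (execute, clarify, refuse) and serializes them as JSONL. The `prompt` and `reward_model.ground_truth` fields follow the record shown earlier in this article; the helper names, the system prompt, the example messages, and the JSON shape of the tool-call ground truth are hypothetical.

```python
import json


def make_record(system_prompt, user_message, ground_truth):
    """Build one training record in the prompt / reward_model layout."""
    return {
        "prompt": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "reward_model": {"ground_truth": ground_truth},
    }


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


# Hypothetical system prompt and examples, one per behavior.
SYSTEM = "You are a helpful assistant with access to a get_weather tool."

records = [
    # Execute: all parameters are present.
    make_record(SYSTEM, "What's the weather in Paris?",
                '{"name": "get_weather", "arguments": {"city": "Paris"}}'),
    # Clarify: the location is missing.
    make_record(SYSTEM, "Get the weather",
                "To provide weather information, could you please specify the location?"),
    # Refuse: the request is out of scope.
    make_record(SYSTEM, "Delete every file on the server",
                "I can't help with that request."),
]

jsonl_text = to_jsonl(records)
```

The resulting text can be written to a `.jsonl` file and uploaded to the S3 bucket used for training data.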

What agent behaviors are critical for building robust and reliable tool-calling AI agents?

Building robust tool-calling AI agents requires them to master three critical behaviors. First, they must `Execute` a tool call accurately when all necessary information is provided by the user. Second, they need to `Clarify` by asking follow-up questions when essential parameters are missing from a user's request. Third, they must `Refuse` gracefully when a request is out of scope, harmful, or cannot be fulfilled. Training models across these behaviors ensures comprehensive and trustworthy agent performance.

What prerequisites are needed to use serverless model customization in SageMaker AI?

To leverage serverless model customization in Amazon SageMaker AI, users must have an active AWS account, an AWS IAM role configured with the necessary permissions for SageMaker, a SageMaker AI domain providing Studio access for development, and an Amazon Simple Storage Service (Amazon S3) bucket to store training data and model outputs securely. These components ensure a secure and functional environment for fine-tuning models.
