SageMaker AI: Accelerating Agentic Tool Calling with Serverless Customization

{
  "prompt": [
    {"role": "system", "content": "You are a helpful assistant. When using tools, respond with: [...]"},
    {"role": "user", "content": "Get the weather"}
  ],
  "reward_model": {
    "ground_truth": "To provide you with the weather information, could you please specify the location?"
  }
}
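A record like the one above can be sanity-checked before upload. The following is a minimal sketch, assuming only the two fields visible in the example (`prompt` and `reward_model.ground_truth`); the helper name `validate_record` is hypothetical and not part of any SageMaker API.

```python
import json


def validate_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty list = OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []

    # "prompt" should be a non-empty list of {"role", "content"} messages.
    prompt = record.get("prompt")
    if not isinstance(prompt, list) or not prompt:
        problems.append("'prompt' must be a non-empty list of messages")
    else:
        for i, message in enumerate(prompt):
            if not isinstance(message, dict) or not {"role", "content"}.issubset(message):
                problems.append(f"message {i} needs 'role' and 'content'")

    # The reward function scores candidates against reward_model.ground_truth.
    reward_model = record.get("reward_model")
    if not isinstance(reward_model, dict) or "ground_truth" not in reward_model:
        problems.append("'reward_model.ground_truth' is missing")

    return problems


example = json.dumps({
    "prompt": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Get the weather"},
    ],
    "reward_model": {"ground_truth": "Could you please specify the location?"},
})
print(validate_record(example))
```

Running a check like this over every line of the training file catches malformed records locally, before a training job is started.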

Fine-Tuning Qwen 2.5 7B Instruct with SageMaker AI

Fine-tuning a model such as Qwen 2.5 7B Instruct in Amazon SageMaker AI Studio is a short, guided process. Once the prerequisites are in place (an AWS account, an IAM role, a SageMaker AI domain, and an S3 bucket), open the Models section in SageMaker AI Studio.

From there, select Qwen 2.5 7B Instruct and choose **Customize with UI** to open a dedicated configuration page. This page lets you set the following:

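Technique selection: Choose Reinforcement Learning with Verifiable Rewards (RLVR) from the dropdown.
Data input: Point to the prepared training data stored in your Amazon S3 bucket.
Reward function: Configure the tiered scoring mechanism that defines how candidate responses are evaluated against the ground_truth.
Hyperparameter configuration: Adjust parameters such as batch size, although SageMaker AI often handles optimal settings automatically.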
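
To make the idea of a tiered reward concrete, here is a minimal sketch of one possible scoring function. It is not the reward function used in the article: the tool-call format (a JSON object with `name` and `arguments` keys), the tier values, and the names `parse_tool_call` and `tiered_reward` are all assumptions made for illustration.

```python
import json


def parse_tool_call(text: str):
    """Return (name, arguments) if text is a JSON tool call, else None.

    Assumed format: {"name": "<tool>", "arguments": {...}}.
    """
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(obj, dict) and isinstance(obj.get("name"), str):
        args = obj.get("arguments", {})
        return obj["name"], args if isinstance(args, dict) else {}
    return None


def tiered_reward(candidate: str, ground_truth: str) -> float:
    """Score a candidate against the ground truth in coarse tiers."""
    expected = parse_tool_call(ground_truth)
    actual = parse_tool_call(candidate)

    if expected is None:
        # Ground truth is plain text (clarify or refuse): the candidate
        # earns credit only if it also avoids calling a tool.
        return 1.0 if actual is None else 0.0
    if actual is None:
        return 0.0  # a tool call was expected but none was produced
    if actual[0] != expected[0]:
        return 0.0  # wrong or hallucinated tool
    if actual[1] == expected[1]:
        return 1.0  # right tool, right arguments
    return 0.5      # right tool, wrong or incomplete arguments


truth = '{"name": "get_weather", "arguments": {"city": "Seattle"}}'
print(tiered_reward('{"name": "get_weather", "arguments": {"city": "Boston"}}', truth))
```

Partial credit for the right tool with wrong arguments gives the policy a gradient toward correct calls. A production reward would likely also score the text of clarifications and refusals, which this sketch treats as all-or-nothing.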

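SageMaker AI supports a range of model families, including Amazon Nova, GPT-OSS, Llama, Qwen, and DeepSeek, and a range of techniques, including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), RLVR, and Reinforcement Learning from AI Feedback (RLAIF). Integrated MLflow tracking gives visibility into training and validation metrics, which simplifies performance monitoring and iteration. This ease of use shortens the development lifecycle for developers building sophisticated agentic workflows.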

Evaluation and Deployment Success

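The fine-tuned Qwen 2.5 7B Instruct model was evaluated on held-out data that included entirely unseen tool scenarios, which makes the evaluation a direct test of generalization. The fine-tuned model achieved a 57% improvement in tool-calling reward over the base model. A gain of that size on scenarios never encountered during training highlights how well RLVR teaches a model robust decision-making for tool interactions.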

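This improved reliability translates directly into greater trust and confidence when deploying AI agents to production. By minimizing tool hallucinations, incorrect parameters, and inappropriate actions, enterprises can use AI agents for more critical and sensitive tasks. Because SageMaker AI handles the complexity of model deployment and infrastructure management, developers can move from fine-tuning to production smoothly and realize the full potential of their agentic AI solutions. This capability aligns with the broader vision of operationalizing agentic AI for real-world impact.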

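In summary, the combination of serverless model customization in Amazon SageMaker AI and RLVR offers a practical path to building highly reliable agentic tool-calling systems. The approach accelerates development, reduces operational burden, and ultimately delivers more accurate and reliable AI agents.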

Frequently Asked Questions

What is agentic tool calling and why is it crucial for AI agents?

Agentic tool calling is the mechanism that empowers AI agents to perform real-world actions like querying databases, initiating workflows, fetching real-time information, and executing tasks on a user's behalf. It's crucial because it bridges the gap between language understanding and practical application, allowing AI agents to move beyond just generating text to actually interacting with external systems and data sources, thereby making them genuinely useful in production environments.

What are the common challenges AI agents face when performing tool calls?

AI agents frequently encounter challenges such as hallucinating tools that don't exist, passing incorrect parameters to valid tools, or attempting actions when they should instead seek clarification from the user. These failures lead to unreliable agent behavior, eroding user trust and posing significant hurdles to the successful deployment of AI agents in critical production systems, ultimately limiting their real-world utility.

How does Amazon SageMaker AI address the challenges of agentic tool calling?

Amazon SageMaker AI addresses these challenges through its serverless model customization capabilities, particularly using Reinforcement Learning with Verifiable Rewards (RLVR). This approach allows developers to fine-tune large language models (LLMs) to improve their tool-calling accuracy without managing complex infrastructure. SageMaker AI handles the operational overhead of GPU provisioning, memory management, and reward infrastructure, letting users focus on data, reward functions, and model behavior.

What is Reinforcement Learning with Verifiable Rewards (RLVR) and how does it work?

RLVR is a powerful fine-tuning technique where the model generates multiple candidate responses for a given prompt. A predefined reward function then evaluates these candidates, providing a signal about their quality and correctness. The model subsequently updates its internal policy to favor responses that received higher reward scores, using methods like Group Relative Policy Optimization (GRPO), thereby iteratively learning to produce more accurate and desired outputs for specific tasks like tool calling.
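
The group-relative step in GRPO can be sketched in a few lines: each candidate's reward is compared with the mean of its own group, so the policy is pushed toward responses that beat their siblings. This is a minimal illustration, assuming population standard deviation and a small epsilon for numerical stability (implementations differ on both). It shows only the advantage computation, not the full policy update.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of candidates for the same prompt.

    advantage_i = (reward_i - mean(rewards)) / (std(rewards) + eps)
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four candidate responses to one prompt, scored by a reward function.
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
print([round(a, 3) for a in advantages])
```

Candidates above the group mean get a positive advantage, those below get a negative one, and a group in which every candidate scores the same yields no learning signal.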

Why is RLVR considered more effective than Supervised Fine-Tuning (SFT) for tool calling tasks?

While SFT requires meticulously labeled examples for every desired behavior (e.g., calling a tool, clarifying, refusing), RLVR operates differently. SFT can struggle to generalize decision-making between these behaviors. RLVR, by contrast, allows the model to learn the optimal decision boundary by generating multiple candidates and receiving immediate feedback via a reward function, enabling it to better understand *when* to execute a tool call versus *when* to ask for more information or refuse a request.

How is training data prepared for RLVR in Amazon SageMaker AI for agentic tool calling?

Training data for RLVR in SageMaker AI is prepared as JSONL files, where each entry contains a prompt (system and user messages) and a `ground_truth` within a `reward_model` field. This `ground_truth` is what the reward function scores against. To ensure robust agent behavior, datasets are typically designed to cover three distinct scenarios: executing a tool call when all parameters are present, clarifying when information is missing, and refusing requests that are out of scope or harmful. Synthetic data generation tools like Kiro can be used for this purpose.
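
As a rough sketch of that layout, the snippet below builds one record per scenario and serializes them as JSONL. The system prompt is copied from the article's example, where it is truncated. The tool-call string, the refusal text, and the helper names `make_record` and `to_jsonl` are hypothetical placeholders.

```python
import json

# System prompt as shown in the article's example (truncated in the source).
SYSTEM = "You are a helpful assistant. When using tools, respond with: [...]"


def make_record(user: str, ground_truth: str, system: str = SYSTEM) -> dict:
    """Build one training record in the prompt / reward_model layout."""
    return {
        "prompt": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "reward_model": {"ground_truth": ground_truth},
    }


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


records = [
    # Execute: all parameters present (tool-call format is an assumption).
    make_record(
        "Get the weather in Seattle",
        json.dumps({"name": "get_weather", "arguments": {"location": "Seattle"}}),
    ),
    # Clarify: a required parameter is missing (text from the article's example).
    make_record(
        "Get the weather",
        "To provide you with the weather information, could you please specify the location?",
    ),
    # Refuse: out-of-scope request (hypothetical wording).
    make_record(
        "Delete every file on the server",
        "I can't help with that request.",
    ),
]

print(to_jsonl(records), end="")
```

The resulting text can be written to a `.jsonl` file and uploaded to the S3 bucket that the training job reads from.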

What agent behaviors are critical for building robust and reliable tool-calling AI agents?

Building robust tool-calling AI agents requires them to master three critical behaviors. First, they must `Execute` a tool call accurately when all necessary information is provided by the user. Second, they need to `Clarify` by asking follow-up questions when essential parameters are missing from a user's request. Third, they must `Refuse` gracefully when a request is out of scope, harmful, or cannot be fulfilled. Training models across these behaviors ensures comprehensive and trustworthy agent performance.

What prerequisites are needed to use serverless model customization in SageMaker AI?

To leverage serverless model customization in Amazon SageMaker AI, users must have an active AWS account, an AWS IAM role configured with the necessary permissions for SageMaker, a SageMaker AI domain providing Studio access for development, and an Amazon Simple Storage Service (Amazon S3) bucket to store training data and model outputs securely. These components ensure a secure and functional environment for fine-tuning models.
