Bedrock AgentCore: Embedding a Real-Time AI Browser Agent in React

If you use Amazon Bedrock for the AI model, install the AWS SDK for JavaScript:
```
npm install @aws-sdk/client-bedrock-runtime
```

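The codebase for implementing Live View is typically split in two. Server-side code (session management and AI agent logic) runs in Node.js, while client-side code (Live View rendering) runs in a React application, often bundled with a tool such as Vite.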

Step-by-Step Integration: From Session to Stream

Integrating a live AI browser agent with Amazon Bedrock AgentCore is a three-step process that connects your server-side logic, your client-side React application, and the AWS Cloud.

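1. Start a Browser Session and Generate the Live View URL

The first step happens on your application server. Here, your backend logic starts a browser session in Amazon Bedrock AgentCore and securely obtains the URL needed to stream the Live View.

Use the Browser class from the bedrock-agentcore SDK. This class handles the complexity of creating and managing an isolated browser environment in the cloud. The key output of this step is a SigV4-presigned URL that grants secure, temporary access to the live video stream of the browser session.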

// Example server-side code (Node.js)
import { Browser } from 'bedrock-agentcore';
import { AgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';

// Initialize Bedrock AgentCore client (ensure proper AWS credentials are configured)
const agentCoreClient = new AgentCoreClient({ region: 'us-east-1' }); // Use your desired region

async function startLiveSession() {
    // Create a new browser session
    const browser = new Browser(agentCoreClient);
    await browser.create();

    // Generate the Live View URL
    const liveViewUrl = await browser.getLiveViewURL();
    console.log('Live View URL:', liveViewUrl);

    // Store browser.sessionId to later connect your AI agent or terminate the session
    const sessionId = browser.sessionId;
    
    return { liveViewUrl, sessionId };
}

// This `liveViewUrl` will be sent to your React client.

This liveViewUrl is sent to the React frontend, which uses it to establish the live stream.
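Because the URL is presigned and temporary, the client may want to know when to ask the server for a fresh one. The sketch below assumes the URL carries the standard SigV4 query parameters `X-Amz-Date` and `X-Amz-Expires`; that is worth verifying against the URL your session actually returns. `presignedUrlExpiry` is a hypothetical helper, not part of the bedrock-agentcore SDK.

```typescript
// Hypothetical helper (not part of any SDK): derive the expiry time of a
// SigV4-presigned URL from its query string.
// Assumption: the URL includes X-Amz-Date (YYYYMMDDTHHMMSSZ) and
// X-Amz-Expires (lifetime in seconds), as standard SigV4 query signing does.
function presignedUrlExpiry(presignedUrl: string): Date | null {
  let params: URLSearchParams;
  try {
    params = new URL(presignedUrl).searchParams;
  } catch {
    return null; // not a parseable URL
  }

  const signedAt = params.get("X-Amz-Date") ?? "";
  const lifetimeSeconds = Number(params.get("X-Amz-Expires"));
  const match = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(signedAt);

  if (match === null || !Number.isFinite(lifetimeSeconds) || lifetimeSeconds <= 0) {
    return null; // parameters missing or malformed
  }

  // Date.UTC expects a zero-based month.
  const signedAtMs = Date.UTC(
    Number(match[1]),
    Number(match[2]) - 1,
    Number(match[3]),
    Number(match[4]),
    Number(match[5]),
    Number(match[6]),
  );
  return new Date(signedAtMs + lifetimeSeconds * 1000);
}
```

A client could compare the returned date with the current time and request a new URL from the backend shortly before it passes. A `null` result only means the expiry could not be read from the URL, not that the URL is invalid.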

2. Render the Live View in the React Application

Once the React application receives the liveViewUrl from the server, rendering the real-time stream is straightforward thanks to the BrowserLiveView component.

// Example client-side code (React component)
import React, { useEffect, useState } from 'react';
import { BrowserLiveView } from 'bedrock-agentcore';

interface LiveAgentViewerProps {
    liveViewUrl: string | null; // null until the backend has returned a URL
}

const LiveAgentViewer: React.FC<LiveAgentViewerProps> = ({ liveViewUrl }) => {
    if (!liveViewUrl) {
        return <p>Waiting for Live View URL...</p>;
    }

    return (
        <div style={{ width: '100%', height: '600px', border: '1px solid #ccc' }}>
            <BrowserLiveView url={liveViewUrl} />
        </div>
    );
};

// In your main App component or page:
// const MyPage = () => {
//     const [currentLiveViewUrl, setCurrentLiveViewUrl] = useState<string | null>(null);
//
//     useEffect(() => {
//         // Fetch the liveViewUrl from your backend
//         fetch('/api/start-agent-session')
//             .then(res => res.json())
//             .then(data => setCurrentLiveViewUrl(data.liveViewUrl));
//     }, []);
//
//     return (
//         <div>
//             <h1>AI Agent Live View</h1>
//             <LiveAgentViewer liveViewUrl={currentLiveViewUrl} />
//         </div>
//     );
// };

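With nothing more than url={liveViewUrl}, the BrowserLiveView component handles the complex details: establishing the WebSocket connection, consuming the DCV stream, and rendering the live video feed within the given dimensions. This minimal JSX integration greatly simplifies frontend development, so you can focus on the user experience around the live agent.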
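The commented-out page component above only handles the success path of the fetch. As a rough sketch of one way to track loading, success, and failure explicitly, the reducer below models the three states as a discriminated union. The state and event names are illustrative assumptions, not part of the SDK; the function has the reducer shape that React's `useReducer` accepts.

```typescript
// Illustrative state model for fetching the Live View URL.
// All names here are hypothetical and chosen for this sketch.
type LiveViewState =
  | { status: "loading" }
  | { status: "ready"; liveViewUrl: string }
  | { status: "error"; message: string };

type LiveViewEvent =
  | { type: "fetch_started" }
  | { type: "fetch_succeeded"; liveViewUrl: string }
  | { type: "fetch_failed"; message: string };

const initialLiveViewState: LiveViewState = { status: "loading" };

function liveViewReducer(_state: LiveViewState, event: LiveViewEvent): LiveViewState {
  switch (event.type) {
    case "fetch_started":
      return { status: "loading" };
    case "fetch_succeeded":
      // Treat an empty URL as a failure so the viewer never renders without one.
      return event.liveViewUrl
        ? { status: "ready", liveViewUrl: event.liveViewUrl }
        : { status: "error", message: "Server returned no liveViewUrl" };
    case "fetch_failed":
      return { status: "error", message: event.message };
  }
}
```

In a component, the fetch's `.then` would dispatch `fetch_succeeded` and its `.catch` would dispatch `fetch_failed`, and the viewer would render only when `status` is `"ready"`.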

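3. Connect the AI Agent to Drive the Browser

The final step connects the AI agent's intelligence to actual browser actions inside the isolated session. While BrowserLiveView provides the visual feedback, the AI agent interacts with the browser programmatically using Playwright over CDP (Chrome DevTools Protocol).

The application server hosting the AI agent executes browser actions through the page property of the Browser object (a Playwright Page object).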

// Example server-side code (continued from step 1)
// Assuming you have a Playwright-like interface or direct Playwright usage
import { Browser } from 'bedrock-agentcore';
import { AgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

// ... (previous setup for browser creation) ...

async function driveAgent(sessionId: string) {
    const browser = new Browser(agentCoreClient, { sessionId }); // Reconnect to existing session
    await browser.connect(); // Connect to the browser session

    const page = browser.page; // Get the Playwright Page object

    // Example AI agent logic (simplified for illustration)
    // Here you would integrate with your LLM (e.g., Anthropic Claude on Amazon Bedrock)
    // to determine actions based on user prompts and page content. InvokeModel is used
    // below; the Bedrock Converse API is an alternative.
    console.log("Agent navigating to example.com...");
    await page.goto('https://www.example.com');
    console.log("Agent waiting for 3 seconds...");
    await page.waitForTimeout(3000); // Simulate processing time

    console.log("Agent typing into a search box (hypothetical)...");
    // Example: await page.type('#search-input', 'Amazon Bedrock AgentCore');
    // Example: await page.click('#search-button');

    const content = await page.content();
    // Use an LLM to analyze 'content' and decide next steps
    const bedrockRuntimeClient = new BedrockRuntimeClient({ region: 'us-east-1' });
    const response = await bedrockRuntimeClient.send(new InvokeModelCommand({
        modelId: "anthropic.claude-3-sonnet-20240229-v1:0", // or your preferred model
        contentType: "application/json",
        accept: "application/json",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31", // required for Anthropic Claude models on Bedrock
            messages: [
                {
                    role: "user",
                    content: `Analyze this webpage content and suggest the next action: ${content.substring(0, 500)}`
                }
            ],
            max_tokens: 200,
        }),
    }));
    const decodedBody = new TextDecoder("utf-8").decode(response.body);
    const parsedBody = JSON.parse(decodedBody);
    console.log("AI Model suggested action:", parsedBody.content[0].text);

    // Based on LLM's suggestion, execute further page actions...

    // Don't forget to close the browser session when done
    // await browser.close();
}

// After starting the session and getting the URL, you would then call driveAgent(sessionId)
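The example above sends the first 500 characters of raw HTML to the model, and on many pages that slice is mostly markup from the document head. One possible refinement, sketched here under stated assumptions, is to reduce the HTML to visible text before truncating. `compactHtml` is a hypothetical helper; regex-based tag stripping is deliberately crude and would not replace a real HTML parser.

```typescript
// Hypothetical helper (not part of any SDK): reduce raw HTML to a short
// plain-text excerpt before placing it in an LLM prompt.
// Assumption: approximate text extraction is good enough for prompting.
function compactHtml(html: string, maxChars: number): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, " ")   // drop inline styles
    .replace(/<[^>]+>/g, " ")                    // drop remaining tags
    .replace(/\s+/g, " ")                        // collapse whitespace
    .trim()
    .slice(0, maxChars);
}
```

With this helper, the prompt could use `compactHtml(content, 500)` in place of `content.substring(0, 500)`.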

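This interaction loop, in which the AI agent analyzes page content, decides on the next action, and executes it through Playwright over CDP, forms the core of an autonomous browsing agent. Every one of these actions is rendered visually in real time through the BrowserLiveView component on the user's screen.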
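The loop described here can be sketched as follows. This is a minimal illustration, not the article's implementation: it assumes the model is prompted to answer with a small JSON object, and the action format, `PageLike`, `parseAgentAction`, and `runAgentLoop` are all hypothetical names. `PageLike` lists only the Playwright `Page` methods the loop uses (`content`, `goto`, `click`, `fill`).

```typescript
// Hypothetical action format the model is asked to produce as JSON.
type AgentAction =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "done" };

// Minimal subset of Playwright's Page interface used by the loop.
interface PageLike {
  content(): Promise<string>;
  goto(url: string): Promise<unknown>;
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
}

// Validate untrusted model output; return null for anything unexpected.
function parseAgentAction(raw: string): AgentAction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  const url = obj["url"];
  const selector = obj["selector"];
  const value = obj["value"];
  switch (obj["kind"]) {
    case "goto":
      return typeof url === "string" ? { kind: "goto", url } : null;
    case "click":
      return typeof selector === "string" ? { kind: "click", selector } : null;
    case "fill":
      return typeof selector === "string" && typeof value === "string"
        ? { kind: "fill", selector, value }
        : null;
    case "done":
      return { kind: "done" };
    default:
      return null;
  }
}

// Observe, decide, act. Stops on "done", on invalid output, or after maxSteps.
// Returns the number of actions executed.
async function runAgentLoop(
  page: PageLike,
  decide: (pageHtml: string) => Promise<string>, // e.g. wraps the LLM call
  maxSteps = 10,
): Promise<number> {
  for (let step = 0; step < maxSteps; step++) {
    const action = parseAgentAction(await decide(await page.content()));
    if (action === null || action.kind === "done") return step;
    switch (action.kind) {
      case "goto":
        await page.goto(action.url);
        break;
      case "click":
        await page.click(action.selector);
        break;
      case "fill":
        await page.fill(action.selector, action.value);
        break;
    }
  }
  return maxSteps;
}
```

Validating the model's reply before acting on it, and capping the number of steps, keeps a malformed or runaway response from driving the browser indefinitely.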

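Unlocking New Possibilities with Embedded AI Agents

The BrowserLiveView integration in Amazon Bedrock AgentCore is more than a technical feature. It changes how users interact with and trust AI agents. By embedding real-time visual feedback, developers can build AI-powered applications that are not only efficient but also transparent, auditable, and user-friendly.

These capabilities are especially valuable for applications that involve: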

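Complex workflows: Automating multi-step online processes such as data entry, onboarding, or compliance, where visibility into each step matters most.
Customer support: Letting support agents watch an AI copilot resolve customer queries or navigate systems, which helps ensure accuracy and gives them the chance to intervene.
Training and debugging: Giving developers and end users a tool to understand agent behavior, debug issues, and train agents through direct observation.
Enhanced audit trails: Producing a visual record of agent actions that, combined with session recordings in Amazon S3, enables comprehensive post-hoc review and compliance.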

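Streaming the browser session directly from the AWS Cloud to the client browser bypasses the application server for the video stream, which benefits both performance and scalability. This architecture minimizes latency and reduces load on your backend infrastructure, so you can deploy responsive, scalable AI agent solutions.

By adopting BrowserLiveView, you are not just building an AI agent; you are building trust, control, and a rich user experience. Explore the possibilities and give your users the confidence they need to delegate complex web tasks to intelligent agents.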

Original source

https://aws.amazon.com/blogs/machine-learning/embed-a-live-ai-browser-agent-in-your-react-app-with-amazon-bedrock-agentcore/

Frequently Asked Questions

What is the Amazon Bedrock AgentCore BrowserLiveView component and how does it function?

The BrowserLiveView component is part of the Amazon Bedrock AgentCore TypeScript SDK. It embeds a real-time video feed of an AI agent's browsing session directly into a React application. The component receives a SigV4-presigned URL from your application server and uses it to open a persistent WebSocket connection, over which video data is streamed via the Amazon DCV protocol from an isolated cloud browser session. Because the stream goes directly to the client, latency stays low and users can observe every action the agent takes on a webpage, from navigation to form submissions, without the video passing through your server.

How does embedding Live View enhance user trust and confidence in AI agents?

Embedding Live View builds user trust by making an AI agent's operations visible. Instead of a "black box" experience, users get immediate visual confirmation of the agent's actions and can follow its progress in real time. This feedback shows whether the agent is on the correct path, interacting with the right elements, and progressing as expected. It is particularly valuable for complex or sensitive workflows, where visual evidence reassures users that the agent is performing its tasks accurately and gives them the chance to intervene in time if it is not.

What are the primary architectural components involved in integrating a Live View AI agent?

The integration of a Live View AI agent involves three main architectural components. First, the user's web browser, running a React application, hosts the BrowserLiveView component, which renders the real-time stream. Second, the application server acts as the orchestrator, managing the AI agent's logic, initiating browser sessions via the Amazon Bedrock AgentCore API, and generating secure, time-limited SigV4-presigned URLs for the Live View stream. Third, the AWS Cloud hosts Amazon Bedrock AgentCore and Bedrock services, providing the isolated cloud browser sessions, automation capabilities (via Playwright CDP), and the DCV-powered Live View streaming endpoint. A key design point is that the DCV stream flows directly from AWS to the user's browser, bypassing the application server for optimal performance.

Can developers utilize any AI model or agent framework with Amazon Bedrock AgentCore's Live View?

Yes. Developers can use any AI model or agent framework with Amazon Bedrock AgentCore's Live View. Although the original example integrates with the Amazon Bedrock Converse API and models such as Anthropic Claude, the BrowserLiveView component itself is model-agnostic: the real-time visual stream is decoupled from the agent's reasoning and decision-making logic. As long as your chosen agent or framework can interact with the browser automation endpoint provided by AgentCore (typically via Playwright over CDP), you can use Live View to give your users visual feedback.

What are the essential prerequisites for setting up a Live View AI browser agent with Amazon Bedrock AgentCore?

To set up a Live View AI browser agent, you need Node.js version 20 or later for the server-side logic and React for the client-side application. You also need an AWS account in a supported region and AWS credentials with the necessary Amazon Bedrock AgentCore Browser permissions. Follow the principle of least privilege for IAM permissions, and use temporary credentials (for example, from AWS IAM Identity Center or STS) rather than long-lived access keys. Finally, install the Amazon Bedrock AgentCore TypeScript SDK (`bedrock-agentcore`) in your project, along with the AWS SDK for JavaScript (`@aws-sdk/client-bedrock-runtime`) if you use Bedrock models.
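As a small illustration of checking the Node.js prerequisite at startup, the sketch below parses a version string and compares its major version against the minimum stated above. `isSupportedNodeVersion` is a hypothetical helper written for this example.

```typescript
// Hypothetical helper: check a Node.js version string against a minimum
// major version (20 by default, per the prerequisite above).
function isSupportedNodeVersion(version: string, minimumMajor = 20): boolean {
  const majorText = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  return Number.isInteger(major) && major >= minimumMajor;
}
```

On the server, the function could be called with `process.versions.node` before any session is started, failing fast with a clear message on older runtimes.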

How does the DCV protocol facilitate real-time, low-latency video streaming for Live View?

The Amazon DCV protocol (formerly NICE DCV) provides the real-time, low-latency video streaming behind the BrowserLiveView component. DCV is a high-performance remote display protocol designed to work well over varying network conditions. In AgentCore, it encodes the visual output of the isolated cloud browser session and transmits it directly to the user's React application over a WebSocket connection. By optimizing compression and transmission, DCV keeps the visual feed of the agent's actions smooth and responsive, so users can observe the agent almost as if it were running locally, and developers do not have to set up their own streaming infrastructure.