Bedrock AgentCore: Embed a Live AI Browser Agent in React

If you use Amazon Bedrock for your AI models, install the Bedrock runtime client from the AWS SDK for JavaScript:
```
npm install @aws-sdk/client-bedrock-runtime
```

A codebase that implements Live View is usually split in two: server-side code (session management and AI agent logic) running on Node.js, and client-side code (rendering the Live View) running inside a React application, typically bundled with a tool such as Vite.

Step-by-step integration: from session to stream

Integrating a live AI browser agent with Amazon Bedrock AgentCore takes three distinct steps that connect your server-side logic, your client-side React application, and the AWS Cloud.

1. Starting a browser session and generating the Live View URL

The first step happens on your application server. Here your backend logic starts a browser session inside Amazon Bedrock AgentCore and obtains the URL needed to stream the live view securely.

You use the Browser class from the bedrock-agentcore SDK, which handles creating and managing an isolated browser environment in the cloud. The key output of this step is a SigV4-presigned URL that grants secure, temporary access to the live video stream of the browser session.

// Server-side example (Node.js)
import { Browser } from 'bedrock-agentcore';
import { BedrockAgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';

// Initialize the Bedrock AgentCore client (make sure your AWS credentials are configured correctly)
const agentCoreClient = new BedrockAgentCoreClient({ region: 'us-east-1' }); // use your preferred Region

async function startLiveSession() {
    // Create a new browser session
    const browser = new Browser(agentCoreClient);
    await browser.create();

    // Generate the Live View URL
    const liveViewUrl = await browser.getLiveViewURL();
    console.log('Live View URL:', liveViewUrl);

    // Keep browser.sessionId so you can attach your AI agent later or terminate the session
    const sessionId = browser.sessionId;

    return { liveViewUrl, sessionId };
}

This URL is sent to the React frontend, which uses it to establish the live stream.
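
Because the presigned URL is temporary, a client may want to check that it is still valid before opening the stream, and request a fresh one if not. The following is a minimal sketch, assuming the URL carries the standard SigV4 query parameters `X-Amz-Date` and `X-Amz-Expires`. The helper names `parseAmzDate` and `isPresignedUrlExpired` are hypothetical and not part of any SDK.

```typescript
// Parse a SigV4 timestamp such as "20250101T120000Z" into epoch milliseconds.
function parseAmzDate(value: string): number | null {
    const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(value);
    if (!m) return null;
    const [, y, mo, d, h, mi, s] = m;
    return Date.UTC(Number(y), Number(mo) - 1, Number(d), Number(h), Number(mi), Number(s));
}

// Returns true when the URL is past its signed lifetime, or when the
// expected SigV4 parameters are missing (such a URL is treated as unusable).
function isPresignedUrlExpired(url: string, nowMs: number = Date.now()): boolean {
    const params = new URL(url).searchParams;
    const signedAt = parseAmzDate(params.get('X-Amz-Date') ?? '');
    const lifetimeSec = Number(params.get('X-Amz-Expires'));
    if (signedAt === null || !Number.isFinite(lifetimeSec) || lifetimeSec <= 0) {
        return true;
    }
    return nowMs >= signedAt + lifetimeSec * 1000;
}
```

A frontend could call `isPresignedUrlExpired(liveViewUrl)` before rendering and ask the backend for a new URL when it returns true. Treating a URL without the expected parameters as expired is a conservative design choice for this sketch.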

2. Rendering the Live View in your React application

Once your React application receives the liveViewUrl from your server, rendering the real-time stream takes a single BrowserLiveView component.

// Client-side example (React component)
import React, { useEffect, useState } from 'react';
import { BrowserLiveView } from 'bedrock-agentcore';

interface LiveAgentViewerProps {
    // null until the backend has returned a URL
    liveViewUrl: string | null;
}

const LiveAgentViewer: React.FC<LiveAgentViewerProps> = ({ liveViewUrl }) => {
    if (!liveViewUrl) {
        return <p>Waiting for Live View URL...</p>;
    }

    return (
        <div style={{ width: '100%', height: '600px', border: '1px solid #ccc' }}>
            <BrowserLiveView url={liveViewUrl} />
        </div>
    );
};

// In your main App component or page:
const MyPage = () => {
    const [currentLiveViewUrl, setCurrentLiveViewUrl] = useState<string | null>(null);

    useEffect(() => {
        // Fetch the liveViewUrl from your backend
        fetch('/api/start-agent-session')
            .then(res => res.json())
            .then(data => setCurrentLiveViewUrl(data.liveViewUrl))
            .catch(err => console.error('Failed to start agent session:', err));
    }, []);

    return (
        <div>
            <h1>AI Agent Live View</h1>
            <LiveAgentViewer liveViewUrl={currentLiveViewUrl} />
        </div>
    );
};

export default MyPage;

Given only url={liveViewUrl}, the BrowserLiveView component handles opening the WebSocket connection, consuming the DCV stream, and rendering the live video feed within the dimensions you specify. This small amount of JSX keeps the frontend work minimal, so you can focus on the user experience around the live agent.
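
The page component above trusts whatever the backend returns. One way to make the loading, ready, and error cases explicit is to validate the JSON payload and map it to a small state type before rendering. This is a sketch only: `ViewState` and `toViewState` are hypothetical names, and the payload shape (`liveViewUrl`, `sessionId`) is assumed to match what the server function in step 1 returns.

```typescript
// Discriminated union describing what the viewer should render.
type ViewState =
    | { status: 'loading' }
    | { status: 'ready'; liveViewUrl: string; sessionId: string }
    | { status: 'error'; message: string };

// Validate an untyped JSON payload from the backend and map it to a ViewState.
function toViewState(payload: unknown): ViewState {
    if (typeof payload !== 'object' || payload === null) {
        return { status: 'error', message: 'Response is not a JSON object' };
    }
    const { liveViewUrl, sessionId } = payload as Record<string, unknown>;
    if (typeof liveViewUrl !== 'string' || liveViewUrl.length === 0) {
        return { status: 'error', message: 'Missing liveViewUrl' };
    }
    if (typeof sessionId !== 'string' || sessionId.length === 0) {
        return { status: 'error', message: 'Missing sessionId' };
    }
    return { status: 'ready', liveViewUrl, sessionId };
}
```

In a component, the state could start as `{ status: 'loading' }`, be replaced with `toViewState(await res.json())` once the fetch resolves, and render BrowserLiveView only when `status` is `'ready'`.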

3. Connecting the AI agent to drive the browser

The final step links your AI agent's reasoning to real browser actions inside the isolated session. While BrowserLiveView provides the visual feedback, your AI agent uses Playwright over CDP (Chrome DevTools Protocol) to automate the browser.

Your application server, which also hosts the AI agent, uses the page property of the Browser object (a Playwright Page object) to perform browser actions.

// Server-side example (continued from step 1)
// Uses the Playwright Page object exposed by the Browser session
import { Browser } from 'bedrock-agentcore';
import { BedrockAgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';
import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';

// ... (initial setup for creating the browser) ...

// Create the runtime client once and reuse it across calls
const bedrockRuntimeClient = new BedrockRuntimeClient({ region: 'us-east-1' });

async function driveAgent(sessionId: string) {
    const browser = new Browser(agentCoreClient, { sessionId }); // reattach to the existing session
    await browser.connect(); // connect to the browser session

    try {
        const page = browser.page; // Playwright Page object

        // Example AI agent logic (simplified for illustration).
        // This is where you integrate your LLM (for example Anthropic Claude on Amazon Bedrock)
        // to choose actions based on the user's prompt and the page content.
        console.log('Agent is navigating to example.com...');
        await page.goto('https://www.example.com');
        console.log('Agent is waiting 3 seconds...');
        await page.waitForTimeout(3000); // simulate processing time

        console.log('Agent is typing into a search box (hypothetical)...');
        // Example: await page.fill('#search-input', 'Amazon Bedrock AgentCore');
        // Example: await page.click('#search-button');

        const content = await page.content();
        // Ask the LLM to analyze 'content' and decide the next step
        const response = await bedrockRuntimeClient.send(new InvokeModelCommand({
            modelId: 'anthropic.claude-3-sonnet-20240229-v1:0', // or the model you prefer
            contentType: 'application/json',
            accept: 'application/json',
            body: JSON.stringify({
                anthropic_version: 'bedrock-2023-05-31', // required for Claude Messages requests on Bedrock
                max_tokens: 200,
                messages: [
                    {
                        role: 'user',
                        content: `Analyze this web page content and suggest the next action: ${content.substring(0, 500)}`
                    }
                ],
            }),
        }));
        const decodedBody = new TextDecoder('utf-8').decode(response.body);
        const parsedBody = JSON.parse(decodedBody);
        console.log('Model suggested action:', parsedBody.content[0].text);

        // Based on the LLM's suggestion, perform further actions on the page...
    } finally {
        // Always close the browser session when the agent is done
        await browser.close();
    }
}

// After starting the session and obtaining the URL, call driveAgent(sessionId)

This interaction loop, in which your AI agent analyzes the page content, decides the next action, and executes it through Playwright over CDP, is the core of an autonomous web-browsing agent. Every action is rendered in real time on the user's screen through the BrowserLiveView component.
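
The loop itself can be sketched independently of any SDK. In the example below, `AgentAction`, `PageLike`, `Decide`, and `runAgentLoop` are hypothetical names introduced for illustration. `PageLike` covers only the few Playwright Page methods the loop needs, and `decide` stands in for the LLM call. A step limit is included on the assumption that you want to stop a model that never reports completion.

```typescript
// Actions the model is allowed to request (a deliberately small, hypothetical set).
type AgentAction =
    | { kind: 'goto'; url: string }
    | { kind: 'click'; selector: string }
    | { kind: 'fill'; selector: string; text: string }
    | { kind: 'done' };

// The subset of Playwright's Page API this loop relies on.
interface PageLike {
    goto(url: string): Promise<unknown>;
    click(selector: string): Promise<void>;
    fill(selector: string, value: string): Promise<void>;
    content(): Promise<string>;
}

// Stand-in for the LLM call: given the page HTML, return the next action.
type Decide = (html: string) => Promise<AgentAction>;

// Observe, decide, act. Bounded by maxSteps so the loop always terminates.
async function runAgentLoop(
    page: PageLike,
    decide: Decide,
    maxSteps = 10,
): Promise<AgentAction[]> {
    const history: AgentAction[] = [];
    for (let step = 0; step < maxSteps; step++) {
        const action = await decide(await page.content());
        history.push(action);
        if (action.kind === 'done') break;
        if (action.kind === 'goto') await page.goto(action.url);
        else if (action.kind === 'click') await page.click(action.selector);
        else await page.fill(action.selector, action.text);
    }
    return history;
}
```

Restricting the model to a closed set of typed actions keeps the agent's behavior auditable: the returned history is a plain record of what was requested, which can be logged alongside what users see in the Live View.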

Unlocking new possibilities with embedded AI agents

Integrating BrowserLiveView from Amazon Bedrock AgentCore is more than a technical feature; it changes how users interact with and trust AI agents. By embedding real-time visual feedback, developers can build AI-powered applications that are not only capable but also transparent, auditable, and easy to use.

This capability is especially valuable for applications involving:

Complex workflows: automating multi-step online processes such as data entry, onboarding, or regulatory compliance, where visibility into every step matters most.
Customer support: letting human support staff watch AI co-pilots resolve customer queries or navigate systems, which helps ensure accuracy and leaves room to intervene.
Training and debugging: giving developers and end users a direct way to understand agent behavior, troubleshoot problems, and train agents through observation.
Improved audit trails: producing a visual record of agent actions, which can be combined with session recording to Amazon S3 for thorough retrospective review and compliance.

Streaming the browser session directly from the AWS Cloud to the client browser, bypassing the application server for the video stream, has clear performance and scalability benefits. This architecture reduces latency and takes load off your backend infrastructure, so you can deploy responsive, scalable AI agent solutions.

By adopting BrowserLiveView, you are not just building an AI agent; you are building trust, control, and a richer user experience. Explore what it makes possible and give your users the confidence to delegate complex web tasks to intelligent agents.

Source

https://aws.amazon.com/blogs/machine-learning/embed-a-live-ai-browser-agent-in-your-react-app-with-amazon-bedrock-agentcore/

Frequently Asked Questions

What is the Amazon Bedrock AgentCore BrowserLiveView component and how does it function?

The BrowserLiveView component is part of the Amazon Bedrock AgentCore TypeScript SDK and embeds a real-time video feed of an AI agent's browsing session directly in a React application. The component receives a SigV4-presigned URL from your application server and uses it to open a persistent WebSocket connection that streams video over the Amazon DCV protocol from an isolated cloud browser session. Because the video stream never passes through your server, latency stays low, and users can watch every action the agent takes on a webpage, from navigation to form submissions.

How does embedding Live View enhance user trust and confidence in AI agents?

Embedding Live View builds user trust by making an AI agent's operations visible. Instead of a "black box" experience, users get immediate visual confirmation of the agent's actions and can follow its progress in real time. This feedback loop shows whether the agent is on the correct path, interacting with the right elements, and progressing as expected. It is particularly valuable for complex or sensitive workflows, where visual evidence reassures users that the agent is performing its tasks accurately and gives them the chance to intervene in time if it is not.

What are the primary architectural components involved in integrating a Live View AI agent?

The integration of a Live View AI agent involves three main architectural components. First, the user's web browser, running a React application, hosts the BrowserLiveView component, which renders the real-time stream. Second, the application server acts as the orchestrator, managing the AI agent's logic, initiating browser sessions via the Amazon Bedrock AgentCore API, and generating secure, time-limited SigV4-presigned URLs for the Live View stream. Third, the AWS Cloud hosts Amazon Bedrock AgentCore and Bedrock services, providing the isolated cloud browser sessions, automation capabilities (via Playwright CDP), and the DCV-powered Live View streaming endpoint. A key design point is that the DCV stream flows directly from AWS to the user's browser, bypassing the application server for optimal performance.

Can developers utilize any AI model or agent framework with Amazon Bedrock AgentCore's Live View?

Yes. Developers can use any AI model or agent framework with Amazon Bedrock AgentCore's Live View. The example in this article calls Anthropic Claude through Amazon Bedrock, but the BrowserLiveView component itself is model-agnostic: the real-time visual stream is decoupled from the agent's reasoning and decision-making logic. As long as your chosen agent or framework can interact with the browser automation endpoint provided by AgentCore (typically through Playwright over CDP), you can use Live View to give your users visual feedback.

What are the essential prerequisites for setting up a Live View AI browser agent with Amazon Bedrock AgentCore?

Setting up a Live View AI browser agent has several prerequisites. Developers need Node.js version 20 or later for the server-side logic and React for the client-side application. An AWS account in a supported Region is required, along with AWS credentials that have the necessary Amazon Bedrock AgentCore Browser permissions. Follow the principle of least privilege for IAM permissions, and use temporary credentials (for example from AWS IAM Identity Center or AWS STS) rather than long-lived access keys. Finally, install the Amazon Bedrock AgentCore TypeScript SDK (`bedrock-agentcore`) in your project and, if you use Bedrock models, the AWS SDK for JavaScript client (`@aws-sdk/client-bedrock-runtime`).

How does the DCV protocol facilitate real-time, low-latency video streaming for Live View?

The Amazon DCV protocol (formerly NICE DCV) provides the real-time, low-latency video streaming behind the BrowserLiveView component. DCV is a high-performance remote display protocol designed to deliver a responsive experience over varying network conditions. In AgentCore, it encodes the visual output of the isolated cloud browser session and transmits it directly to the user's React application over a WebSocket connection. By optimizing compression and transmission, DCV keeps the feed of the agent's actions smooth and responsive, so users can watch the agent as if it were running locally, and developers do not need to set up their own streaming infrastructure.
