Bedrock AgentCore: Embedding a Live AI Browser Agent in React

If you use Amazon Bedrock for your AI model, install the AWS SDK for JavaScript:
```
npm install @aws-sdk/client-bedrock-runtime
```

The codebase for a Live View implementation is typically split in two: the server-side code (session management and AI agent logic) runs on Node.js, and the client-side code (rendering the Live View) runs inside a React application, often built with a tool such as Vite.

Step-by-Step Integration: From Session to Stream

Integrating a live AI browser agent with Amazon Bedrock AgentCore is a three-step process that connects your server-side logic, your client-side React application, and the AWS Cloud.

1. Starting a Browser Session and Generating the Live View URL

The first step takes place on your application server. This is where your backend logic starts a browser session in Amazon Bedrock AgentCore and securely obtains the URL needed to stream the live view.

You use the Browser class from the bedrock-agentcore SDK. This class handles the complexity of creating and managing an isolated browser environment in the cloud. The main output of this step is a SigV4-presigned URL, which grants secure, temporary access to the live video stream of the browser session.

// Example server-side code (Node.js)
import { Browser } from 'bedrock-agentcore';
import { AgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';

// Initialize the Bedrock AgentCore client (make sure the correct AWS credentials are configured)
const agentCoreClient = new AgentCoreClient({ region: 'us-east-1' }); // Use your preferred region

async function startLiveSession() {
    // Create a new browser session
    const browser = new Browser(agentCoreClient);
    await browser.create();

    // Generate the Live View URL.
    // Avoid logging the URL itself: it grants access to the stream until it expires.
    const liveViewUrl = await browser.getLiveViewURL();

    // Keep browser.sessionId so you can later connect your AI agent or terminate the session
    const sessionId = browser.sessionId;

    return { liveViewUrl, sessionId };
}

Your server sends this liveViewUrl to your React frontend, which uses it to establish the live stream.
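The React example in the next step fetches the URL from /api/start-agent-session. One way to back that route is sketched below as a framework-agnostic handler, so it can be wired into Express, Fastify, or Node's built-in HTTP server. The names handleStartAgentSession, LiveSession, JsonResponse, and activeSessions are hypothetical and not part of any SDK; the start argument stands in for the startLiveSession() function shown above.

```typescript
// Shape returned by startLiveSession() in the server-side example.
interface LiveSession {
    liveViewUrl: string;
    sessionId: string;
}

// Hypothetical framework-neutral response description.
interface JsonResponse {
    status: number;
    headers: Record<string, string>;
    body: string;
}

// Hypothetical in-memory registry so the server can later find or terminate a session.
const activeSessions = new Map<string, { startedAt: number }>();

async function handleStartAgentSession(
    start: () => Promise<LiveSession>,
): Promise<JsonResponse> {
    const headers = {
        'Content-Type': 'application/json',
        // The presigned URL grants temporary access, so it should not be cached.
        'Cache-Control': 'no-store',
    };
    try {
        const { liveViewUrl, sessionId } = await start();
        activeSessions.set(sessionId, { startedAt: Date.now() });
        return { status: 200, headers, body: JSON.stringify({ liveViewUrl, sessionId }) };
    } catch {
        // Record the real error server-side; return only a generic message to the browser.
        return {
            status: 502,
            headers,
            body: JSON.stringify({ error: 'Could not start agent session' }),
        };
    }
}
```

In a route handler you would call handleStartAgentSession(startLiveSession) and copy the returned status, headers, and body onto the framework's response object. This endpoint should sit behind your application's own authentication, since each call starts a cloud browser session.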

2. Rendering the Live View in Your React Application

Once your React application has received the liveViewUrl from your server, rendering the real-time stream is straightforward, thanks to the BrowserLiveView component.

// Example client-side code (React component)
import React, { useEffect, useState } from 'react';
import { BrowserLiveView } from 'bedrock-agentcore';

interface LiveAgentViewerProps {
    // null until the backend has returned a URL
    liveViewUrl: string | null;
}

const LiveAgentViewer: React.FC<LiveAgentViewerProps> = ({ liveViewUrl }) => {
    if (!liveViewUrl) {
        return <p>Waiting for the Live View URL...</p>;
    }

    return (
        <div style={{ width: '100%', height: '600px', border: '1px solid #ccc' }}>
            <BrowserLiveView url={liveViewUrl} />
        </div>
    );
};

// In your main App component or page:
// const MyPage = () => {
//     const [currentLiveViewUrl, setCurrentLiveViewUrl] = useState<string | null>(null);
//
//     useEffect(() => {
//         // Fetch the liveViewUrl from your backend
//         fetch('/api/start-agent-session')
//             .then(res => res.json())
//             .then(data => setCurrentLiveViewUrl(data.liveViewUrl))
//             .catch(err => console.error('Could not start the agent session', err));
//     }, []);
//
//     return (
//         <div>
//             <h1>AI Agent Live View</h1>
//             <LiveAgentViewer liveViewUrl={currentLiveViewUrl} />
//         </div>
//     );
// };

Given only url={liveViewUrl}, the BrowserLiveView component handles the intricate details of establishing the WebSocket connection, consuming the DCV stream, and rendering the live video feed within the dimensions you specify. This minimal JSX integration greatly simplifies frontend development and lets you focus on the user experience around the live agent.
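Because the presigned URL is temporary, a client may want to check whether a stored URL is still usable before rendering the live view. The sketch below assumes the Live View URL carries the standard SigV4 query parameters X-Amz-Date and X-Amz-Expires. This article does not confirm that detail, so the helpers return null when the parameters are absent or malformed. The function names are hypothetical.

```typescript
/** Read one query parameter using plain string handling (no DOM or Node typings needed). */
function getQueryParam(url: string, name: string): string | null {
    const queryStart = url.indexOf('?');
    if (queryStart === -1) return null;
    const query = url.slice(queryStart + 1).split('#')[0];
    for (const pair of query.split('&')) {
        const eq = pair.indexOf('=');
        const key = decodeURIComponent(eq === -1 ? pair : pair.slice(0, eq));
        if (key.toLowerCase() === name.toLowerCase()) {
            return eq === -1 ? '' : decodeURIComponent(pair.slice(eq + 1));
        }
    }
    return null;
}

/** Expiry time in milliseconds since the epoch, or null if it cannot be determined. */
function presignedUrlExpiry(url: string): number | null {
    const date = getQueryParam(url, 'X-Amz-Date');       // e.g. 20240101T000000Z
    const expires = getQueryParam(url, 'X-Amz-Expires'); // lifetime in seconds
    if (date === null || expires === null) return null;

    const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(date);
    if (m === null || !/^\d+$/.test(expires)) return null;

    const signedAt = Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6]);
    return signedAt + Number(expires) * 1000;
}

/** true = expired, false = still valid, null = unknown (parameters missing). */
function isPresignedUrlExpired(url: string, nowMs: number = Date.now()): boolean | null {
    const expiry = presignedUrlExpiry(url);
    return expiry === null ? null : nowMs >= expiry;
}
```

A component could call isPresignedUrlExpired(liveViewUrl) before rendering and request a fresh URL from the server when the result is true. The check relies on the client clock, so it is only a hint; the service remains the authority on whether the URL is accepted.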

3. Connecting the AI Agent to Drive the Browser

The final step connects your AI agent's intelligence to actual browser actions inside the isolated session. While BrowserLiveView provides the visual feedback, your AI agent uses Playwright over the Chrome DevTools Protocol (CDP) to interact with the browser programmatically.

Your application server, which also hosts your AI agent, uses the page property of the Browser object (a Playwright Page object) to perform browser actions.

// Example server-side code (continued from step 1)
// Assumes a Playwright-like interface or direct use of Playwright
import { Browser } from 'bedrock-agentcore';
import { AgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

// ... (previous setup for creating the browser) ...

async function driveAgent(sessionId: string) {
    const browser = new Browser(agentCoreClient, { sessionId }); // Reconnect to the existing session
    await browser.connect(); // Connect to the browser session

    const page = browser.page; // Get the Playwright Page object

    // Example AI agent logic (simplified for illustration)
    // This is where you integrate your LLM (e.g., Anthropic Claude via the Bedrock Converse API;
    // the call below uses InvokeModel) to decide on actions based on user prompts and page content.
    console.log("Agent is navigating to example.com...");
    await page.goto('https://www.example.com');
    console.log("Agent is waiting 3 seconds...");
    await page.waitForTimeout(3000); // Simulate processing time

    console.log("Agent is typing into a search box (hypothetical)...");
    // Example: await page.type('#search-input', 'Amazon Bedrock AgentCore');
    // Example: await page.click('#search-button');

    const content = await page.content();
    // Use an LLM to analyze 'content' and decide on the next steps
    const bedrockRuntimeClient = new BedrockRuntimeClient({ region: 'us-east-1' });
    const response = await bedrockRuntimeClient.send(new InvokeModelCommand({
        modelId: "anthropic.claude-3-sonnet-20240229-v1:0", // or your preferred model
        contentType: "application/json",
        accept: "application/json",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31", // required for Anthropic Messages requests on Bedrock
            messages: [
                {
                    role: "user",
                    content: `Analyze this webpage content and suggest the next action: ${content.substring(0, 500)}`
                }
            ],
            max_tokens: 200,
        }),
    }));
    const decodedBody = new TextDecoder("utf-8").decode(response.body);
    const parsedBody = JSON.parse(decodedBody);
    console.log("Action suggested by the AI model:", parsedBody.content[0].text);

    // Based on the LLM's suggestion, perform further actions on the page...

    // Remember to close the browser session when you are done
    // await browser.close();
}

// After starting the session and obtaining the URL, call driveAgent(sessionId)
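The example leaves the call to close the session commented out, which makes it easy to leak a cloud browser session when the agent throws partway through. A small try/finally wrapper is one way to guarantee cleanup. The sketch below is generic: withSession and ClosableSession are hypothetical names, and the only assumption about the session object is that it exposes an async close() method, as the comment in the example suggests.

```typescript
// Anything with an async close() qualifies, including a test double.
interface ClosableSession {
    close(): Promise<unknown>;
}

/**
 * Opens a session, runs the given work, and always closes the session afterwards,
 * whether the work resolves or rejects.
 */
async function withSession<S extends ClosableSession, T>(
    open: () => Promise<S>,
    use: (session: S) => Promise<T>,
): Promise<T> {
    const session = await open();
    try {
        return await use(session);
    } finally {
        // Runs on success and on failure. If close() itself rejects,
        // that error replaces the one thrown by use().
        await session.close();
    }
}
```

With this in place, the agent logic can be passed as the use callback, and the session is released even if navigation or the model call fails.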

This interaction loop, in which your AI agent analyzes the page content, determines the next action, and executes it through Playwright over CDP, forms the core of an autonomous browsing agent. All of these actions are rendered visually in real time by the BrowserLiveView component on the user's screen.
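The loop can be sketched as a small observe, decide, act cycle. This is an illustration under stated assumptions rather than the SDK's own agent runner: AgentAction, Decide, and runAgentLoop are hypothetical names, and the decide callback stands in for whatever model call you use. PageLike lists only the methods the loop needs; Playwright's Page offers methods with these names.

```typescript
// The subset of page methods the loop relies on.
interface PageLike {
    goto(url: string): Promise<unknown>;
    click(selector: string): Promise<unknown>;
    fill(selector: string, value: string): Promise<unknown>;
    content(): Promise<string>;
}

// Hypothetical action vocabulary the model is asked to choose from.
type AgentAction =
    | { type: 'goto'; url: string }
    | { type: 'click'; selector: string }
    | { type: 'fill'; selector: string; value: string }
    | { type: 'done'; summary: string };

// Stand-in for the LLM call: given page HTML and past actions, pick the next action.
type Decide = (html: string, history: AgentAction[]) => Promise<AgentAction>;

async function runAgentLoop(
    page: PageLike,
    decide: Decide,
    maxSteps: number = 10, // hard cap so a confused model cannot loop forever
): Promise<AgentAction[]> {
    const history: AgentAction[] = [];
    for (let step = 0; step < maxSteps; step++) {
        const html = await page.content();                         // observe
        const action = await decide(html.slice(0, 4000), history); // decide
        history.push(action);
        if (action.type === 'done') break;
        switch (action.type) {                                     // act
            case 'goto':
                await page.goto(action.url);
                break;
            case 'click':
                await page.click(action.selector);
                break;
            case 'fill':
                await page.fill(action.selector, action.value);
                break;
        }
    }
    return history;
}
```

Restricting the model to a closed set of actions keeps the execution step simple to validate, and the step cap bounds both cost and session time. The returned history can also feed an audit log alongside the live view.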

Unlocking New Possibilities with Embedded AI Agents

The BrowserLiveView integration in Amazon Bedrock AgentCore is more than a technical feature; it changes how users interact with and come to trust AI agents. By embedding real-time visual feedback, developers can build AI-powered applications that are not only efficient but also transparent, auditable, and user-friendly.

This capability is particularly transformative for applications that involve:

Complex Workflows: Automating multi-step online processes such as data entry, onboarding, or regulatory compliance, where visibility into each step matters.
Customer Support: Letting support agents observe AI co-pilots as they resolve customer queries or navigate systems, which helps ensure accuracy and creates opportunities to intervene.
Training and Debugging: Giving developers and end users a powerful tool to understand agent behavior, debug issues, and train agents through direct observation.
Enhanced Audit Trails: Producing visual records of agent actions, which can be combined with session recordings in Amazon S3 for comprehensive post-hoc review and compliance.

Streaming browser sessions directly from the AWS Cloud to client browsers, bypassing the application server for the video stream, offers significant performance and scalability advantages. This architecture minimizes latency and reduces the load on your backend infrastructure, so you can deploy highly responsive and scalable AI agent solutions.

By adopting BrowserLiveView, you are not just building AI agents; you are building trust, control, and a richer user experience. Explore the possibilities and give your users the confidence to delegate complex web tasks to intelligent agents.

Original source

https://aws.amazon.com/blogs/machine-learning/embed-a-live-ai-browser-agent-in-your-react-app-with-amazon-bedrock-agentcore/

Frequently Asked Questions

What is the Amazon Bedrock AgentCore BrowserLiveView component and how does it function?

The Amazon Bedrock AgentCore BrowserLiveView component is a crucial part of the Bedrock AgentCore TypeScript SDK, designed to embed a real-time video feed of an AI agent's browsing session directly into a React application. It operates by receiving a SigV4-presigned URL from your application server, which then establishes a persistent WebSocket connection to stream video data via the Amazon DCV protocol from an isolated cloud browser session. This direct streaming mechanism ensures low latency and high fidelity, allowing users to observe every action an AI agent takes on a webpage, from navigation to form submissions, without the video stream passing through your server.

How does embedding Live View enhance user trust and confidence in AI agents?

Embedding Live View significantly boosts user trust and confidence by providing unparalleled transparency into an AI agent's operations. Instead of a 'black box' experience, users gain immediate visual confirmation of the agent's actions, observing its progress and interactions in real-time. This visual feedback loop helps users understand that the agent is on the correct path, interacting with the right elements, and progressing as expected. This is particularly valuable for complex or sensitive workflows, where visual evidence can reassure users that the agent is performing its tasks accurately and responsibly, enhancing overall confidence and allowing for timely intervention if necessary.

What are the primary architectural components involved in integrating a Live View AI agent?

The integration of a Live View AI agent involves three main architectural components. First, the user's web browser, running a React application, hosts the BrowserLiveView component, which renders the real-time stream. Second, the application server acts as the orchestrator, managing the AI agent's logic, initiating browser sessions via the Amazon Bedrock AgentCore API, and generating secure, time-limited SigV4-presigned URLs for the Live View stream. Third, the AWS Cloud hosts Amazon Bedrock AgentCore and Bedrock services, providing the isolated cloud browser sessions, automation capabilities (via Playwright CDP), and the DCV-powered Live View streaming endpoint. A key design point is that the DCV stream flows directly from AWS to the user's browser, bypassing the application server for optimal performance.

Can developers utilize any AI model or agent framework with Amazon Bedrock AgentCore's Live View?

Yes, developers have the flexibility to use any AI model or agent framework of their choice with Amazon Bedrock AgentCore's Live View. While the provided example often demonstrates integration with the Amazon Bedrock Converse API and models like Anthropic Claude, the BrowserLiveView component itself is model-agnostic. This means that the real-time visual streaming functionality is decoupled from the AI agent's underlying reasoning and decision-making logic. As long as your chosen AI agent or framework can interact with the browser automation endpoint provided by AgentCore (typically via Playwright CDP), you can leverage Live View to provide visual feedback to your users, making it a highly adaptable solution for various AI-powered applications.

What are the essential prerequisites for setting up a Live View AI browser agent with Amazon Bedrock AgentCore?

To set up a Live View AI browser agent, several prerequisites are necessary. Developers need Node.js version 20 or later for the server-side logic and React for the client-side application. An AWS account in a supported region is required, along with AWS credentials that have the necessary Amazon Bedrock AgentCore Browser permissions. It's crucial to follow the principle of least privilege for IAM permissions and use temporary credentials (e.g., from AWS IAM Identity Center or STS) rather than long-lived access keys for enhanced security. Additionally, the Amazon Bedrock AgentCore TypeScript SDK (`bedrock-agentcore`) must be installed in your project, along with the AWS SDK for JavaScript (`@aws-sdk/client-bedrock-runtime`) if you use Bedrock models.

How does the DCV protocol facilitate real-time, low-latency video streaming for Live View?

The Amazon DCV (formerly NICE DCV) protocol is instrumental in providing real-time, low-latency video streaming for the BrowserLiveView component. DCV is a high-performance remote display protocol designed to deliver a rich user experience over varying network conditions. In the context of AgentCore, it efficiently encodes and transmits the visual output of the isolated cloud browser session directly to the user's React application via a WebSocket connection. By optimizing data compression and transmission, DCV ensures that the visual feed of the AI agent's actions appears smooth and responsive, minimizing lag and enabling users to observe the agent's behavior as if it were happening locally on their machine, without the need for complex streaming infrastructure setup by the developer.
