Code Velocity

Bedrock AgentCore: Embedding a Live AI Browser Agent in React

[Figure: Amazon Bedrock AgentCore architecture diagram showing the data flow for embedding a live AI browser agent in a React application.]
  • If you are using AWS Bedrock as your AI model provider, install the AWS SDK for JavaScript:
    npm install @aws-sdk/client-bedrock-runtime
    

The codebases that implement Live View are typically split: the server-side code (for session management and AI agent logic) runs in Node.js, while the client-side code (which renders the Live View) runs in a React application, usually bundled with a tool such as Vite.

Step-by-Step Integration: From Session to Stream

Integrating a live AI browser agent with Amazon Bedrock AgentCore is a clear three-step process that connects your server-side logic with your client-side React application and the underlying AWS cloud capabilities.

1. Start a Browser Session and Generate a Live View URL

The first step happens on your application server. This is where your backend logic starts a browser session in Amazon Bedrock AgentCore and securely obtains the URL needed to stream the Live View.

You will use the Browser class from the bedrock-agentcore SDK. This class handles the complexity of creating and managing an isolated browser environment in the cloud. The key output of this step is a SigV4-presigned URL that grants secure, temporary access to the browser session's live video stream.

// Example server-side code (Node.js)
import { Browser } from 'bedrock-agentcore';
import { AgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';

// Initialize the Bedrock AgentCore client (make sure valid AWS credentials are configured)
const agentCoreClient = new AgentCoreClient({ region: 'us-east-1' }); // use your preferred region

async function startLiveSession() {
    // Create a new browser session
    const browser = new Browser(agentCoreClient);
    await browser.create();

    // Generate the Live View URL
    const liveViewUrl = await browser.getLiveViewURL();
    console.log('Live View URL:', liveViewUrl);

    // Store browser.sessionId so you can later attach your AI agent or terminate the session
    const sessionId = browser.sessionId;

    return { liveViewUrl, sessionId };
}

// This `liveViewUrl` is sent to your React client.
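Because the URL is SigV4-presigned, it is only valid for a limited window. As a hedged sketch, assuming the URL carries the standard SigV4 `X-Amz-Date` and `X-Amz-Expires` query parameters (the exact shape returned by getLiveViewURL() may differ), the server could check freshness before handing it to a client:

```typescript
// Sketch: check whether a SigV4-presigned URL is still within its validity window.
// Assumes the standard X-Amz-Date (ISO 8601 basic format) and X-Amz-Expires
// (seconds) query parameters; the real URL shape is an assumption here.
function presignedUrlIsFresh(url: string, now: Date = new Date()): boolean {
    const params = new URL(url).searchParams;
    const amzDate = params.get("X-Amz-Date");    // e.g. "20240501T120000Z"
    const expires = params.get("X-Amz-Expires"); // e.g. "300"
    if (!amzDate || !expires) return false;
    // Convert "YYYYMMDDTHHMMSSZ" into a parseable ISO timestamp.
    const iso = amzDate.replace(
        /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/,
        "$1-$2-$3T$4:$5:$6Z"
    );
    const signedAt = Date.parse(iso);
    if (Number.isNaN(signedAt)) return false;
    return now.getTime() < signedAt + Number(expires) * 1000;
}
```

A stale URL simply needs a fresh call to getLiveViewURL(); checking up front avoids handing the client a stream URL that will be rejected.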

This URL is then passed to your React frontend, which uses it to establish the live stream.

2. Render the Live View in Your React Application

Once your React application receives the liveViewUrl from the server, rendering the live stream is remarkably simple thanks to the BrowserLiveView component.

// Example client-side code (React component)
import React, { useEffect, useState } from 'react';
import { BrowserLiveView } from 'bedrock-agentcore';

interface LiveAgentViewerProps {
    liveViewUrl: string | null;
}

const LiveAgentViewer: React.FC<LiveAgentViewerProps> = ({ liveViewUrl }) => {
    if (!liveViewUrl) {
        return <p>Waiting for the Live View URL...</p>;
    }

    return (
        <div style={{ width: '100%', height: '600px', border: '1px solid #ccc' }}>
            <BrowserLiveView url={liveViewUrl} />
        </div>
    );
};

// In your main App component or page:
// const MyPage = () => {
//     const [currentLiveViewUrl, setCurrentLiveViewUrl] = useState<string | null>(null);
//
//     useEffect(() => {
//         // Fetch the liveViewUrl from your backend
//         fetch('/api/start-agent-session')
//             .then(res => res.json())
//             .then(data => setCurrentLiveViewUrl(data.liveViewUrl));
//     }, []);
//
//     return (
//         <div>
//             <h1>AI Agent Live View</h1>
//             <LiveAgentViewer liveViewUrl={currentLiveViewUrl} />
//         </div>
//     );
// };

With just url={liveViewUrl}, the BrowserLiveView component handles the complex details of establishing the WebSocket connection, consuming the DCV stream, and rendering the live video within the dimensions you specify. This minimal JSX integration greatly simplifies frontend development, letting you focus on the user experience around the live agent.
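On the server side, the `/api/start-agent-session` route that the client fetches can be a thin wrapper around the session-creation logic from step 1. A minimal, framework-agnostic sketch of the handler's response shape, with the AgentCore call mocked out (the real implementation needs AWS credentials and the bedrock-agentcore SDK):

```typescript
// Sketch of a handler for /api/start-agent-session.
// startLiveSession() is a stand-in for the real AgentCore call from step 1.
interface SessionResponse {
    liveViewUrl: string;
    sessionId: string;
}

async function startLiveSession(): Promise<SessionResponse> {
    // Mocked here: the real version calls browser.create() / getLiveViewURL().
    return {
        liveViewUrl: "https://dcv.example.com/stream?X-Amz-Expires=300",
        sessionId: "session-123",
    };
}

async function handleStartAgentSession(): Promise<{ status: number; body: string }> {
    try {
        const session = await startLiveSession();
        // Return only what the client needs; AWS credentials stay server-side.
        return { status: 200, body: JSON.stringify(session) };
    } catch (err) {
        return { status: 500, body: JSON.stringify({ error: "failed to start session" }) };
    }
}
```

Whatever web framework you use, keeping the response to just liveViewUrl and sessionId means the client never touches AWS credentials or the AgentCore API directly.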

3. Connect the AI Agent to Drive the Browser

The final step connects the AI agent's intelligence to actual browser actions inside the isolated session. While BrowserLiveView provides the visual feedback, your AI agent interacts with the browser programmatically via Playwright CDP (Chrome DevTools Protocol).

Your application server (which also hosts your AI agent) uses the Browser object's page property, a Playwright Page object, to perform browser actions.

// Example server-side code (continuing from step 1)
// Assumes a Playwright-like interface, or Playwright used directly
import { Browser } from 'bedrock-agentcore';
import { AgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

// ... (setup from before browser creation) ...

async function driveAgent(sessionId: string) {
    const browser = new Browser(agentCoreClient, { sessionId }); // reattach to the existing session
    await browser.connect(); // connect to the browser session

    const page = browser.page; // get the Playwright Page object

    // Example AI agent logic (simplified for illustration)
    // Here you would integrate your LLM (e.g., Anthropic Claude via the Bedrock Converse API)
    // to decide on actions based on the user prompt and page content.
    console.log("Agent navigating to example.com...");
    await page.goto('https://www.example.com');
    console.log("Agent waiting 3 seconds...");
    await page.waitForTimeout(3000); // simulate processing time

    console.log("Agent typing into a (hypothetical) search box...");
    // Example: await page.type('#search-input', 'Amazon Bedrock AgentCore');
    // Example: await page.click('#search-button');

    const content = await page.content();
    // Use the LLM to analyze `content` and decide on the next action
    const bedrockRuntimeClient = new BedrockRuntimeClient({ region: 'us-east-1' });
    const response = await bedrockRuntimeClient.send(new InvokeModelCommand({
        modelId: "anthropic.claude-3-sonnet-20240229-v1:0", // or your preferred model
        contentType: "application/json",
        accept: "application/json",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31", // required for Claude models on Bedrock
            messages: [
                {
                    role: "user",
                    content: `Analyze this webpage content and suggest the next action: ${content.substring(0, 500)}`
                }
            ],
            max_tokens: 200,
        }),
    }));
    const decodedBody = new TextDecoder("utf-8").decode(response.body);
    const parsedBody = JSON.parse(decodedBody);
    console.log("Action suggested by the AI model:", parsedBody.content[0].text);

    // Perform further page actions based on the LLM's suggestion...

    // Don't forget to close the browser session when you're done
    // await browser.close();
}

// You would call driveAgent(sessionId) after starting the session and obtaining the URL

This interaction loop (the AI agent analyzing page content, deciding on the next action, and executing it via Playwright CDP) forms the core of an autonomous browsing agent. Every one of these actions is rendered visually, in real time, through the BrowserLiveView component on the user's screen.
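The observe-decide-act loop can be sketched as a plain function. In this sketch, PageLike and the decide callback are stand-ins for the Playwright Page object and the Bedrock model call; real implementations would plug those in:

```typescript
// Sketch of the agent's observe -> decide -> act loop with mocked dependencies.
// PageLike and Decide are stand-ins for the Playwright Page and the LLM call.
interface PageLike {
    content(): Promise<string>;       // observe: read the current page
    act(action: string): Promise<void>; // act: e.g. a click or navigation
}

type Decide = (pageContent: string) => Promise<string>; // returns next action, or "done"

async function runAgentLoop(page: PageLike, decide: Decide, maxSteps = 5): Promise<string[]> {
    const actions: string[] = [];
    for (let step = 0; step < maxSteps; step++) {
        const content = await page.content(); // observe
        const action = await decide(content); // decide (LLM call in practice)
        if (action === "done") break;
        await page.act(action);               // act (Playwright in practice)
        actions.push(action);
    }
    return actions;
}
```

The maxSteps bound is a simple safety valve; production agents typically also track cost, timeouts, and a user-initiated stop signal.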

Unlocking New Possibilities with Embedded AI Agents

Amazon Bedrock AgentCore's BrowserLiveView integration is more than a technical feature; it changes how users interact with, and trust, AI agents. By embedding real-time visual feedback, developers can build AI-powered applications that are not only effective but also transparent, auditable, and user-friendly.

This capability is especially transformative for applications involving:

  • Complex workflows: automating multi-step online processes such as data entry, onboarding, or regulatory compliance, where visibility into every step is essential.
  • Customer support: letting agents watch an AI copilot resolve customer queries or navigate systems, ensuring accuracy and providing an opportunity to intervene.
  • Training and debugging: giving developers and end users a powerful tool to understand agent behavior, debug issues, and train agents through direct observation.
  • Enhanced audit trails: producing a visual record of agent actions that can be combined with session recordings in Amazon S3 for thorough post-hoc review and compliance.
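As a hedged illustration of the audit-trail idea, agent actions could be captured in a simple in-memory log before being shipped to durable storage alongside the session recording. The AuditLog type here is hypothetical, not part of the AgentCore SDK:

```typescript
// Hypothetical in-memory audit log for agent actions; a real system would
// persist entries (e.g. to Amazon S3) alongside the DCV session recording.
interface AuditEntry {
    sessionId: string;
    action: string;
    timestamp: string; // ISO 8601
}

class AuditLog {
    private entries: AuditEntry[] = [];

    record(sessionId: string, action: string, when: Date = new Date()): void {
        this.entries.push({ sessionId, action, timestamp: when.toISOString() });
    }

    // Entries for one session, in the order they were recorded.
    forSession(sessionId: string): AuditEntry[] {
        return this.entries.filter((e) => e.sessionId === sessionId);
    }
}
```

Pairing a structured action log like this with the visual recording gives reviewers both the "what" (the logged actions) and the "how it looked" (the video) for each session.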

Streaming the browser session directly from the AWS cloud to the client browser, bypassing the application server for the video stream, offers significant performance and scalability advantages. This architecture minimizes latency and reduces load on your backend infrastructure, letting you deploy highly responsive, scalable AI agent solutions.

By adopting BrowserLiveView, you are not just building AI agents; you are building trust, control, and richer user experiences. Explore the possibilities and give your users the confidence to delegate complex web tasks to intelligent agents.

Frequently Asked Questions

What is the Amazon Bedrock AgentCore BrowserLiveView component and how does it function?
The Amazon Bedrock AgentCore BrowserLiveView component is a crucial part of the Bedrock AgentCore TypeScript SDK, designed to embed a real-time video feed of an AI agent's browsing session directly into a React application. It operates by receiving a SigV4-presigned URL from your application server, which then establishes a persistent WebSocket connection to stream video data via the Amazon DCV protocol from an isolated cloud browser session. This direct streaming mechanism ensures low latency and high fidelity, allowing users to observe every action an AI agent takes on a webpage, from navigation to form submissions, without the video stream passing through your server.
How does embedding Live View enhance user trust and confidence in AI agents?
Embedding Live View significantly boosts user trust and confidence by providing unparalleled transparency into an AI agent's operations. Instead of a 'black box' experience, users gain immediate visual confirmation of the agent's actions, observing its progress and interactions in real-time. This visual feedback loop helps users understand that the agent is on the correct path, interacting with the right elements, and progressing as expected. This is particularly valuable for complex or sensitive workflows, where visual evidence can reassure users that the agent is performing its tasks accurately and responsibly, enhancing overall confidence and allowing for timely intervention if necessary.
What are the primary architectural components involved in integrating a Live View AI agent?
The integration of a Live View AI agent involves three main architectural components. First, the user's web browser, running a React application, hosts the BrowserLiveView component, which renders the real-time stream. Second, the application server acts as the orchestrator, managing the AI agent's logic, initiating browser sessions via the Amazon Bedrock AgentCore API, and generating secure, time-limited SigV4-presigned URLs for the Live View stream. Third, the AWS Cloud hosts Amazon Bedrock AgentCore and Bedrock services, providing the isolated cloud browser sessions, automation capabilities (via Playwright CDP), and the DCV-powered Live View streaming endpoint. A key design point is that the DCV stream flows directly from AWS to the user's browser, bypassing the application server for optimal performance.
Can developers utilize any AI model or agent framework with Amazon Bedrock AgentCore's Live View?
Yes, developers have the flexibility to use any AI model or agent framework of their choice with Amazon Bedrock AgentCore's Live View. While the provided example often demonstrates integration with the Amazon Bedrock Converse API and models like Anthropic Claude, the BrowserLiveView component itself is model-agnostic. This means that the real-time visual streaming functionality is decoupled from the AI agent's underlying reasoning and decision-making logic. As long as your chosen AI agent or framework can interact with the browser automation endpoint provided by AgentCore (typically via Playwright CDP), you can leverage Live View to provide visual feedback to your users, making it a highly adaptable solution for various AI-powered applications.
What are the essential prerequisites for setting up a Live View AI browser agent with Amazon Bedrock AgentCore?
To set up a Live View AI browser agent, several prerequisites are necessary. Developers need Node.js version 20 or later for the server-side logic and React for the client-side application. An AWS account in a supported region is required, along with AWS credentials that have the necessary Amazon Bedrock AgentCore Browser permissions. It's crucial to follow the principle of least privilege for IAM permissions and use temporary credentials (e.g., from AWS IAM Identity Center or STS) rather than long-lived access keys for enhanced security. Additionally, the Amazon Bedrock AgentCore TypeScript SDK (`bedrock-agentcore`) and potentially the AWS SDK for JavaScript (`@aws-sdk/client-bedrock-runtime`) if using Bedrock models, must be installed in your project.
How does the DCV protocol facilitate real-time, low-latency video streaming for Live View?
The Amazon DCV (NICE DCV) protocol is instrumental in providing real-time, low-latency video streaming for the BrowserLiveView component. DCV is a high-performance remote display protocol designed to deliver a rich user experience over varying network conditions. In the context of AgentCore, it efficiently encodes and transmits the visual output of the isolated cloud browser session directly to the user's React application via a WebSocket connection. By optimizing data compression and transmission, DCV ensures that the visual feed of the AI agent's actions appears smooth and responsive, minimizing lag and enabling users to observe the agent's behavior as if it were happening locally on their machine, without the need for complex streaming infrastructure setup by the developer.
