Bedrock AgentCore: Integrate a Real-Time AI Browser Agent into React

If you use Amazon Bedrock for your AI model, install the AWS SDK for JavaScript:
```
npm install @aws-sdk/client-bedrock-runtime
```

A Live View implementation typically splits the codebase in two: server-side code (session management and AI agent logic) runs in Node.js, and client-side code (rendering the Live View) runs in a React application, often bundled with a tool such as Vite.

Step-by-step integration: from session to streaming

Integrating a live AI browser agent with Amazon Bedrock AgentCore is a three-step process that connects your server-side logic to your client-side React application and to AgentCore's capabilities in the AWS Cloud.

1. Start a browser session and generate the Live View URL

The first step happens on your application server. Here your backend logic starts a browser session inside Amazon Bedrock AgentCore and securely obtains the URL needed to stream the live view.

You use the Browser class from the bedrock-agentcore SDK. This class handles the complexity of creating and managing isolated browser environments in the cloud. The key output of this step is a SigV4-presigned URL that grants secure, temporary access to the browser session's real-time video stream.

```typescript
// Example server-side code (Node.js)
import { Browser } from 'bedrock-agentcore';
import { AgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';

// Initialize the Bedrock AgentCore client (make sure valid AWS credentials are configured)
const agentCoreClient = new AgentCoreClient({ region: 'us-east-1' }); // Use your preferred region

async function startLiveSession() {
    // Create a new browser session
    const browser = new Browser(agentCoreClient);
    await browser.create();

    // Generate the Live View URL.
    // The presigned URL grants access to the stream, so avoid writing it to logs.
    const liveViewUrl = await browser.getLiveViewURL();

    // Keep browser.sessionId so you can later attach your AI agent or end the session
    const sessionId = browser.sessionId;

    return { liveViewUrl, sessionId };
}
```

This URL is then sent to your React frontend, which uses it to establish the live stream.
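One way to hand the URL to the frontend is a small, framework-agnostic handler behind the `/api/start-agent-session` route that the React example fetches. The sketch below is an assumption, not part of the SDK: `handleStartAgentSession`, `LiveSession`, and `HttpResult` are hypothetical names, and the session starter is injected so that `startLiveSession()` can be passed in production and a stub in tests.

```typescript
// Hypothetical shape of what the server returns to the React client.
interface LiveSession {
    liveViewUrl: string;
    sessionId: string;
}

// Hypothetical framework-neutral result: a status code plus a JSON body.
interface HttpResult {
    status: number;
    body: LiveSession | { error: string };
}

// `startSession` is injected; in the article's setup it would be startLiveSession().
async function handleStartAgentSession(
    startSession: () => Promise<LiveSession>
): Promise<HttpResult> {
    try {
        const { liveViewUrl, sessionId } = await startSession();
        return { status: 200, body: { liveViewUrl, sessionId } };
    } catch {
        // Return a generic message so AWS error details do not reach the browser.
        return { status: 502, body: { error: 'Could not start browser session' } };
    }
}
```

In an Express-style route you would await this handler and write `result.status` and `result.body` to the response as JSON.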

2. Render the Live View in your React application

Once your React application receives the liveViewUrl from your server, rendering the real-time stream is straightforward thanks to the BrowserLiveView component.

```tsx
// Example client-side code (React component)
import React, { useEffect, useState } from 'react';
import { BrowserLiveView } from 'bedrock-agentcore';

interface LiveAgentViewerProps {
    // null until the backend has returned a URL
    liveViewUrl: string | null;
}

const LiveAgentViewer: React.FC<LiveAgentViewerProps> = ({ liveViewUrl }) => {
    if (!liveViewUrl) {
        return <p>Waiting for Live View URL...</p>;
    }

    return (
        <div style={{ width: '100%', height: '600px', border: '1px solid #ccc' }}>
            <BrowserLiveView url={liveViewUrl} />
        </div>
    );
};

// In your main application component or page:
// const MyPage = () => {
//     const [currentLiveViewUrl, setCurrentLiveViewUrl] = useState<string | null>(null);
//
//     useEffect(() => {
//         // Fetch the liveViewUrl from your backend
//         fetch('/api/start-agent-session')
//             .then(res => res.json())
//             .then(data => setCurrentLiveViewUrl(data.liveViewUrl));
//     }, []);
//
//     return (
//         <div>
//             <h1>AI Agent Live View</h1>
//             <LiveAgentViewer liveViewUrl={currentLiveViewUrl} />
//         </div>
//     );
// };
```

With just the url={liveViewUrl} prop, the BrowserLiveView component establishes the WebSocket connection, consumes the DCV stream, and renders the real-time video at the dimensions you specify. This minimal JSX integration greatly simplifies frontend development, so you can focus on the user experience around the live agent.
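Because the presigned URL is temporary, the client may want to know when to ask the server for a fresh one. The sketch below assumes the Live View URL carries the standard SigV4 query parameters `X-Amz-Date` and `X-Amz-Expires`; that is an assumption about the URL format, so check what your SDK version actually returns. The helper names `presignedUrlExpiry` and `isExpired` are hypothetical.

```typescript
// Returns the expiry time (ms since epoch) encoded in a SigV4-presigned URL,
// or null if the expected query parameters are missing or not in the expected format.
function presignedUrlExpiry(url: string): number | null {
    const queryStart = url.indexOf('?');
    if (queryStart === -1) return null;

    // Parse the query string by hand to avoid depending on DOM or Node typings.
    const params = new Map<string, string>();
    for (const pair of url.slice(queryStart + 1).split('&')) {
        const eq = pair.indexOf('=');
        if (eq === -1) continue;
        params.set(decodeURIComponent(pair.slice(0, eq)), decodeURIComponent(pair.slice(eq + 1)));
    }

    const date = params.get('X-Amz-Date');       // signing time, e.g. 20250101T120000Z
    const expires = params.get('X-Amz-Expires'); // lifetime in seconds
    if (!date || !expires) return null;

    const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(date);
    const seconds = Number(expires);
    if (!m || !Number.isFinite(seconds)) return null;

    const signedAt = Date.UTC(
        Number(m[1]), Number(m[2]) - 1, Number(m[3]),
        Number(m[4]), Number(m[5]), Number(m[6])
    );
    return signedAt + seconds * 1000;
}

// Treats URLs without a readable expiry as not expired.
function isExpired(url: string, now: number = Date.now()): boolean {
    const expiry = presignedUrlExpiry(url);
    return expiry !== null && now >= expiry;
}
```

A page component could call `isExpired(currentLiveViewUrl)` before mounting the viewer and refetch from the backend when it returns true.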

3. Connect your AI agent to drive the browser

The final step connects your AI agent's intelligence to actual browser actions in the isolated session. While BrowserLiveView provides the visual feedback, your AI agent interacts with the browser programmatically using Playwright over CDP (Chrome DevTools Protocol).

Your application server, which also hosts your AI agent, uses the Browser object's page property (a Playwright Page object) to perform browser actions.

```typescript
// Example server-side code (continued from step 1)
// Assumes a Playwright-style interface or direct Playwright access
import { Browser } from 'bedrock-agentcore';
import { AgentCoreClient } from '@aws-sdk/client-bedrock-agentcore';
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

// ... (earlier browser setup from step 1) ...

async function driveAgent(sessionId: string) {
    const browser = new Browser(agentCoreClient, { sessionId }); // Reattach to the existing session
    await browser.connect(); // Connect to the browser session

    const page = browser.page; // Get the Playwright Page object

    // Example AI agent logic (simplified for illustration).
    // This is where you would integrate your LLM (e.g. Anthropic Claude via the Bedrock Converse API)
    // to decide on actions based on user prompts and page content.
    console.log("Agent navigating to example.com...");
    await page.goto('https://www.example.com');
    console.log("Agent waiting 3 seconds...");
    await page.waitForTimeout(3000); // Simulate processing time

    console.log("Agent typing into the search field (hypothetical)...");
    // Example: await page.fill('#search-input', 'Amazon Bedrock AgentCore');
    // Example: await page.click('#search-button');

    const content = await page.content();
    // Use the LLM to analyze the content and decide on the next steps
    const bedrockRuntimeClient = new BedrockRuntimeClient({ region: 'us-east-1' });
    const response = await bedrockRuntimeClient.send(new InvokeModelCommand({
        modelId: "anthropic.claude-3-sonnet-20240229-v1:0", // or your preferred model
        contentType: "application/json",
        accept: "application/json",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31", // required for Anthropic models invoked via InvokeModel
            messages: [
                {
                    role: "user",
                    content: `Analyze this webpage content and suggest the next action: ${content.substring(0, 500)}`
                }
            ],
            max_tokens: 200,
        }),
    }));
    const decodedBody = new TextDecoder("utf-8").decode(response.body);
    const parsedBody = JSON.parse(decodedBody);
    console.log("Action suggested by the AI model:", parsedBody.content[0].text);

    // Based on the LLM's suggestion, perform further page actions...

    // Remember to close the browser session when you are done
    // await browser.close();
}

// After starting the session and obtaining the URL, you would call driveAgent(sessionId)
```

This interaction loop, in which your AI agent analyzes the page content, decides on the next action, and executes it through Playwright over CDP, forms the core of an autonomous browser agent. All of these actions are rendered visually in real time on the user's screen through the BrowserLiveView component.
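The loop described above can be sketched as an observe, decide, act cycle. Everything here is illustrative: `PageLike` is a hypothetical minimal interface covering only the Playwright `Page` methods the loop calls, `AgentAction` is an assumed action vocabulary, and `decide` stands in for whatever LLM call you use. A step limit keeps a confused model from running forever.

```typescript
// Minimal subset of a Playwright-style page that the loop needs (hypothetical interface).
interface PageLike {
    goto(url: string): Promise<unknown>;
    click(selector: string): Promise<void>;
    fill(selector: string, value: string): Promise<void>;
    content(): Promise<string>;
}

// Assumed action vocabulary that the model is asked to choose from.
type AgentAction =
    | { type: 'goto'; url: string }
    | { type: 'click'; selector: string }
    | { type: 'fill'; selector: string; value: string }
    | { type: 'done' };

// Stand-in for the LLM call: receives the page HTML and the actions taken so far.
type Decide = (pageContent: string, history: AgentAction[]) => Promise<AgentAction>;

async function runAgentLoop(
    page: PageLike,
    decide: Decide,
    maxSteps = 10
): Promise<AgentAction[]> {
    const history: AgentAction[] = [];
    for (let step = 0; step < maxSteps; step++) {
        const content = await page.content();          // observe
        const action = await decide(content, history); // decide
        history.push(action);
        if (action.type === 'done') break;
        switch (action.type) {                         // act
            case 'goto':
                await page.goto(action.url);
                break;
            case 'click':
                await page.click(action.selector);
                break;
            case 'fill':
                await page.fill(action.selector, action.value);
                break;
        }
    }
    return history;
}
```

The returned history doubles as a simple record of what the agent did, which can be logged alongside the live view for later review.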

Opening up new possibilities with embedded AI agents

Integrating the Amazon Bedrock AgentCore BrowserLiveView component is more than a technical feature; it changes how users interact with and trust AI agents. By embedding real-time visual feedback, developers can build AI-powered applications that are not only powerful but also transparent, auditable, and user-friendly.

This capability is especially valuable for applications that involve:

Complex workflows: Automating multi-step online processes such as data entry, onboarding, or regulatory compliance, where visibility into every step is essential.
Customer support: Letting support agents watch AI copilots resolve customer inquiries or navigate systems, which helps ensure accuracy and gives them the chance to intervene.
Training and debugging: Giving developers and end users a way to understand agent behavior, debug problems, and train agents through direct observation.
Enhanced audit trails: Creating visual records of agent actions that can be combined with session recordings stored in Amazon S3 for thorough after-the-fact review and compliance.

Streaming browser sessions directly from the AWS Cloud to client browsers, bypassing the application server for the video stream, brings clear benefits for performance and scalability. This architecture minimizes latency and reduces the load on your backend infrastructure, making it possible to deploy highly responsive and scalable AI agent solutions.

By adopting BrowserLiveView you are not just building AI agents; you are building trust, control, and a richer user experience. Explore the possibilities and give your users the confidence to delegate complex web tasks to intelligent agents.

Original source

https://aws.amazon.com/blogs/machine-learning/embed-a-live-ai-browser-agent-in-your-react-app-with-amazon-bedrock-agentcore/

Frequently asked questions

What is the Amazon Bedrock AgentCore BrowserLiveView component and how does it function?

The Amazon Bedrock AgentCore BrowserLiveView component is a crucial part of the Bedrock AgentCore TypeScript SDK, designed to embed a real-time video feed of an AI agent's browsing session directly into a React application. It operates by receiving a SigV4-presigned URL from your application server, which then establishes a persistent WebSocket connection to stream video data via the Amazon DCV protocol from an isolated cloud browser session. This direct streaming mechanism ensures low latency and high fidelity, allowing users to observe every action an AI agent takes on a webpage, from navigation to form submissions, without the video stream passing through your server.

How does embedding Live View enhance user trust and confidence in AI agents?

Embedding Live View significantly boosts user trust and confidence by providing unparalleled transparency into an AI agent's operations. Instead of a 'black box' experience, users gain immediate visual confirmation of the agent's actions, observing its progress and interactions in real-time. This visual feedback loop helps users understand that the agent is on the correct path, interacting with the right elements, and progressing as expected. This is particularly valuable for complex or sensitive workflows, where visual evidence can reassure users that the agent is performing its tasks accurately and responsibly, enhancing overall confidence and allowing for timely intervention if necessary.

What are the primary architectural components involved in integrating a Live View AI agent?

The integration of a Live View AI agent involves three main architectural components. First, the user's web browser, running a React application, hosts the BrowserLiveView component, which renders the real-time stream. Second, the application server acts as the orchestrator, managing the AI agent's logic, initiating browser sessions via the Amazon Bedrock AgentCore API, and generating secure, time-limited SigV4-presigned URLs for the Live View stream. Third, the AWS Cloud hosts Amazon Bedrock AgentCore and Bedrock services, providing the isolated cloud browser sessions, automation capabilities (via Playwright CDP), and the DCV-powered Live View streaming endpoint. A key design point is that the DCV stream flows directly from AWS to the user's browser, bypassing the application server for optimal performance.

Can developers utilize any AI model or agent framework with Amazon Bedrock AgentCore's Live View?

Yes, developers have the flexibility to use any AI model or agent framework of their choice with Amazon Bedrock AgentCore's Live View. While the provided example often demonstrates integration with the Amazon Bedrock Converse API and models like Anthropic Claude, the BrowserLiveView component itself is model-agnostic. This means that the real-time visual streaming functionality is decoupled from the AI agent's underlying reasoning and decision-making logic. As long as your chosen AI agent or framework can interact with the browser automation endpoint provided by AgentCore (typically via Playwright CDP), you can leverage Live View to provide visual feedback to your users, making it a highly adaptable solution for various AI-powered applications.

What are the essential prerequisites for setting up a Live View AI browser agent with Amazon Bedrock AgentCore?

To set up a Live View AI browser agent, several prerequisites are necessary. Developers need Node.js version 20 or later for the server-side logic and React for the client-side application. An AWS account in a supported region is required, along with AWS credentials that have the necessary Amazon Bedrock AgentCore Browser permissions. It's crucial to follow the principle of least privilege for IAM permissions and use temporary credentials (e.g., from AWS IAM Identity Center or STS) rather than long-lived access keys for enhanced security. Additionally, the Amazon Bedrock AgentCore TypeScript SDK (`bedrock-agentcore`) and potentially the AWS SDK for JavaScript (`@aws-sdk/client-bedrock-runtime`) if using Bedrock models, must be installed in your project.
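A server can verify the Node.js requirement at startup instead of failing later on a missing runtime feature. This is a minimal sketch; `meetsMinimumNodeVersion` is a hypothetical helper, and the default of 20 simply mirrors the prerequisite stated above.

```typescript
// Returns true when a Node.js version string such as "20.11.0" or "v22.1.0"
// has a major version at or above the required minimum.
function meetsMinimumNodeVersion(version: string, minimumMajor = 20): boolean {
    const majorText = version.replace(/^v/, '').split('.')[0] ?? '';
    const major = Number.parseInt(majorText, 10);
    return Number.isInteger(major) && major >= minimumMajor;
}
```

On the server you would pass in `process.versions.node` and exit with a clear message when the check returns false.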

How does the DCV protocol facilitate real-time, low-latency video streaming for Live View?

The Amazon DCV (NICE DCV) protocol is instrumental in providing real-time, low-latency video streaming for the BrowserLiveView component. DCV is a high-performance remote display protocol designed to deliver a rich user experience over varying network conditions. In the context of AgentCore, it efficiently encodes and transmits the visual output of the isolated cloud browser session directly to the user's React application via a WebSocket connection. By optimizing data compression and transmission, DCV ensures that the visual feed of the AI agent's actions appears smooth and responsive, minimizing lag and enabling users to observe the agent's behavior as if it were happening locally on their machine, without the need for complex streaming infrastructure setup by the developer.
