Integrate with Puppeteer

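Puppeteer is a Node.js library that provides a high-level API to control Chrome or Firefox over the DevTools Protocol or WebDriver BiDi. Puppeteer runs in headless mode (no visible UI) by default, but it can be configured to run in a visible (headed) browser.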

Sample projects

A sample project showing the Puppeteer integration is available here: https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo

There is also a sample project that combines Playwright and Vitest: https://github.com/web-infra-dev/midscene-example/tree/main/playwright-with-vitest-demo

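Configure the AI model service

Set your model configuration as environment variables. See Model strategy for details.

export MIDSCENE_MODEL_BASE_URL="https://replace-with-your-model-service-url/v1"
export MIDSCENE_MODEL_API_KEY="replace-with-your-api-key"
export MIDSCENE_MODEL_NAME="replace-with-your-model-name"
export MIDSCENE_MODEL_FAMILY="replace-with-your-model-family"

For more configuration options, see Model strategy and Model configuration.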
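
A missing variable is easy to overlook until the first AI call fails, so a small pre-flight check can help. The sketch below is only an illustration: the helper name `missingModelVars` is hypothetical, and treating all four variables as required is an assumption (your model setup may not need every one of them).

```typescript
// Hypothetical helper, not part of Midscene: lists which of the
// environment variables named above are unset or blank.
const REQUIRED_VARS = [
  'MIDSCENE_MODEL_BASE_URL',
  'MIDSCENE_MODEL_API_KEY',
  'MIDSCENE_MODEL_NAME',
  'MIDSCENE_MODEL_FAMILY',
] as const;

function missingModelVars(
  env: Record<string, string | undefined>,
): string[] {
  // A variable counts as missing when it is undefined or only whitespace.
  return REQUIRED_VARS.filter((name) => !env[name]?.trim());
}

// Example usage at the top of a script (Node.js):
//   const missing = missingModelVars(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing model config: ${missing.join(', ')}`);
//   }
```

Passing the environment in as a parameter keeps the helper easy to try out with a plain object instead of the real `process.env`.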

Integrate the Midscene Agent

Step 1: Install dependencies

npm install @midscene/web puppeteer tsx --save-dev

Step 2: Write the script

Save the following code as ./demo.ts

import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const browser = await puppeteer.launch({
      headless: false, // here we use headed mode to help debug
    });

    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 800,
      deviceScaleFactor: 1,
    });

    await page.goto("https://www.ebay.com");
    await sleep(5000);

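    // 👀 Initialize the Midscene agent
    const agent = new PuppeteerAgent(page);

    // 👀 Run the search
    // Note: the instruction does not have to be in the page's language;
    // Chinese instructions also work on this English page
    await agent.aiAct('type "Headphones" in the search box, then press Enter');
    await sleep(5000);

    // 👀 Understand the page and extract data
    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], find the item titles and prices in the list',
    );
    console.log("Headphone items", items);

    // 👀 Assert with AI
    await agent.aiAssert("There is a category filter on the left side of the page");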

    await browser.close();
  })()
);

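Step 3: Run

Run the script with tsx. The headphone item information is printed to the command line:

# run
npx tsx demo.ts

# The command line should print output like the following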
#  [
#   {
#     itemTitle: 'Beats by Dr. Dre Studio Buds Totally Wireless Noise Cancelling In Ear + OPEN BOX',
#     price: 505.15
#   },
#   {
#     itemTitle: 'Skullcandy Indy Truly Wireless Earbuds-Headphones Green Mint',
#     price: 186.69
#   }
# ]

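For more on the Agent API, see the API reference.

Step 4: View the run report

After the command above succeeds, the console prints: Midscene - report file updated: /path/to/report/some_id.html. Open that file in a browser to view the report.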
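
If you capture the script's console output (for example in CI), you may want to pull the report path out of that line. The following is a minimal sketch that assumes the log line has exactly the form shown above; `extractReportPath` is a hypothetical helper, not a Midscene API.

```typescript
// Matches the log line quoted above and captures the .html path.
const REPORT_LINE = /Midscene - report file updated:\s*(.+?\.html)\s*$/;

// Returns the report path from one line of console output,
// or null when the line is not a report notice.
function extractReportPath(logLine: string): string | null {
  const match = REPORT_LINE.exec(logLine);
  return match?.[1] ?? null;
}
```

Returning `null` for non-matching lines lets you run every captured line through the helper and keep the last non-null result.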

Advanced

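Opening links in a new tab

Each Agent instance is bound to exactly one page. To make debugging easier, Midscene by default intercepts navigations that would open a new tab (for example, clicking a link with the target="_blank" attribute) and opens them in the current page instead.

To restore the open-in-new-tab behavior, set the forceSameTabNavigation option to false. In that case, you need to create a separate Agent instance for the new tab.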

const mid = new PuppeteerAgent(page, {
  forceSameTabNavigation: false,
});
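
With forceSameTabNavigation disabled, something has to notice the new tab and create an Agent for it. One possible approach, sketched below, relies on Puppeteer's page-level 'popup' event. The helper `trackPopupAgents` and the `PopupEmitter` interface are hypothetical names introduced for this illustration; the interface only describes the small part of a Puppeteer page that the sketch needs.

```typescript
// Minimal structural type: anything that emits a 'popup' event
// carrying the newly opened page (or null).
interface PopupEmitter<P> {
  on(event: 'popup', handler: (popup: P | null) => void): unknown;
}

// Creates one agent per popup opened from `page` and collects them.
// `createAgent` is supplied by the caller.
function trackPopupAgents<P, A>(
  page: PopupEmitter<P>,
  createAgent: (popup: P) => A,
): A[] {
  const agents: A[] = [];
  page.on('popup', (popup) => {
    // Skip events that carry no page.
    if (popup) agents.push(createAgent(popup));
  });
  return agents;
}

// Intended usage with Puppeteer and Midscene (assumes `page` comes from
// browser.newPage() and PuppeteerAgent is imported as in demo.ts):
//   const popupAgents = trackPopupAgents(
//     page,
//     (newPage) => new PuppeteerAgent(newPage, { forceSameTabNavigation: false }),
//   );
```

The returned array fills in as tabs open, so read it only after the action that triggers the new tab has completed.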

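Connect to a remote Puppeteer browser and attach the Midscene Agent

Sample project

A sample project for the remote Puppeteer integration is available here: https://github.com/web-infra-dev/midscene-example/tree/main/remote-puppeteer-demo

When you want to reuse an existing remote browser (for example a long-running cloud worker, a third-party browser grid, or a desktop on a local intranet), you can use this flow to attach Midscene to a remote Puppeteer instance. This keeps the browser close to the target environment, avoids the cost of repeated launches, and centralizes browser resource management, while providing the same AI automation capabilities.

In practice, you need to do the following manually:

- Obtain the CDP WebSocket URL from the remote browser service
- Connect to the remote browser with Puppeteer
- Create a Midscene Agent for AI-driven automation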

Prerequisites

npm install puppeteer @midscene/web --save-dev

Obtain the CDP WebSocket URL

You can obtain a CDP WebSocket URL from several sources:

- BrowserBase: sign up at https://browserbase.com and get your CDP URL
- Browserless: use https://browserless.io or run your own instance
- Local Chrome: start Chrome with the --remote-debugging-port=9222 flag, then use ws://localhost:9222/devtools/browser/...
- Docker: run Chrome in a Docker container and expose the debugging port
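
For the local Chrome case, the browser-specific part of the WebSocket URL changes on every launch. Chrome's remote debugging port also serves a small HTTP endpoint, /json/version, whose JSON response includes a webSocketDebuggerUrl field, so the URL can be looked up at runtime. The sketch below assumes Node.js 18+ (for the global `fetch`) and a Chrome started with --remote-debugging-port=9222; the function names are hypothetical.

```typescript
// Shape of the part of the /json/version response used here.
interface VersionInfo {
  webSocketDebuggerUrl?: string;
}

// Pulls the browser-level WebSocket URL out of the parsed response.
function pickWsEndpoint(info: VersionInfo): string {
  if (!info.webSocketDebuggerUrl) {
    throw new Error('Response has no webSocketDebuggerUrl field');
  }
  return info.webSocketDebuggerUrl;
}

// Queries the debugging port and returns the CDP WebSocket URL.
async function fetchWsEndpoint(
  host = 'localhost',
  port = 9222,
): Promise<string> {
  const res = await fetch(`http://${host}:${port}/json/version`);
  if (!res.ok) {
    throw new Error(`Debug endpoint returned HTTP ${res.status}`);
  }
  return pickWsEndpoint((await res.json()) as VersionInfo);
}
```

The value returned by `fetchWsEndpoint()` can be passed as browserWSEndpoint to puppeteer.connect. Hosted services such as BrowserBase or Browserless hand out their URLs through their own dashboards or APIs, which this sketch does not cover.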

Basic example

import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

// Assume you already have a CDP WebSocket URL
const cdpWsUrl = 'ws://your-remote-browser.com/devtools/browser/your-session-id';

// Connect to the remote browser
const browser = await puppeteer.connect({
  browserWSEndpoint: cdpWsUrl
});

// Reuse an existing page or create a new one
const pages = await browser.pages();
const page = pages[0] || await browser.newPage();

// Create the Midscene Agent
const agent = new PuppeteerAgent(page);

// Use the AI methods
await agent.aiAct('navigate to https://example.com');
await agent.aiAct('click the login button');
const result = await agent.aiQuery('get the page title: {title: string}');

// Clean up: disconnect without closing the remote browser
await agent.destroy();
await browser.disconnect();

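Provide custom actions

Use the customActions option together with defineAction to extend the Agent's action space. Actions passed through this option are appended to the built-in actions, so the Agent can call them during planning.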

import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z
      .number()
      .int()
      .positive()
      .describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
    // Implement your custom click logic here using locate and count
  },
});

const agent = new PuppeteerAgent(page, {
  customActions: [ContinuousClick],
});

await agent.aiAct('click the red button five times');

For more details on custom actions, see Integrate with any interface.

FAQ

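The browser window keeps flickering

This is usually caused by a mismatch between the viewport's deviceScaleFactor and the system's device pixel ratio. Setting deviceScaleFactor to 0 lets it adapt automatically:

await page.setViewport({
  width: 1280, // Puppeteer requires width and height; use your own size
  height: 800,
  deviceScaleFactor: 0,
});

For more details, see the "browser window keeps flickering" entry in the Playwright FAQ.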

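Customize the network timeout

After performing an action, Midscene automatically waits for the network to become idle. You can customize this timeout or turn it off. See the "customize the network timeout" entry in the Playwright FAQ.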

More

For more Agent API documentation, see the API reference.
For the Puppeteer-specific API documentation, see Puppeteer Agent API.

Sample projects
- Puppeteer: https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo
- Playwright + Vitest: https://github.com/web-infra-dev/midscene-example/tree/main/playwright-with-vitest-demo
