简体中文

快速开始

我们用这个需求来举例：使用 OpenAI GPT-4o 在 eBay 上搜索 "耳机"，并以 JSON 格式返回商品和价格结果。

在运行该示例之前，请确保您已经准备了能够调用 OpenAI GPT-4o 模型的 API key。

Puppeteer 是一个 Node.js 库，它通过 DevTools Protocol 或 WebDriver BiDi 提供了用于控制 Chrome 或 Firefox 的高级 API。默认情况下，Puppeteer 运行在无头模式（headless mode, 即没有可见的 UI），但也可以配置为在有头模式（headed mode, 即有可见的浏览器界面）下运行。

准备工作

配置 OpenAI API Key，或自定义模型服务

# 更新为你自己的 Key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"

使用命令行版本体验

你可以快速使用命令行版本的 Midscene 来体验它的基础能力。

请确保你已安装 Node.js。

# headless mode to visit bing.com and search for 'weather today'
npx @midscene/cli --url https://wwww.bing.com --action "type 'weather today', hit enter"

# headed mode (i.e. visible browser) to visit bing.com and search for 'weather today'
npx @midscene/cli --headed --url https://wwww.bing.com --action "type 'weather today', hit enter"

# visit github status page and save the status to ./status.json
npx @midscene/cli \
  --url https://www.githubstatus.com/ \
  --query-output status.json \
  --query '{name: string, status: string}[], service status of github page'

如果你想更深入地了解 Midscene，我们建议使用 SDK 版本，并将其与 Playwright 或 Puppeteer 集成。

查看运行报告

运行 Midscene 之后，系统会生成一个日志文件，默认存放在 ./midscene_run/report/latest.web-dump.json。然后，你可以把这个文件导入可视化工具，这样你就能更清楚地了解整个过程。

集成到 Playwright

Playwright.js 是由微软开发的一个开源自动化库，主要用于对网络应用程序进行端到端测试（end-to-end test）和网页抓取。

这里我们假设你已经拥有一个集成了 Playwright 的仓库。

新增依赖

npm

yarn

pnpm

bun

npm install @midscene/web --save-dev

第一步：更新 playwright.config.ts

export default defineConfig({
  testDir: './e2e',
+ timeout: 90 * 1000,
+ reporter: [["list"], ["@midscene/web/playwright-report"]],
});

第二步：扩展 `test` 实例

把下方代码保存为 ./fixture.ts;

import { test as base } from '@playwright/test';
import type { PlayWrightAiFixtureType } from '@midscene/web/playwright';
import { PlaywrightAiFixture } from '@midscene/web/playwright';

export const test = base.extend<PlayWrightAiFixtureType>(PlaywrightAiFixture());

第三步：编写测试用例

编写下方代码，保存为 ./e2e/ebay-search.spec.ts

./e2e/ebay-search.spec.ts

import { expect } from "@playwright/test";
import { test } from "./fixture";

test.beforeEach(async ({ page }) => {
  page.setViewportSize({ width: 400, height: 905 });
  await page.goto("https://www.ebay.com");
  await page.waitForLoadState("networkidle");
});

test("search headphone on ebay", async ({ ai, aiQuery, aiAssert }) => {
  // 👀 输入关键字，执行搜索
  // 注：尽管这是一个英文页面，你也可以用中文指令控制它
  await ai('在搜索框输入 "Headphones" ，敲回车');

  // 👀 找到列表里耳机相关的信息
  const items = await aiQuery(
    '{itemTitle: string, price: Number}[], 找到列表里的商品标题和价格'
  );

  console.log("headphones in stock", items);
  expect(items?.length).toBeGreaterThan(0);

  // 👀 用 AI 断言
  await aiAssert("界面左侧有类目筛选功能");
});

Step 4. 运行测试用例

npx playwright test ./e2e/ebay-search.spec.ts

Step 5. 查看测试报告

当上面的命令执行成功后，会在控制台输出：Midscene - report file updated: ./current_cwd/midscene_run/report/some_id.html，通过浏览器打开该文件即可看到报告。

集成到 Puppeteer

Puppeteer 是一个 Node.js 库，它通过 DevTools 协议或 WebDriver BiDi 提供控制 Chrome 或 Firefox 的高级 API。Puppeteer 默认在无界面模式（headless）下运行，但可以配置为在可见的浏览器模式（headed）中运行。

第一步：安装依赖

npm install @midscene/web --save-dev
npm install puppeteer ts-node --save-dev

第二步：编写脚本

编写下方代码，保存为 ./demo.ts

./demo.ts

import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const browser = await puppeteer.launch({
      headless: false, // here we use headed mode to help debug
    });

    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 800,
      deviceScaleFactor: 1,
    });

    await page.goto("https://www.ebay.com");
    await sleep(5000);

    // 👀 初始化 Midscene agent 
    const mid = new PuppeteerAgent(page);

    // 👀 执行搜索
    // 注：尽管这是一个英文页面，你也可以用中文指令控制它
    await mid.aiAction('在搜索框输入 "Headphones" ，敲回车');
    await sleep(5000);

    // 👀 理解页面，提取数据
    const items = await mid.aiQuery(
      '{itemTitle: string, price: Number}[], 找到列表里的商品标题和价格',
    );
    console.log("耳机商品信息", items);

    // 👀 用 AI 断言
    await mid.aiAssert("界面左侧有类目筛选功能");

    await browser.close();
  })()
);

TIP

你可能已经注意到了，上述文件中的关键代码只有三行，且都是用自然语言编写的

await mid.aiAction('在搜索框输入 "Headphones" ，敲回车');
await mid.aiQuery(
  '{itemTitle: string, price: Number}[], 找到列表里的商品标题和价格',
);
await mid.aiAssert("界面左侧有类目筛选功能");

第三步：运行

使用 ts-node 来运行，你会看到命令行打印出了耳机的商品信息：

# run
npx ts-node demo.ts

# 命令行应该有如下输出
#  [
#   {
#     itemTitle: 'JBL Tour Pro 2 - True wireless Noise Cancelling earbuds with Smart Charging Case',
#     price: 551.21
#   },
#   {
#     itemTitle: 'Soundcore Space One无线耳机40H ANC播放时间2XStronger语音还原',
#     price: 543.94
#   }
# ]

第四步：查看运行报告

当上面的命令执行成功后，会在控制台输出：Midscene - report file updated: ./current_cwd/midscene_run/report/some_id.html，通过浏览器打开该文件即可看到报告。

你也可以将 ./midscene_run/report/latest.web-dump.json 文件导入可视化工具查看。

查看示例报告

在可视化工具中点击"加载演示"按钮，你将能够看到之前代码的运行结果以及其他一些示例。

大纲

快速开始#

准备工作#

使用命令行版本体验#

查看运行报告#

集成到 Playwright#

新增依赖#

第一步：更新 playwright.config.ts#

第二步：扩展 test 实例#

第三步：编写测试用例#

Step 4. 运行测试用例#

Step 5. 查看测试报告#

集成到 Puppeteer#

第一步：安装依赖#

第二步：编写脚本#

第三步：运行#

第四步：查看运行报告#

查看示例报告#