
feat: connect to remote browser services #3545

Open
l2ysho wants to merge 51 commits into v4 from 1822-connect-to-remote-browser-services

Conversation

@l2ysho commented Mar 31, 2026
Contributor

closes #1822

l2ysho and others added 15 commits March 18, 2026 14:46
… support

# Task 1: Type Definitions & LaunchContext `isRemote` Flag

## Goal

Add the foundational types and the `isRemote` flag that all other remote browser tasks depend on.

## Dependencies

None — this is the foundation task.

## Scope

### 1. Add `isRemote` to `LaunchContext`

**File:** `packages/browser-pool/src/launch-context.ts`

- Add `isRemote?: boolean` to the `LaunchContextOptions` interface (alongside `id`, `browserPlugin`, etc.)
- Add a public readonly `isRemote: boolean` property to the `LaunchContext` class
- Set it from constructor options, defaulting to `false`
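As a rough illustration of the three bullets above, here is a minimal sketch that uses a simplified stand-in class rather than the real `LaunchContext`, which carries many more fields. The names `LaunchContextSketch` and `LaunchContextSketchOptions` are hypothetical and exist only for this example.

```typescript
// Simplified, hypothetical stand-in for LaunchContextOptions.
interface LaunchContextSketchOptions {
    id?: string;
    isRemote?: boolean;
}

// Simplified, hypothetical stand-in for LaunchContext.
class LaunchContextSketch {
    readonly id: string | undefined;
    readonly isRemote: boolean;

    constructor(options: LaunchContextSketchOptions = {}) {
        this.id = options.id;
        // Fall back to a local launch when the flag is omitted.
        this.isRemote = options.isRemote ?? false;
    }
}

const localContext = new LaunchContextSketch();
const remoteContext = new LaunchContextSketch({ isRemote: true });
```

Later tasks can then branch on `isRemote` without checking which connect option produced it.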

### 2. Define connect option types on PlaywrightPlugin

**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`

Add the following type to the plugin file (or a co-located types file):

```typescript
// Mirrors browserType.connectOverCDP(endpointURL, options)
interface PlaywrightConnectOverCDPOptions {
    endpointURL: string;
    options?: Parameters<BrowserType['connectOverCDP']>[1];
}

// Mirrors browserType.connect(wsEndpoint, options)
interface PlaywrightConnectOptions {
    wsEndpoint: string;
    options?: Parameters<BrowserType['connect']>[1];
}
```

Use the existing `Parameters` utility type pattern (see how `SafeParameters` is used elsewhere in the codebase) — do NOT redefine Playwright's types manually.
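The `Parameters` extraction can be tried in isolation. The sketch below uses a hypothetical `FakeBrowserType` interface in place of Playwright's `BrowserType`, so the option fields shown (`timeout`, `headers`) are assumptions made for the example, not a statement of Playwright's actual signature.

```typescript
// Hypothetical stand-in for a library type such as Playwright's BrowserType.
interface FakeBrowserType {
    connect(
        wsEndpoint: string,
        options?: { timeout?: number; headers?: Record<string, string> },
    ): Promise<unknown>;
}

// Extract the type of the second parameter instead of redefining it by hand.
type FakeConnectSecondArg = Parameters<FakeBrowserType['connect']>[1];

interface FakeConnectOptions {
    wsEndpoint: string;
    options?: FakeConnectSecondArg;
}

const exampleConnectOptions: FakeConnectOptions = {
    wsEndpoint: 'ws://localhost:3000',
    options: { timeout: 30000 },
};
```

Because the option type is derived from the method signature, it follows the library automatically when the library's signature changes.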

### 3. Define connect option types on PuppeteerPlugin

**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`

```typescript
// Mirrors puppeteer.connect({ browserWSEndpoint, ...rest })
// Flat object matching Puppeteer's ConnectOptions
type PuppeteerConnectOverCDPOptions = Parameters<typeof puppeteer.connect>[0];
```

Use the `Parameters` pattern to extract the type from Puppeteer's `connect` method.

### 4. Add connect option fields to `BrowserPluginOptions`

**File:** `packages/browser-pool/src/abstract-classes/browser-plugin.ts`

This is a design choice — the PRD says connect options live on the plugin subclass, not on `LaunchContext`. Add the fields to the plugin options type so they flow through the constructor:

- `PlaywrightPlugin` options should accept `connectOptions?` and `connectOverCDPOptions?`
- `PuppeteerPlugin` options should accept `connectOverCDPOptions?`

These can be added to subclass-specific option types rather than the base `BrowserPluginOptions`.

### 5. Add connect option fields to launcher-level interfaces

**File:** `packages/playwright-crawler/src/internals/playwright-launcher.ts`

Add to `PlaywrightLaunchContext`:
```typescript
connectOptions?: PlaywrightConnectOptions;
connectOverCDPOptions?: PlaywrightConnectOverCDPOptions;
```

**File:** `packages/puppeteer-crawler/src/internals/puppeteer-launcher.ts`

Add to `PuppeteerLaunchContext`:
```typescript
connectOverCDPOptions?: PuppeteerConnectOverCDPOptions;
```

This enables IDE autocomplete when users configure `launchContext` on the crawler.

### 6. Export new types

**File:** `packages/browser-pool/src/index.ts`

Export the new connect option types so they're available to consumers.

## Key Files

| File | Change |
|------|--------|
| `packages/browser-pool/src/launch-context.ts` | Add `isRemote` option + property |
| `packages/browser-pool/src/playwright/playwright-plugin.ts` | Add connect option types |
| `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts` | Add connect option type |
| `packages/playwright-crawler/src/internals/playwright-launcher.ts` | Add connect options to `PlaywrightLaunchContext` |
| `packages/puppeteer-crawler/src/internals/puppeteer-launcher.ts` | Add connect options to `PuppeteerLaunchContext` |
| `packages/browser-pool/src/index.ts` | Export new types |
| `packages/browser-crawler/src/internals/browser-launcher.ts` | May need connect options on `BrowserLaunchContext` base |

## Acceptance Criteria

- [x] `LaunchContext` has `isRemote` boolean property, defaults to `false`
- [x] Connect option types are defined using library `Parameters` extraction (not manual redefinition)
- [x] `PlaywrightLaunchContext` shows `connectOptions` and `connectOverCDPOptions` in IDE autocomplete
- [x] `PuppeteerLaunchContext` shows `connectOverCDPOptions` in IDE autocomplete
- [x] New types are exported from `@crawlee/browser-pool`
- [x] TypeScript compiles with no errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and `connectOverCDP()`

# Task 2: PlaywrightPlugin Remote Connection Routing

## Goal

Make `PlaywrightPlugin._launch()` branch to `connect()` or `connectOverCDP()` when remote connection options are present, instead of calling `launch()`.

## Dependencies

- Task 1 (types and `isRemote` flag)

## Scope

### 1. Store connect options on the plugin instance

**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`

- Accept `connectOptions` and `connectOverCDPOptions` in the constructor options
- Store them as instance properties
- **Validation:** If both `connectOptions` AND `connectOverCDPOptions` are provided, throw an error immediately in the constructor:
  ```
  Cannot set both 'connectOptions' and 'connectOverCDPOptions' — pick one protocol.
  ```
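A minimal sketch of that fail-fast check, written as a standalone function rather than the actual constructor. `ConnectSketchOptions` and `assertSingleProtocol` are hypothetical names, and the error text is paraphrased from the message above.

```typescript
// Hypothetical, trimmed-down option shape for illustration only.
interface ConnectSketchOptions {
    connectOptions?: { wsEndpoint: string };
    connectOverCDPOptions?: { endpointURL: string };
}

// Throws when both remote protocols are configured at once.
function assertSingleProtocol(options: ConnectSketchOptions): void {
    if (options.connectOptions && options.connectOverCDPOptions) {
        throw new Error("Cannot set both 'connectOptions' and 'connectOverCDPOptions'; pick one protocol.");
    }
}
```

Running the check in the constructor surfaces the misconfiguration before any browser connection is attempted.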

### 2. Branch in `_launch()` for remote connections

**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`

In the existing `_launch()` method (currently lines 22-102), add branching logic **before** the existing local launch code:

```typescript
protected async _launch(launchContext: LaunchContext<...>): Promise<Browser> {
    // Remote CDP connection
    if (this.connectOverCDPOptions) {
        const { endpointURL, options } = this.connectOverCDPOptions;
        const browser = await browserType.connectOverCDP(endpointURL, options);
        return browser;
    }

    // Remote Playwright WebSocket connection
    if (this.connectOptions) {
        const { wsEndpoint, options } = this.connectOptions;
        const browser = await browserType.connect(wsEndpoint, options);
        return browser;
    }

    // Existing local launch logic...
}
```

**Reference:** See `StagehandPlugin._launch()` at `packages/stagehand-crawler/src/internals/stagehand-plugin.ts:102-107` for the CDP connection pattern:
```typescript
const cdpUrl = await stagehand.connectURL();
const browser = await chromium.connectOverCDP(cdpUrl);
```

### 3. Set `isRemote` on LaunchContext

**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`

In `createLaunchContext()` (or wherever the plugin creates the LaunchContext), pass `isRemote: true` when connect options are present. This can be done by overriding `createLaunchContext()` in the subclass, or by passing it through the options.

Check how the base `BrowserPlugin.createLaunchContext()` works (at `packages/browser-pool/src/abstract-classes/browser-plugin.ts:149-174`) and determine the best insertion point.

## Key Design Decisions

- **No new abstract method:** The routing happens inside `_launch()` via internal branching, not a new `_connect()` method. This keeps the abstract interface unchanged and doesn't affect custom plugins like StagehandPlugin.
- **`browser.close()` for cleanup:** Remote browsers are closed the same way as local browsers — via `browser.close()`. No special disconnect handling.
- **No proxy server setup for remote:** The remote branch skips the local proxy server setup that exists in the current `_launch()` code.

## Key Files

| File | Change |
|------|--------|
| `packages/browser-pool/src/playwright/playwright-plugin.ts` | Constructor stores options, `_launch()` branches for remote |

## Acceptance Criteria

- [x] `PlaywrightPlugin` accepts `connectOptions` in constructor and calls `browserType.connect()` with `wsEndpoint` and `options`
- [x] `PlaywrightPlugin` accepts `connectOverCDPOptions` in constructor and calls `browserType.connectOverCDP()` with `endpointURL` and `options`
- [x] Setting both `connectOptions` and `connectOverCDPOptions` throws an error
- [x] `launchContext.isRemote` is `true` when connect options are present
- [x] Remote branch skips local proxy server setup and persistent context logic
- [x] TypeScript compiles with no errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nnect()`

# Task 3: PuppeteerPlugin Remote Connection Routing

## Goal

Make `PuppeteerPlugin._launch()` branch to `puppeteer.connect()` when remote connection options (CDP) are present, instead of calling `puppeteer.launch()`.

## Dependencies

- Task 1 (types and `isRemote` flag)

## Scope

### 1. Store connect options on the plugin instance

**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`

- Accept `connectOverCDPOptions` in the constructor options
- Store as an instance property
- Puppeteer only supports CDP — there is no `connectOptions` field (Playwright-only)

### 2. Branch in `_launch()` for remote connections

**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`

In the existing `_launch()` method (currently lines 22-203), add branching logic **before** the existing local launch code:

```typescript
protected async _launch(launchContext: LaunchContext<...>): Promise<Browser> {
    // Remote CDP connection
    if (this.connectOverCDPOptions) {
        const browser = await puppeteer.connect(this.connectOverCDPOptions);
        // Wrap with the same Proxy handler for newPage() interception
        // (see existing code at lines 138-200)
        return wrappedBrowser;
    }

    // Existing local launch logic...
}
```

**Important:** Puppeteer's `connect()` takes a flat options object: `puppeteer.connect({ browserWSEndpoint, ...rest })`. This is different from Playwright's two-argument pattern. The type should match Puppeteer's `ConnectOptions`.

### 3. Handle the `newPage()` Proxy wrapper for remote

The existing `_launch()` wraps the browser in a `Proxy` that intercepts `newPage()` calls to support `useIncognitoPages` (lines 138-200). This proxy wrapper should also be applied to remote browsers so that incognito context creation works correctly.

### 4. Set `isRemote` on LaunchContext

Same pattern as Task 2 — pass `isRemote: true` when `connectOverCDPOptions` is present.

## Key Design Decisions

- **Flat options object:** Puppeteer's `connect()` API takes a single options object (not `endpointURL, options` like Playwright). The `connectOverCDPOptions` type matches this flat shape directly.
- **`browser.close()` for cleanup:** Same as Playwright — remote browsers closed via `browser.close()`, not `browser.disconnect()`.
- **`newPage()` proxy still needed:** The Proxy wrapper that intercepts `newPage()` to create incognito contexts must still wrap remote browsers.

## Key Files

| File | Change |
|------|--------|
| `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts` | Constructor stores options, `_launch()` branches for remote |

## Acceptance Criteria

- [x] `PuppeteerPlugin` accepts `connectOverCDPOptions` in constructor and calls `puppeteer.connect()` with the options object
- [x] The `newPage()` Proxy wrapper is applied to remote browsers (for incognito support)
- [x] `launchContext.isRemote` is `true` when connect options are present
- [x] Remote branch skips user data directory setup, headless handling, and other local-only logic
- [x] TypeScript compiles with no errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nection logging

## Goal

Make `BrowserPlugin.launch()` skip proxy injection and webdriver hiding when `launchContext.isRemote` is `true`, since these operations modify `launchOptions` which are not used for remote connections.

## Dependencies

- Task 1 (`isRemote` flag on LaunchContext)

## Scope

### 1. Skip `_addProxyToLaunchOptions()` for remote

**File:** `packages/browser-pool/src/abstract-classes/browser-plugin.ts`

In the `launch()` method, the call to `_addProxyToLaunchOptions()` is now gated on `!isRemote`:

```typescript
if (launchContext.proxyUrl && !launchContext.isRemote) {
    await this._addProxyToLaunchOptions(launchContext);
}
```

### 2. Skip `_mergeArgsToHideWebdriver()` for remote

```typescript
if (!launchContext.isRemote && this._isChromiumBasedBrowser(launchContext)) {
    this._mergeArgsToHideWebdriver(launchContext);
}
```

### 3. No changes to `_addProxyToLaunchOptions()` or `_mergeArgsToHideWebdriver()` themselves

The methods remain unchanged — the skip logic lives in the calling `launch()` method.

## Key Design Decisions

- **Skip at call site, not in the methods**
- **`proxyUrl` + remote triggers a warning:** Handled in Task 6 (Warnings)
- **Fingerprinting hooks are unchanged**

## Additional

- Fixed `isRemote` not being passed through base class `createLaunchContext()`
- Added info-level logs for remote connections in base class and both plugins

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ht overloads

Playwright: change PlaywrightConnectOverCDPOptions and PlaywrightConnectOptions
from type aliases (all-optional fields) to interfaces with required `wsEndpoint`.
Use the non-deprecated two-argument overloads in _launch().

Puppeteer: add runtime guard that throws if neither `browserWSEndpoint` nor
`browserURL` is provided in connectOverCDPOptions.
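The Puppeteer guard could look roughly like the sketch below. `CdpConnectSketchOptions` is a hypothetical subset of Puppeteer's connect options, and both the function name and the error wording are assumptions.

```typescript
// Hypothetical subset of Puppeteer's connect options.
interface CdpConnectSketchOptions {
    browserWSEndpoint?: string;
    browserURL?: string;
}

// Throws unless at least one endpoint field is provided.
function assertHasEndpoint(options: CdpConnectSketchOptions): void {
    if (!options.browserWSEndpoint && !options.browserURL) {
        throw new Error("connectOverCDPOptions requires 'browserWSEndpoint' or 'browserURL'.");
    }
}
```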

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions

# Task 5: `useIncognitoPages` Defaults to `true` for Remote

## Goal

When remote connection options are present and `useIncognitoPages` was not explicitly set by the user, default it to `true` and log an info message. If the user explicitly sets `false`, log a warning.

## Dependencies

- Task 2 (PlaywrightPlugin stores connect options)
- Task 3 (PuppeteerPlugin stores connect options)

## Scope

### 1. Preserve `undefined` vs `false` in base constructor

The base `BrowserPlugin` constructor currently collapses `useIncognitoPages` to `false`. The subclass checks `options.useIncognitoPages` directly (preserves `undefined`) and overrides after `super()`.

### 2. Override default in PlaywrightPlugin constructor

After the `super()` call, if connect options are present:

- `undefined` → set to `true`, info log
- `false` → warning log
- `true` → no extra log
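The three cases above can be expressed as a small pure function. This is a sketch of the decision table only, not the plugin constructor; `resolveIncognitoForRemote` and `IncognitoDecision` are hypothetical names.

```typescript
type IncognitoLogLevel = 'info' | 'warning' | 'none';

interface IncognitoDecision {
    useIncognitoPages: boolean;
    log: IncognitoLogLevel;
}

// userValue is the raw option as passed by the user (undefined means "not set").
function resolveIncognitoForRemote(userValue: boolean | undefined, hasConnectOptions: boolean): IncognitoDecision {
    if (!hasConnectOptions) {
        // Local launch: existing behavior, undefined collapses to false.
        return { useIncognitoPages: userValue ?? false, log: 'none' };
    }
    if (userValue === undefined) return { useIncognitoPages: true, log: 'info' };
    if (userValue === false) return { useIncognitoPages: false, log: 'warning' };
    return { useIncognitoPages: true, log: 'none' };
}
```

The function has to receive the raw option value: once `undefined` has been collapsed to `false`, the "not set" and "explicitly disabled" cases can no longer be told apart.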

### 3. Override default in PuppeteerPlugin constructor

Same logic, checking `connectOverCDPOptions`.

## Key Design Decisions

- **Info vs warning:** Defaulting to `true` is an info message (expected behavior). Explicit `false` is a warning (user should understand implications).
- **`useIncognitoPages: false` + `connect()` is not special-cased:** The warning covers this case — no additional error or fallback.
- **Uses existing `this.log`:** All logging uses the inherited `BrowserPlugin.log` logger.

## Acceptance Criteria

- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages` is not provided → defaults to `true`, info message logged
- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages: false` → stays `false`, warning logged
- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages: true` → stays `true`, no extra log
- [x] When no connect options are set → existing behavior unchanged
- [x] Base constructor preserves `undefined` vs `false` distinction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename PlaywrightConnectOverCDPOptions.wsEndpoint → endpointURL to match
  Playwright's own terminology and avoid field conflict with inherited
  ConnectOverCDPOptions.endpointURL
- Wrap connectOverCDP() and connect() failures with BrowserLaunchError
  including sanitized endpoint URL (credentials stripped) and actionable
  guidance
- Move endpoint validation to constructors (fail fast) — Playwright validates
  endpointURL and wsEndpoint are non-empty, Puppeteer validates
  browserWSEndpoint || browserURL
- Add _sanitizeEndpointForLog() to both plugins to strip credentials from
  URLs before including them in error messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ions

- Close BrowserContext on page close when useIncognitoPages is true.
  Previously contexts were only cleaned up when an anonymized proxy was
  active, causing context accumulation on remote browsers without proxy.
- Clean up targetcreated listener on remote browser disconnect via
  browser.once('disconnected') handler to prevent listener leaks.
- Guard anonymizeProxySugar call with proxyUrl check — skip the async
  call entirely when no proxy is configured (common for remote browsers).
- Conditionally omit proxyServer from context options when no proxy is
  set, instead of passing { proxyServer: undefined }.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ket connections

- Add comments in both plugin constructors explaining why
  options.useIncognitoPages is checked instead of this.useIncognitoPages
  (super() collapses undefined to false, losing the "not set" signal).
- Strengthen warning for Playwright connectOptions (WebSocket) +
  useIncognitoPages: false — connect() returns a browser with no default
  context, which is more severe than just sharing cookies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove spurious launchOptions warning that always fired due to
framework-injected defaults, and share log instances in launchers.

PRD Task 6: Warnings for Ignored & Conflicting Options
- proxyUrl + remote → warning in base BrowserPlugin.launch()
- useChrome + remote → warning in launcher constructors
- executablePath + remote → warning in launcher constructors
- useIncognitoPages: false + remote → handled by Task 5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PRD Task 7: Unit Tests
- Connection routing (Playwright CDP/WS/local, Puppeteer CDP/local)
- Validation (mutual exclusion, missing endpoints)
- isRemote correctness for all plugin variants
- Proxy/webdriver skipping for remote, applied for local
- useIncognitoPages defaults (true for remote, false for local)
- Warnings (proxyUrl, useIncognitoPages: false, CDP vs WS variants)
- 40 tests, all mocked (no real browser instances)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…texts

When useIncognitoPages is true (default for remote) and proxyUrl is set,
the newPage handler was passing proxyServer to createBrowserContext even
for remote connections. For credentialed proxies this also spun up a
localhost tunnel unreachable by the remote browser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Examples for Browserbase, Browserless, Rebrowser, and Steel using Playwright and Puppeteer.
@l2ysho changed the title from "1822 connect to remote browser services" to "feat: connect to remote browser services" on Apr 21, 2026
l2ysho added 3 commits April 27, 2026 11:33
…oteBrowser config

Add a unified API for connecting crawlers to remote browser services
  (Browserbase, Browserless, Steel, Rebrowser). Users can either pass a
  RemoteBrowserConfig object or extend RemoteBrowserProvider with typed
  connect()/release() lifecycle methods.

  - Add RemoteBrowserProvider abstract class with generic TContext
  - Add RemoteBrowserConfig interface (endpoint + release + type)
  - Wire remoteBrowser through BrowserPlugin, PlaywrightPlugin, PuppeteerPlugin
  - Auto-call release() on browser close/crash/pool destroy
  - Skip fingerprinting, proxy injection, and webdriver stealth for remote browsers
  - Skip session-based browser retirement for remote browsers (isRemote guard)
  - Default useIncognitoPages to true for remote connections
  - Add 30+ unit tests for both config and provider patterns
  - Update all temp-examples to use RemoteBrowserProvider
… overflow

Remote browser services enforce concurrent session limits. During browser
  retirement transitions, the pool could briefly exceed the limit by launching
  a new browser before the retired one fully closed.

  - Add maxOpenBrowsers to RemoteBrowserConfig and RemoteBrowserProvider
  - BrowserCrawler reads it from the plugin and applies it to the pool
  - Gate new tasks via _isTaskReadyFunction (same pattern as maxConcurrency)
  - Add hasFreeBrowserSlot() and hasActiveBrowserWithFreeCapacity() to BrowserPool
  - Only activates when maxOpenBrowsers is set (remote browsers); local browsers unaffected
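The gating idea can be sketched with a small counter. This is not the `BrowserPool` implementation: `BrowserSlotTracker`, `browserOpened()`, and `browserClosed()` are hypothetical, and only the `maxOpenBrowsers` and `hasFreeBrowserSlot()` names come from the commit message.

```typescript
// Hypothetical tracker: counts browsers until they are fully closed.
class BrowserSlotTracker {
    private openBrowsers = 0;
    private readonly maxOpenBrowsers: number | undefined;

    constructor(maxOpenBrowsers?: number) {
        this.maxOpenBrowsers = maxOpenBrowsers;
    }

    hasFreeBrowserSlot(): boolean {
        // No limit configured (the local-browser case): never block.
        if (this.maxOpenBrowsers === undefined) return true;
        return this.openBrowsers < this.maxOpenBrowsers;
    }

    browserOpened(): void {
        this.openBrowsers += 1;
    }

    // Call only once the browser has fully closed, so a retired-but-open
    // browser keeps occupying its slot.
    browserClosed(): void {
        this.openBrowsers = Math.max(0, this.openBrowsers - 1);
    }
}
```

A task-readiness check can consult `hasFreeBrowserSlot()` before scheduling work that would launch another browser.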
Comment thread packages/browser-crawler/src/internals/browser-crawler.ts Outdated
Comment thread packages/browser-pool/src/remote-browser-provider.ts
*
* @param _context The same `context` object returned by {@link connect}.
*/
async release(_context: TContext): Promise<void> {}

Contributor Author

I am aware that this is not the best name for it

l2ysho added 4 commits April 30, 2026 15:43
… cookie sharing

Remote CDP browsers (both Puppeteer and Playwright) now default useIncognitoPages
  to false, matching local behavior. For Playwright CDP, the browser's default context
  is wrapped in PlaywrightBrowserWithPersistentContext so pages share cookies — the same
  mechanism used locally with launchPersistentContext().

  Playwright WebSocket still defaults to true since connect() returns a browser with
  no default context to wrap.

  The wrapper passes the real Browser as parentBrowser so close() also closes the
  WebSocket transport and disconnected events are forwarded to the pool.
Comment thread packages/browser-pool/src/playwright/playwright-plugin.ts Outdated
Comment thread packages/browser-pool/src/playwright/playwright-plugin.ts Outdated
l2ysho and others added 22 commits May 27, 2026 13:50
Adds a vitest-based integration suite under test/integration that
exercises Crawlee end-to-end against a real Browserless instance.
The first test verifies the force-incognito behavior for remote
Playwright CDP connections: two requests landing on the same browser
do not share cookies even when retireBrowserAfterPageCount is high
and saveResponseCookies is disabled.

Gated on CRAWLEE_DIFFICULT_TESTS so `pnpm test` skips the suite by
default — `pnpm test:integration` and `pnpm test:full` set the flag.
The suite expects Browserless and httpbin running on a shared Docker
network; `pnpm test:integration:services:up` spins them up locally,
and a new GitHub Actions workflow provides them as service containers.

Also sets core-js-pure: false in pnpm-workspace.yaml allowBuilds to
match prior skip-by-default behavior under pnpm 11.
Replace the warn-and-silently-drop path with a constructor throw in both
PlaywrightPlugin and PuppeteerPlugin when more than one of remoteBrowser,
connectOptions, or connectOverCDPOptions is set. Fixes the doc/impl
mismatch where JSDoc claimed remoteBrowser "Takes precedence" but the
implementation actually dropped it.
PlaywrightLauncher and PuppeteerLauncher now throw if launchOptions is
set alongside connectOptions, connectOverCDPOptions, or remoteBrowser.
The launcher is the right layer for this check — at the plugin level the
launcher always injects defaults (executablePath) into launchOptions, so
the plugin cannot distinguish user-set from framework-default. Removes
the now-unreachable executablePath warning and consolidates the useChrome
warning behind the unified hasRemote flag.
CDP is also a WebSocket protocol, so 'websocket' was a misleading label.
Rename to 'playwright', which names the actual transport (Playwright's
client-server protocol exposed via browserType.connect()). Updated:
RemoteBrowserConfig.type, PlaywrightRemoteBrowserConfig.type,
RemoteBrowserProvider.type, the playwright-plugin branch, the puppeteer
"not supported" error message, the connect log line, and all tests.
The RemoteBrowserConfig / RemoteBrowserProvider abstraction is built for
remote browser services (Browserless, Browserbase, Steel), which all
speak CDP. The 'websocket'/'playwright' branch (browserType.connect())
had no real provider behind it, and naming it 'websocket' was misleading
(CDP also rides WebSocket). Rather than commit to a name that BiDi will
make obsolete anyway, drop the field entirely. Callers who genuinely
need Playwright's connect() can still use connectOptions directly.

Removes:
- RemoteBrowserConfig.type and RemoteBrowserProvider.type
- PlaywrightRemoteBrowserConfig and PuppeteerRemoteBrowserConfig
  (now-empty interface extensions)
- The 'playwright' branch in PlaywrightPlugin._launch
- The "Puppeteer does not support 'playwright'" throw + tests
- 5 type-related test cases
The crawler-level 'headless' shortcut synthesized a launchContext.launchOptions
object, which then tripped the launcher's mutual-exclusion check against
remoteBrowser. Warn and skip the mutation instead — remote services control
headless mode anyway. Mirrors the existing useChrome warning in the launcher.
Replace apify/workflows/pnpm-install@main with a direct pnpm install
call without --loglevel error and without the pnpm store cache, to
surface the actual error behind the 8-min silent hang on Node 24.

Revert once root cause is identified.
…browser-services

# Conflicts:
#	packages/browser-crawler/src/internals/browser-crawler.ts
#	pnpm-workspace.yaml
A static-string endpoint can't receive proxyUrl, but the old log claimed
it would be passed to endpoint() — masking that proxy traffic was being
silently ignored. Branch on typeof endpoint and warn in the string case.
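A sketch of that branch as a pure function. The `EndpointSketch` type, including the shape of the function variant's argument, is an assumption made for illustration; the return value only names which log level would be used.

```typescript
// Hypothetical endpoint shape: a static URL or a per-launch function.
type EndpointSketch = string | ((context: { proxyUrl?: string }) => string);

// Decides which log, if any, describes what happens to proxyUrl.
function proxyForwardingLog(endpoint: EndpointSketch, proxyUrl: string | undefined): 'none' | 'info' | 'warning' {
    if (!proxyUrl) return 'none';
    // A static string cannot receive proxyUrl, so the proxy would be ignored.
    return typeof endpoint === 'string' ? 'warning' : 'info';
}
```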
…bers

The remote_browser_provider.ts guide overrides maxOpenBrowsers and
release() from the base class without the override modifier, failing
the docs typecheck (TS4114).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These two tests configured a static-string endpoint but asserted the
info log that only fires for a function endpoint. Since the string-endpoint
path now warns instead (commit ef362cd), the assertions never matched.
Use a function endpoint so the forwarding info log fires, matching the
test's stated intent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce RemoteBrowserPool, an IBrowserPool implementation for remote
browser services that wraps a BrowserPool (composition). It owns the one
thing the plain pool cannot enforce on its own: a maxOpenBrowsers limit on
concurrent remote browsers. newPage() waits for a free slot (event-driven,
with a poll fallback) instead of letting the crawler overshoot the remote
service's session quota.

Pass an instance via the crawler's browserPool option (added in #3669) to
plug remote support in as a first-class pool rather than threading it
through launchContext. A pool supplied this way is not owned by the
crawler, so its lifecycle (and reuse across crawlers) is the caller's.

The remote-session lifecycle (connect/release) remains owned by the
wrapped pool's plugin and its remoteBrowser config — this class only
governs when new pages may open.

Adds a docs guide section + typechecked example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make RemoteBrowserPool the single owner of all remote-session concerns
instead of weaving remote logic through the base plugin, controller,
launch context, launchers and crawlers.

RemoteBrowserPool (implements IBrowserPool) now owns:
- endpoint resolution (static URL, per-launch function, or
  RemoteBrowserProvider) via an internal RemoteSessionRegistry;
- the release lifecycle, with a release-at-most-once guarantee and a
  releaseAll() backstop on destroy() so no remote session leaks;
- the maxOpenBrowsers throttle, enforced inside newPage().

Plugins keep only the library-specific connect() call, driven by a thin
RemoteConnection bridge the pool injects (useRemoteConnection). The
controller releases sessions by token via that bridge.

This removes the two footguns from the previous design by construction:
double-release (idempotent registry + token clear) and the teardown
session leak (releaseAll backstop).
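One way to get the release-at-most-once behavior is to remove the session from the registry before invoking its release callback. The sketch below illustrates that idea under assumed names (`SessionRegistrySketch`, `register()`); it is not the `RemoteSessionRegistry` from this PR.

```typescript
type ReleaseFn = () => Promise<void> | void;

// Hypothetical registry keyed by an opaque session token.
class SessionRegistrySketch {
    private readonly sessions = new Map<string, ReleaseFn>();

    register(token: string, release: ReleaseFn): void {
        this.sessions.set(token, release);
    }

    async release(token: string): Promise<void> {
        const release = this.sessions.get(token);
        if (!release) return; // Unknown token or already released.
        // Delete before awaiting so a concurrent second call is a no-op.
        this.sessions.delete(token);
        await release();
    }

    // Backstop for teardown: releases whatever is still registered.
    async releaseAll(): Promise<void> {
        const tokens = Array.from(this.sessions.keys());
        await Promise.all(tokens.map((token) => this.release(token)));
    }

    get size(): number {
        return this.sessions.size;
    }
}
```

Because the entry is deleted before the callback runs, calling `release()` from both a close path and a kill path triggers the callback only once, and `releaseAll()` skips sessions that were already released.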

BREAKING CHANGE: remote browsers are configured exclusively through
RemoteBrowserPool passed as the crawler's browserPool option. The
launchContext.remoteBrowser / connectOptions / connectOverCDPOptions
options are removed; Playwright CDP-vs-WebSocket selection moves to the
pool's connection.protocol option.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(browser-crawler): add remoteBrowser option so the crawler builds the pool

Building a RemoteBrowserPool by hand and passing it as browserPool meant
constructing the browser plugin twice (the crawler builds and discards
its own) and allowed mismatching the pool with the crawler — e.g. a
Puppeteer-backed pool in a PlaywrightCrawler type-checked but broke at
runtime.

Add a remoteBrowser option (CrawlerRemoteBrowserOptions: endpoint /
release / maxOpenBrowsers / connection — no plugin) to the browser
crawlers. The crawler builds a RemoteBrowserPool around its OWN plugin,
so the connection is always for the matching browser; there is nothing
to construct and no way to mismatch. The crawler owns and tears down
this pool.

browserPool stays for sharing one remote pool across crawlers (not owned
by the crawler); the two options are mutually exclusive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

simplify
Two BrowserLaunchError strings ended with an invisible U+200B
zero-width space, which shipped in user-facing error messages and
broke log grepping / string matching.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sanitizeEndpointForLog only masked userinfo credentials, so tokens
passed in the query string (e.g. Browserless '?token=...') still
leaked into connect-failure error messages. Drop the query and
fragment entirely, keeping protocol/host/port/path for diagnostics.
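A minimal sketch of this kind of sanitizer, built on the standard `URL` class. The function name and the placeholder returned for unparsable input are assumptions; the real helper may behave differently in edge cases.

```typescript
// Keeps protocol, host (including port), and path; drops userinfo, query, and fragment.
function sanitizeEndpointSketch(endpoint: string): string {
    try {
        const url = new URL(endpoint);
        return `${url.protocol}//${url.host}${url.pathname}`;
    } catch {
        // Avoid echoing input that could not be parsed; it may still contain secrets.
        return '<invalid endpoint URL>';
    }
}
```

Rebuilding the string from selected parts, rather than blanking known-sensitive fields, means any token placed in the query string or fragment is dropped without having to recognize its parameter name.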

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@l2ysho l2ysho marked this pull request as ready for review June 17, 2026 10:49
BROWSERLESS_URL: http://localhost:3000
HTTPBIN_URL: http://httpbin
CRAWLEE_DIFFICULT_TESTS: 1
RETRY_TESTS: 1

Contributor Author

@B4nan if you have time, please check how the integration tests are wired up.

@l2ysho l2ysho self-assigned this Jun 17, 2026
@l2ysho l2ysho added the t-tooling Issues with this label are in the ownership of the tooling team. label Jun 17, 2026

@barjin barjin left a comment

Member

Thank you @l2ysho !

I don't see any major issues, so approving 👍 Once in v4, we can test this more and, e.g., fine-tune the interfaces if needed.

Cheers!

Comment thread pnpm-workspace.yaml

Member

nit: the changes in this file seem to be just indent-level formatting. perhaps we should revert this?

Contributor Author

Good point. I did a bit of digging into why this happened, and it seems pnpm did this (probably due to some changes I added and later removed in the PR), and it kind of ignores editorconfig. We can revert, but next time it will do it again.

@l2ysho l2ysho requested a review from janbuchar June 22, 2026 12:05

@janbuchar janbuchar left a comment

Contributor

I like what I see! I just have a couple of design-related questions.

--health-interval 5s
--health-timeout 5s
--health-retries 12
httpbin:

Contributor

Does this work reliably for you? Back in the day I set up a Standby Actor that serves httpbin to replace using the Docker container directly, but I don't remember the exact problem this was solving. This was in crawlee-python, where it later got replaced by a bespoke http testing server. But I think maybe impit still uses the Actor? @barjin?

Comment on lines +440 to +445
    if (browserPool && remoteBrowser) {
        throw new Error(
            "Set at most one of 'browserPool' and 'remoteBrowser'. To share a remote pool across crawlers, " +
                'build a RemoteBrowserPool yourself and pass it as `browserPool`.',
        );
    }

Contributor

Is it absolutely necessary to allow passing both? It would be much cleaner to just accept a single option, browserPool: IBrowserPool

* {@apilink RemoteConnection}. Safe to call multiple times — the token is cleared after the first call
* and the pool's registry also dedupes, so `release()` fires at most once across close()/kill().
*/
private async _releaseRemoteBrowser(): Promise<void> {

Contributor

This has no business being in an abstract browser controller, the remote control would fit better into RemoteBrowserPool or something like that.

Contributor

Again, the changes in this file would fit better into a specific remote browser plugin class, not an abstract base class.


4 participants