feat: connect to remote browser services#3545
… support
# Task 1: Type Definitions & LaunchContext `isRemote` Flag
## Goal
Add the foundational types and the `isRemote` flag that all other remote browser tasks depend on.
## Dependencies
None — this is the foundation task.
## Scope
### 1. Add `isRemote` to `LaunchContext`
**File:** `packages/browser-pool/src/launch-context.ts`
- Add `isRemote?: boolean` to the `LaunchContextOptions` interface (alongside `id`, `browserPlugin`, etc.)
- Add a public readonly `isRemote: boolean` property to the `LaunchContext` class
- Set it from constructor options, defaulting to `false`
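A minimal sketch of the flag described above, assuming a trimmed-down `LaunchContext` (the real class carries many more fields, and everything except `id` and `isRemote` is omitted here for illustration):

```typescript
// Hypothetical, simplified stand-in for the real LaunchContextOptions/LaunchContext.
interface LaunchContextOptions {
    id?: string;
    isRemote?: boolean;
}

class LaunchContext {
    readonly id: string | undefined;
    readonly isRemote: boolean;

    constructor(options: LaunchContextOptions = {}) {
        this.id = options.id;
        // Default to false so existing local launches behave exactly as before.
        this.isRemote = options.isRemote ?? false;
    }
}

const localContext = new LaunchContext({ id: 'local' });
const remoteContext = new LaunchContext({ id: 'remote', isRemote: true });
```

Using `??` rather than `||` keeps an explicit `false` distinct from "not provided", although both resolve to `false` here.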
### 2. Define connect option types on PlaywrightPlugin
**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`
Add the following types to the plugin file (or a co-located types file):
```typescript
// Mirrors browserType.connectOverCDP(endpointURL, options)
interface PlaywrightConnectOverCDPOptions {
    endpointURL: string;
    options?: Parameters<BrowserType['connectOverCDP']>[1];
}
// Mirrors browserType.connect(wsEndpoint, options)
interface PlaywrightConnectOptions {
    wsEndpoint: string;
    options?: Parameters<BrowserType['connect']>[1];
}
```
Use the existing `Parameters` utility type pattern (see how `SafeParameters` is used elsewhere in the codebase) — do NOT redefine Playwright's types manually.
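To illustrate the `Parameters` extraction pattern in isolation, here is a small sketch that uses a hypothetical `FakeBrowserType` interface instead of Playwright's real `BrowserType`, so it compiles without the library installed:

```typescript
// Hypothetical stand-in for a library type; not Playwright's actual declaration.
interface FakeBrowserType {
    connectOverCDP(
        endpointURL: string,
        options?: { timeout?: number; headers?: Record<string, string> },
    ): Promise<unknown>;
}

// Index [1] picks the second positional parameter, i.e. the options bag.
type ExtractedCdpOptions = Parameters<FakeBrowserType['connectOverCDP']>[1];

interface ConnectOverCDPOptionsSketch {
    endpointURL: string;
    options?: ExtractedCdpOptions;
}

const cdpExample: ConnectOverCDPOptionsSketch = {
    endpointURL: 'ws://localhost:9222',
    options: { timeout: 5000 },
};
```

One caveat worth keeping in mind: when a method is overloaded, `Parameters` resolves against the last declared overload, so the extracted type should be checked against the overload that is actually called.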
### 3. Define connect option types on PuppeteerPlugin
**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`
```typescript
// Mirrors puppeteer.connect({ browserWSEndpoint, ...rest })
// Flat object matching Puppeteer's ConnectOptions
type PuppeteerConnectOverCDPOptions = Parameters<typeof puppeteer.connect>[0];
```
Use the `Parameters` pattern to extract the type from Puppeteer's `connect` method.
### 4. Add connect option fields to `BrowserPluginOptions`
**File:** `packages/browser-pool/src/abstract-classes/browser-plugin.ts`
This is a design choice — the PRD says connect options live on the plugin subclass, not on `LaunchContext`. Add the fields to the plugin options type so they flow through the constructor:
- `PlaywrightPlugin` options should accept `connectOptions?` and `connectOverCDPOptions?`
- `PuppeteerPlugin` options should accept `connectOverCDPOptions?`
These can be added to subclass-specific option types rather than the base `BrowserPluginOptions`.
### 5. Add connect option fields to launcher-level interfaces
**File:** `packages/playwright-crawler/src/internals/playwright-launcher.ts`
Add to `PlaywrightLaunchContext`:
```typescript
connectOptions?: PlaywrightConnectOptions;
connectOverCDPOptions?: PlaywrightConnectOverCDPOptions;
```
**File:** `packages/puppeteer-crawler/src/internals/puppeteer-launcher.ts`
Add to `PuppeteerLaunchContext`:
```typescript
connectOverCDPOptions?: PuppeteerConnectOverCDPOptions;
```
This enables IDE autocomplete when users configure `launchContext` on the crawler.
### 6. Export new types
**File:** `packages/browser-pool/src/index.ts`
Export the new connect option types so they're available to consumers.
## Key Files
| File | Change |
|------|--------|
| `packages/browser-pool/src/launch-context.ts` | Add `isRemote` option + property |
| `packages/browser-pool/src/playwright/playwright-plugin.ts` | Add connect option types |
| `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts` | Add connect option type |
| `packages/playwright-crawler/src/internals/playwright-launcher.ts` | Add connect options to `PlaywrightLaunchContext` |
| `packages/puppeteer-crawler/src/internals/puppeteer-launcher.ts` | Add connect options to `PuppeteerLaunchContext` |
| `packages/browser-pool/src/index.ts` | Export new types |
| `packages/browser-crawler/src/internals/browser-launcher.ts` | May need connect options on `BrowserLaunchContext` base |
## Acceptance Criteria
- [x] `LaunchContext` has `isRemote` boolean property, defaults to `false`
- [x] Connect option types are defined using library `Parameters` extraction (not manual redefinition)
- [x] `PlaywrightLaunchContext` shows `connectOptions` and `connectOverCDPOptions` in IDE autocomplete
- [x] `PuppeteerLaunchContext` shows `connectOverCDPOptions` in IDE autocomplete
- [x] New types are exported from `@crawlee/browser-pool`
- [x] TypeScript compiles with no errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and `connectOverCDP()`
# Task 2: PlaywrightPlugin Remote Connection Routing
## Goal
Make `PlaywrightPlugin._launch()` branch to `connect()` or `connectOverCDP()` when remote connection options are present, instead of calling `launch()`.
## Dependencies
- Task 1 (types and `isRemote` flag)
## Scope
### 1. Store connect options on the plugin instance
**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`
- Accept `connectOptions` and `connectOverCDPOptions` in the constructor options
- Store them as instance properties
- **Validation:** If both `connectOptions` AND `connectOverCDPOptions` are provided, throw an error immediately in the constructor:
```
Cannot set both 'connectOptions' and 'connectOverCDPOptions' — pick one protocol.
```
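The fail-fast check could look roughly like the sketch below. The `ConnectFields` type and the `assertSingleProtocol` helper are hypothetical names used only for illustration, and the error message is abbreviated from the one quoted above:

```typescript
// Hypothetical option shape; only the two mutually exclusive fields are modelled.
interface ConnectFields {
    connectOptions?: { wsEndpoint: string };
    connectOverCDPOptions?: { endpointURL: string };
}

function assertSingleProtocol(options: ConnectFields): void {
    // Both set means the caller has not decided between WebSocket and CDP.
    if (options.connectOptions && options.connectOverCDPOptions) {
        throw new Error("Cannot set both 'connectOptions' and 'connectOverCDPOptions'");
    }
}

function throwsOnConflict(options: ConnectFields): boolean {
    try {
        assertSingleProtocol(options);
        return false;
    } catch {
        return true;
    }
}
```

Running the check in the constructor means a misconfiguration surfaces when the plugin is created rather than on the first launch.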
### 2. Branch in `_launch()` for remote connections
**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`
In the existing `_launch()` method (currently lines 22-102), add branching logic **before** the existing local launch code:
```typescript
protected async _launch(launchContext: LaunchContext<...>): Promise<Browser> {
    // Remote CDP connection
    if (this.connectOverCDPOptions) {
        const { endpointURL, options } = this.connectOverCDPOptions;
        const browser = await browserType.connectOverCDP(endpointURL, options);
        return browser;
    }
    // Remote Playwright WebSocket connection
    if (this.connectOptions) {
        const { wsEndpoint, options } = this.connectOptions;
        const browser = await browserType.connect(wsEndpoint, options);
        return browser;
    }
    // Existing local launch logic...
}
```
**Reference:** See `StagehandPlugin._launch()` at `packages/stagehand-crawler/src/internals/stagehand-plugin.ts:102-107` for the CDP connection pattern:
```typescript
const cdpUrl = await stagehand.connectURL();
const browser = await chromium.connectOverCDP(cdpUrl);
```
### 3. Set `isRemote` on LaunchContext
**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`
In `createLaunchContext()` (or wherever the plugin creates the LaunchContext), pass `isRemote: true` when connect options are present. This can be done by overriding `createLaunchContext()` in the subclass, or by passing it through the options.
Check how the base `BrowserPlugin.createLaunchContext()` works (at `packages/browser-pool/src/abstract-classes/browser-plugin.ts:149-174`) and determine the best insertion point.
## Key Design Decisions
- **No new abstract method:** The routing happens inside `_launch()` via internal branching, not a new `_connect()` method. This keeps the abstract interface unchanged and doesn't affect custom plugins like StagehandPlugin.
- **`browser.close()` for cleanup:** Remote browsers are closed the same way as local browsers — via `browser.close()`. No special disconnect handling.
- **No proxy server setup for remote:** The remote branch skips the local proxy server setup that exists in the current `_launch()` code.
## Key Files
| File | Change |
|------|--------|
| `packages/browser-pool/src/playwright/playwright-plugin.ts` | Constructor stores options, `_launch()` branches for remote |
## Acceptance Criteria
- [x] `PlaywrightPlugin` accepts `connectOptions` in constructor and calls `browserType.connect()` with `wsEndpoint` and `options`
- [x] `PlaywrightPlugin` accepts `connectOverCDPOptions` in constructor and calls `browserType.connectOverCDP()` with `endpointURL` and `options`
- [x] Setting both `connectOptions` and `connectOverCDPOptions` throws an error
- [x] `launchContext.isRemote` is `true` when connect options are present
- [x] Remote branch skips local proxy server setup and persistent context logic
- [x] TypeScript compiles with no errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nnect()`
# Task 3: PuppeteerPlugin Remote Connection Routing
## Goal
Make `PuppeteerPlugin._launch()` branch to `puppeteer.connect()` when remote connection options (CDP) are present, instead of calling `puppeteer.launch()`.
## Dependencies
- Task 1 (types and `isRemote` flag)
## Scope
### 1. Store connect options on the plugin instance
**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`
- Accept `connectOverCDPOptions` in the constructor options
- Store as an instance property
- Puppeteer only supports CDP — there is no `connectOptions` field (Playwright-only)
### 2. Branch in `_launch()` for remote connections
**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`
In the existing `_launch()` method (currently lines 22-203), add branching logic **before** the existing local launch code:
```typescript
protected async _launch(launchContext: LaunchContext<...>): Promise<Browser> {
    // Remote CDP connection
    if (this.connectOverCDPOptions) {
        const browser = await puppeteer.connect(this.connectOverCDPOptions);
        // Wrap with the same Proxy handler for newPage() interception
        // (see existing code at lines 138-200)
        return wrappedBrowser;
    }
    // Existing local launch logic...
}
```
**Important:** Puppeteer's `connect()` takes a flat options object: `puppeteer.connect({ browserWSEndpoint, ...rest })`. This is different from Playwright's two-argument pattern. The type should match Puppeteer's `ConnectOptions`.
### 3. Handle the `newPage()` Proxy wrapper for remote
The existing `_launch()` wraps the browser in a `Proxy` that intercepts `newPage()` calls to support `useIncognitoPages` (lines 138-200). This proxy wrapper should also be applied to remote browsers so that incognito context creation works correctly.
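A rough sketch of the `newPage()` interception idea, using hypothetical `FakeBrowser` and `FakePage` shapes rather than Puppeteer's real classes (the actual wrapper in `_launch()` does considerably more):

```typescript
// Hypothetical minimal shapes; the real Puppeteer Browser has many more members.
interface FakePage {
    fromIncognitoContext: boolean;
}
interface FakeBrowser {
    newPage(): Promise<FakePage>;
    createBrowserContext(): Promise<{ newPage(): Promise<FakePage> }>;
}

function wrapNewPage(browser: FakeBrowser, useIncognitoPages: boolean): FakeBrowser {
    return new Proxy(browser, {
        get(target, property, receiver) {
            if (property === 'newPage' && useIncognitoPages) {
                // Route page creation through a fresh context instead of the default one.
                return async () => {
                    const context = await target.createBrowserContext();
                    return context.newPage();
                };
            }
            const value = Reflect.get(target, property, receiver);
            // Bind methods to the real browser so `this` stays correct.
            return typeof value === 'function' ? value.bind(target) : value;
        },
    });
}

// Stand-in for a browser returned by a remote connect call.
const fakeRemoteBrowser: FakeBrowser = {
    newPage: async () => ({ fromIncognitoContext: false }),
    createBrowserContext: async () => ({
        newPage: async () => ({ fromIncognitoContext: true }),
    }),
};

const wrappedRemoteBrowser = wrapNewPage(fakeRemoteBrowser, true);
```

Because the wrapper only depends on the browser object, the same function can be applied whether the browser came from a local launch or a remote connection.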
### 4. Set `isRemote` on LaunchContext
Same pattern as Task 2 — pass `isRemote: true` when `connectOverCDPOptions` is present.
## Key Design Decisions
- **Flat options object:** Puppeteer's `connect()` API takes a single options object (not `endpointURL, options` like Playwright). The `connectOverCDPOptions` type matches this flat shape directly.
- **`browser.close()` for cleanup:** Same as Playwright — remote browsers closed via `browser.close()`, not `browser.disconnect()`.
- **`newPage()` proxy still needed:** The Proxy wrapper that intercepts `newPage()` to create incognito contexts must still wrap remote browsers.
## Key Files
| File | Change |
|------|--------|
| `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts` | Constructor stores options, `_launch()` branches for remote |
## Acceptance Criteria
- [x] `PuppeteerPlugin` accepts `connectOverCDPOptions` in constructor and calls `puppeteer.connect()` with the options object
- [x] The `newPage()` Proxy wrapper is applied to remote browsers (for incognito support)
- [x] `launchContext.isRemote` is `true` when connect options are present
- [x] Remote branch skips user data directory setup, headless handling, and other local-only logic
- [x] TypeScript compiles with no errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nection logging
## Goal
Make `BrowserPlugin.launch()` skip proxy injection and webdriver hiding when `launchContext.isRemote` is `true`, since these operations modify `launchOptions` which are not used for remote connections.
## Dependencies
- Task 1 (`isRemote` flag on LaunchContext)
## Scope
### 1. Skip `_addProxyToLaunchOptions()` for remote
**File:** `packages/browser-pool/src/abstract-classes/browser-plugin.ts`
In the `launch()` method, the call to `_addProxyToLaunchOptions()` is now gated on `!isRemote`:
```typescript
if (launchContext.proxyUrl && !launchContext.isRemote) {
    await this._addProxyToLaunchOptions(launchContext);
}
```
### 2. Skip `_mergeArgsToHideWebdriver()` for remote
```typescript
if (!launchContext.isRemote && this._isChromiumBasedBrowser(launchContext)) {
    this._mergeArgsToHideWebdriver(launchContext);
}
```
### 3. No changes to `_addProxyToLaunchOptions()` or `_mergeArgsToHideWebdriver()` themselves
The methods remain unchanged — the skip logic lives in the calling `launch()` method.
## Key Design Decisions
- **Skip at call site, not in the methods**
- **`proxyUrl` + remote triggers a warning:** Handled in Task 6 (Warnings)
- **Fingerprinting hooks are unchanged**
## Additional
- Fixed `isRemote` not being passed through base class `createLaunchContext()`
- Added info-level logs for remote connections in base class and both plugins
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ht overloads
Playwright: change PlaywrightConnectOverCDPOptions and PlaywrightConnectOptions from type aliases (all-optional fields) to interfaces with required `wsEndpoint`. Use the non-deprecated two-argument overloads in _launch().
Puppeteer: add runtime guard that throws if neither `browserWSEndpoint` nor `browserURL` is provided in connectOverCDPOptions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions
# Task 5: `useIncognitoPages` Defaults to `true` for Remote
## Goal
When remote connection options are present and `useIncognitoPages` was not explicitly set by the user, default it to `true` and log an info message. If the user explicitly sets `false`, log a warning.
## Dependencies
- Task 2 (PlaywrightPlugin stores connect options)
- Task 3 (PuppeteerPlugin stores connect options)
## Scope
### 1. Preserve `undefined` vs `false` in base constructor
The base `BrowserPlugin` constructor currently collapses `useIncognitoPages` to `false`. The subclass checks `options.useIncognitoPages` directly (preserves `undefined`) and overrides after `super()`.
### 2. Override default in PlaywrightPlugin constructor
After the `super()` call, if connect options are present:
- `undefined` → set to `true`, info log
- `false` → warning log
- `true` → no extra log
### 3. Override default in PuppeteerPlugin constructor
Same logic, checking `connectOverCDPOptions`.
## Key Design Decisions
- **Info vs warning:** Defaulting to `true` is an info message (expected behavior). Explicit `false` is a warning (user should understand implications).
- **`useIncognitoPages: false` + `connect()` is not special-cased:** The warning covers this case; no additional error or fallback.
- **Uses existing `this.log`:** All logging uses the inherited `BrowserPlugin.log` logger.
## Acceptance Criteria
- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages` is not provided → defaults to `true`, info message logged
- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages: false` → stays `false`, warning logged
- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages: true` → stays `true`, no extra log
- [x] When no connect options are set → existing behavior unchanged
- [x] Base constructor preserves `undefined` vs `false` distinction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
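The three-way default described in Task 5 can be sketched as a pure function. `resolveUseIncognitoPages` and the `LogLevel` type are hypothetical names for illustration; the plugins apply this logic inline in their constructors and log through `this.log`:

```typescript
type LogLevel = 'info' | 'warning' | 'none';

interface IncognitoDecision {
    value: boolean;
    log: LogLevel;
}

function resolveUseIncognitoPages(
    userValue: boolean | undefined,
    hasConnectOptions: boolean,
): IncognitoDecision {
    // Local launches keep the existing behavior: undefined collapses to false.
    if (!hasConnectOptions) return { value: userValue ?? false, log: 'none' };
    // Not set by the user: default to true and mention it at info level.
    if (userValue === undefined) return { value: true, log: 'info' };
    // Explicit false is respected, but the user is warned about the implications.
    if (userValue === false) return { value: false, log: 'warning' };
    return { value: true, log: 'none' };
}
```

The function has to receive the raw user option, not the value already normalized by the base constructor, otherwise `undefined` and `false` are indistinguishable.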
- Rename PlaywrightConnectOverCDPOptions.wsEndpoint → endpointURL to match Playwright's own terminology and avoid field conflict with inherited ConnectOverCDPOptions.endpointURL
- Wrap connectOverCDP() and connect() failures with BrowserLaunchError including sanitized endpoint URL (credentials stripped) and actionable guidance
- Move endpoint validation to constructors (fail fast): Playwright validates endpointURL and wsEndpoint are non-empty, Puppeteer validates browserWSEndpoint || browserURL
- Add _sanitizeEndpointForLog() to both plugins to strip credentials from URLs before including them in error messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ions
- Close BrowserContext on page close when useIncognitoPages is true.
Previously contexts were only cleaned up when an anonymized proxy was
active, causing context accumulation on remote browsers without proxy.
- Clean up targetcreated listener on remote browser disconnect via
browser.once('disconnected') handler to prevent listener leaks.
- Guard anonymizeProxySugar call with proxyUrl check — skip the async
call entirely when no proxy is configured (common for remote browsers).
- Conditionally omit proxyServer from context options when no proxy is
set, instead of passing { proxyServer: undefined }.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ket connections
- Add comments in both plugin constructors explaining why options.useIncognitoPages is checked instead of this.useIncognitoPages (super() collapses undefined to false, losing the "not set" signal).
- Strengthen warning for Playwright connectOptions (WebSocket) + useIncognitoPages: false. connect() returns a browser with no default context, which is more severe than just sharing cookies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove spurious launchOptions warning that always fired due to framework-injected defaults, and share log instances in launchers.
PRD Task 6: Warnings for Ignored & Conflicting Options
- proxyUrl + remote → warning in base BrowserPlugin.launch()
- useChrome + remote → warning in launcher constructors
- executablePath + remote → warning in launcher constructors
- useIncognitoPages: false + remote → handled by Task 5
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PRD Task 7: Unit Tests
- Connection routing (Playwright CDP/WS/local, Puppeteer CDP/local)
- Validation (mutual exclusion, missing endpoints)
- isRemote correctness for all plugin variants
- Proxy/webdriver skipping for remote, applied for local
- useIncognitoPages defaults (true for remote, false for local)
- Warnings (proxyUrl, useIncognitoPages: false, CDP vs WS variants)
- 40 tests, all mocked (no real browser instances)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…texts
When useIncognitoPages is true (default for remote) and proxyUrl is set, the newPage handler was passing proxyServer to createBrowserContext even for remote connections. For credentialed proxies this also spun up a localhost tunnel unreachable by the remote browser.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Examples for Browserbase, Browserless, Rebrowser, and Steel using Playwright and Puppeteer.
…oteBrowser config
Add a unified API for connecting crawlers to remote browser services (Browserbase, Browserless, Steel, Rebrowser). Users can either pass a RemoteBrowserConfig object or extend RemoteBrowserProvider with typed connect()/release() lifecycle methods.
- Add RemoteBrowserProvider abstract class with generic TContext
- Add RemoteBrowserConfig interface (endpoint + release + type)
- Wire remoteBrowser through BrowserPlugin, PlaywrightPlugin, PuppeteerPlugin
- Auto-call release() on browser close/crash/pool destroy
- Skip fingerprinting, proxy injection, and webdriver stealth for remote browsers
- Skip session-based browser retirement for remote browsers (isRemote guard)
- Default useIncognitoPages to true for remote connections
- Add 30+ unit tests for both config and provider patterns
- Update all temp-examples to use RemoteBrowserProvider
… overflow
Remote browser services enforce concurrent session limits. During browser retirement transitions, the pool could briefly exceed the limit by launching a new browser before the retired one fully closed.
- Add maxOpenBrowsers to RemoteBrowserConfig and RemoteBrowserProvider
- BrowserCrawler reads it from the plugin and applies it to the pool
- Gate new tasks via _isTaskReadyFunction (same pattern as maxConcurrency)
- Add hasFreeBrowserSlot() and hasActiveBrowserWithFreeCapacity() to BrowserPool
- Only activates when maxOpenBrowsers is set (remote browsers); local browsers unaffected
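The overflow during retirement can be pictured with a small counter. `BrowserSlotTracker` is a hypothetical class written for this sketch; only the `hasFreeBrowserSlot()` name and the `maxOpenBrowsers` option come from the commit message, and the real `BrowserPool` bookkeeping is more involved:

```typescript
// Hypothetical tracker: counts browsers that still hold a remote session.
class BrowserSlotTracker {
    private activeCount = 0;
    private retiredCount = 0;
    private readonly maxOpenBrowsers: number | undefined;

    constructor(maxOpenBrowsers?: number) {
        this.maxOpenBrowsers = maxOpenBrowsers;
    }

    hasFreeBrowserSlot(): boolean {
        // No limit configured (local browsers): never gate.
        if (this.maxOpenBrowsers === undefined) return true;
        // Retired browsers still occupy a session until they are fully closed.
        return this.activeCount + this.retiredCount < this.maxOpenBrowsers;
    }

    launch(): void {
        this.activeCount++;
    }

    retire(): void {
        this.activeCount--;
        this.retiredCount++;
    }

    close(): void {
        this.retiredCount--;
    }
}
```

The point of the sketch is that a retired browser keeps counting against the limit until `close()` runs, which is exactly the window in which the pool previously overshot.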
| * | ||
| * @param _context The same `context` object returned by {@link connect}. | ||
| */ | ||
| async release(_context: TContext): Promise<void> {} |
I am aware that this is not the best name for it
… cookie sharing
Remote CDP browsers (both Puppeteer and Playwright) now default useIncognitoPages to false, matching local behavior. For Playwright CDP, the browser's default context is wrapped in PlaywrightBrowserWithPersistentContext so pages share cookies, the same mechanism used locally with launchPersistentContext(). Playwright WebSocket still defaults to true since connect() returns a browser with no default context to wrap.
The wrapper passes the real Browser as parentBrowser so close() also closes the WebSocket transport and disconnected events are forwarded to the pool.
Adds a vitest-based integration suite under test/integration that exercises Crawlee end-to-end against a real Browserless instance. The first test verifies the force-incognito behavior for remote Playwright CDP connections: two requests landing on the same browser do not share cookies even when retireBrowserAfterPageCount is high and saveResponseCookies is disabled. Gated on CRAWLEE_DIFFICULT_TESTS so `pnpm test` skips the suite by default — `pnpm test:integration` and `pnpm test:full` set the flag. The suite expects Browserless and httpbin running on a shared Docker network; `pnpm test:integration:services:up` spins them up locally, and a new GitHub Actions workflow provides them as service containers. Also sets core-js-pure: false in pnpm-workspace.yaml allowBuilds to match prior skip-by-default behavior under pnpm 11.
Replace the warn-and-silently-drop path with a constructor throw in both PlaywrightPlugin and PuppeteerPlugin when more than one of remoteBrowser, connectOptions, or connectOverCDPOptions is set. Fixes the doc/impl mismatch where JSDoc claimed remoteBrowser "Takes precedence" but the implementation actually dropped it.
PlaywrightLauncher and PuppeteerLauncher now throw if launchOptions is set alongside connectOptions, connectOverCDPOptions, or remoteBrowser. The launcher is the right layer for this check — at the plugin level the launcher always injects defaults (executablePath) into launchOptions, so the plugin cannot distinguish user-set from framework-default. Removes the now-unreachable executablePath warning and consolidates the useChrome warning behind the unified hasRemote flag.
CDP is also a WebSocket protocol, so 'websocket' was a misleading label. Rename to 'playwright', which names the actual transport (Playwright's client-server protocol exposed via browserType.connect()). Updated: RemoteBrowserConfig.type, PlaywrightRemoteBrowserConfig.type, RemoteBrowserProvider.type, the playwright-plugin branch, the puppeteer "not supported" error message, the connect log line, and all tests.
The RemoteBrowserConfig / RemoteBrowserProvider abstraction is built for remote browser services (Browserless, Browserbase, Steel), which all speak CDP. The 'websocket'/'playwright' branch (browserType.connect()) had no real provider behind it, and naming it 'websocket' was misleading (CDP also rides WebSocket). Rather than commit to a name that BiDi will make obsolete anyway, drop the field entirely. Callers who genuinely need Playwright's connect() can still use connectOptions directly.
Removes:
- RemoteBrowserConfig.type and RemoteBrowserProvider.type
- PlaywrightRemoteBrowserConfig and PuppeteerRemoteBrowserConfig (now-empty interface extensions)
- The 'playwright' branch in PlaywrightPlugin._launch
- The "Puppeteer does not support 'playwright'" throw + tests
- 5 type-related test cases
The crawler-level 'headless' shortcut synthesized a launchContext.launchOptions object, which then tripped the launcher's mutual-exclusion check against remoteBrowser. Warn and skip the mutation instead — remote services control headless mode anyway. Mirrors the existing useChrome warning in the launcher.
Replace apify/workflows/pnpm-install@main with a direct pnpm install call without --loglevel error and without the pnpm store cache, to surface the actual error behind the 8-min silent hang on Node 24. Revert once root cause is identified.
This reverts commit ca529b6.
…browser-services
# Conflicts:
# packages/browser-crawler/src/internals/browser-crawler.ts
# pnpm-workspace.yaml
A static-string endpoint can't receive proxyUrl, but the old log claimed it would be passed to endpoint() — masking that proxy traffic was being silently ignored. Branch on typeof endpoint and warn in the string case.
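A minimal sketch of branching on `typeof endpoint`. The `EndpointOption` type, the `resolveEndpoint` helper, the warning text, and the synchronous function signature are all assumptions made for brevity, not the actual implementation:

```typescript
// Hypothetical shape: either a fixed URL or a per-launch factory that can see the proxy.
type EndpointOption = string | ((context: { proxyUrl: string | undefined }) => string);

function resolveEndpoint(
    endpoint: EndpointOption,
    proxyUrl: string | undefined,
    warn: (message: string) => void,
): string {
    if (typeof endpoint === 'function') {
        // A function endpoint can embed the proxy into the URL it returns.
        return endpoint({ proxyUrl });
    }
    if (proxyUrl) {
        // A static string has no way to receive the proxy, so say so instead of hiding it.
        warn('proxyUrl is ignored because the endpoint is a static string.');
    }
    return endpoint;
}

const collectedWarnings: string[] = [];
const staticResult = resolveEndpoint(
    'ws://remote.example/chrome',
    'http://proxy.example:8000',
    (message) => collectedWarnings.push(message),
);
const dynamicResult = resolveEndpoint(
    ({ proxyUrl }) => `ws://remote.example/chrome?proxy=${encodeURIComponent(proxyUrl ?? '')}`,
    'http://proxy.example:8000',
    (message) => collectedWarnings.push(message),
);
```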
…bers
The remote_browser_provider.ts guide overrides maxOpenBrowsers and release() from the base class without the override modifier, failing the docs typecheck (TS4114).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These two tests configured a static-string endpoint but asserted the info log that only fires for a function endpoint. Since the string-endpoint path now warns instead (commit ef362cd), the assertions never matched. Use a function endpoint so the forwarding info log fires, matching the test's stated intent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce RemoteBrowserPool, an IBrowserPool implementation for remote browser services that wraps a BrowserPool (composition). It owns the one thing the plain pool cannot enforce on its own: a maxOpenBrowsers limit on concurrent remote browsers. newPage() waits for a free slot (event-driven, with a poll fallback) instead of letting the crawler overshoot the remote service's session quota.
Pass an instance via the crawler's browserPool option (added in #3669) to plug remote support in as a first-class pool rather than threading it through launchContext. A pool supplied this way is not owned by the crawler, so its lifecycle (and reuse across crawlers) is the caller's.
The remote-session lifecycle (connect/release) remains owned by the wrapped pool's plugin and its remoteBrowser config; this class only governs when new pages may open.
Adds a docs guide section + typechecked example.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
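The "wait for a free slot" behavior can be approximated with a small event-driven gate. `SlotGate` is a hypothetical class for illustration; it omits the poll fallback mentioned above and is not the actual `RemoteBrowserPool` code:

```typescript
// Hypothetical gate: callers await acquire() until the number of open slots drops below the limit.
class SlotGate {
    private open = 0;
    private readonly waiters: Array<() => void> = [];
    private readonly limit: number;

    constructor(limit: number) {
        this.limit = limit;
    }

    get openCount(): number {
        return this.open;
    }

    async acquire(): Promise<void> {
        // Re-check after every wake-up, since another waiter may have taken the slot.
        while (this.open >= this.limit) {
            await new Promise<void>((resolve) => this.waiters.push(resolve));
        }
        this.open++;
    }

    release(): void {
        this.open--;
        // Wake at most one waiter per freed slot.
        this.waiters.shift()?.();
    }
}
```

In this picture `newPage()` would call `acquire()` before opening a browser and `release()` once the remote session has been closed.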
Make RemoteBrowserPool the single owner of all remote-session concerns instead of weaving remote logic through the base plugin, controller, launch context, launchers and crawlers.
RemoteBrowserPool (implements IBrowserPool) now owns:
- endpoint resolution (static URL, per-launch function, or RemoteBrowserProvider) via an internal RemoteSessionRegistry;
- the release lifecycle, with a release-at-most-once guarantee and a releaseAll() backstop on destroy() so no remote session leaks;
- the maxOpenBrowsers throttle, enforced inside newPage().
Plugins keep only the library-specific connect() call, driven by a thin RemoteConnection bridge the pool injects (useRemoteConnection). The controller releases sessions by token via that bridge.
This removes the two footguns from the previous design by construction: double-release (idempotent registry + token clear) and the teardown session leak (releaseAll backstop).
BREAKING CHANGE: remote browsers are configured exclusively through RemoteBrowserPool passed as the crawler's browserPool option. The launchContext.remoteBrowser / connectOptions / connectOverCDPOptions options are removed; Playwright CDP-vs-WebSocket selection moves to the pool's connection.protocol option.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat(browser-crawler): add remoteBrowser option so the crawler builds the pool
Building a RemoteBrowserPool by hand and passing it as browserPool meant constructing the browser plugin twice (the crawler builds and discards its own) and allowed mismatching the pool with the crawler, e.g. a Puppeteer-backed pool in a PlaywrightCrawler type-checked but broke at runtime.
Add a remoteBrowser option (CrawlerRemoteBrowserOptions: endpoint / release / maxOpenBrowsers / connection, no plugin) to the browser crawlers. The crawler builds a RemoteBrowserPool around its OWN plugin, so the connection is always for the matching browser; there is nothing to construct and no way to mismatch.
The crawler owns and tears down this pool. browserPool stays for sharing one remote pool across crawlers (not owned by the crawler); the two options are mutually exclusive.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
simplify
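The release-at-most-once guarantee and the `releaseAll()` backstop from the RemoteBrowserPool commit can be sketched with a token-keyed map. `SessionRegistrySketch` is a hypothetical class with a synchronous `release` callback for brevity; the internal `RemoteSessionRegistry` is not shown in this conversation and may differ:

```typescript
// Hypothetical registry: each remote session gets a token; releasing removes the entry first.
class SessionRegistrySketch<TContext> {
    private readonly sessions = new Map<string, { context: TContext; release: (context: TContext) => void }>();
    private nextId = 0;

    register(context: TContext, release: (context: TContext) => void): string {
        const token = `session-${this.nextId++}`;
        this.sessions.set(token, { context, release });
        return token;
    }

    /** Returns true only for the call that actually released the session. */
    release(token: string): boolean {
        const session = this.sessions.get(token);
        if (!session) return false;
        // Delete before invoking so a re-entrant or repeated call cannot release twice.
        this.sessions.delete(token);
        session.release(session.context);
        return true;
    }

    /** Backstop for teardown: releases whatever is still registered. */
    releaseAll(): number {
        let released = 0;
        for (const token of Array.from(this.sessions.keys())) {
            if (this.release(token)) released++;
        }
        return released;
    }
}
```

Removing the entry before calling the user-supplied callback is what makes repeated `close()`/`kill()` paths harmless in this sketch.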
Two BrowserLaunchError strings ended with an invisible U+200B zero-width space, which shipped in user-facing error messages and broke log grepping / string matching. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sanitizeEndpointForLog only masked userinfo credentials, so tokens passed in the query string (e.g. Browserless '?token=...') still leaked into connect-failure error messages. Drop the query and fragment entirely, keeping protocol/host/port/path for diagnostics.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
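A sketch of the sanitizing behavior described above, built on the standard WHATWG `URL` class. The function name follows the commit message, but the body and the placeholder returned for unparsable input are assumptions rather than the actual implementation:

```typescript
function sanitizeEndpointForLog(endpoint: string): string {
    try {
        const url = new URL(endpoint);
        // Strip userinfo credentials (user:password@host).
        url.username = '';
        url.password = '';
        // Strip the query and fragment, where tokens such as ?token=... usually live.
        url.search = '';
        url.hash = '';
        return url.toString();
    } catch {
        // Hypothetical placeholder: never echo a string that could not be parsed.
        return '<invalid endpoint URL>';
    }
}

const sanitizedEndpoint = sanitizeEndpointForLog(
    'wss://user:pass@chrome.example.com:3000/session?token=secret#frag',
);
```

Only protocol, host, port and path survive, which is usually enough to tell which service a failed connection was aimed at.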
| BROWSERLESS_URL: http://localhost:3000 | ||
| HTTPBIN_URL: http://httpbin | ||
| CRAWLEE_DIFFICULT_TESTS: 1 | ||
| RETRY_TESTS: 1 |
@B4nan if you have time, please check how the integration tests are wired up.
nit: the changes in this file seem to be just indent-level formatting. Perhaps we should revert this?
Good point. I did a bit of digging into why this happened, and it seems pnpm did it (probably due to some changes I added and later removed in this PR); it appears to ignore editorconfig. We can revert, but it will do it again next time.
janbuchar left a comment
I like what I see! I just have a couple of design-related questions.
| --health-interval 5s | ||
| --health-timeout 5s | ||
| --health-retries 12 | ||
| httpbin: |
Does this work reliably for you? Back in the day I set up a Standby Actor that serves httpbin to replace using the Docker container directly, but I don't remember the exact problem this was solving. This was in crawlee-python, where it later got replaced by a bespoke http testing server. But I think maybe impit still uses the Actor? @barjin?
| if (browserPool && remoteBrowser) { | ||
| throw new Error( | ||
| "Set at most one of 'browserPool' and 'remoteBrowser'. To share a remote pool across crawlers, " + | ||
| 'build a RemoteBrowserPool yourself and pass it as `browserPool`.', | ||
| ); | ||
| } |
Is it absolutely necessary to allow passing both? It would be much cleaner to just accept a single option, browserPool: IBrowserPool
| * {@apilink RemoteConnection}. Safe to call multiple times — the token is cleared after the first call | ||
| * and the pool's registry also dedupes, so `release()` fires at most once across close()/kill(). | ||
| */ | ||
| private async _releaseRemoteBrowser(): Promise<void> { |
This has no business being in an abstract browser controller, the remote control would fit better into RemoteBrowserPool or something like that.
Again, the changes in this file would fit better into a specific remote browser plugin class, not an abstract base class.
closes #1822