Why does Crawlee use a single persistent crawler instance in server mode instead of creating one per request? #3261
Why does the Crawlee documentation use a single long-running crawler instance with a request map when handling multiple HTTP requests on a server?

Reference: https://crawlee.dev/js/docs/guides/running-in-web-server

Doc server example:

```js
import { randomUUID } from 'node:crypto';
import { CheerioCrawler } from 'crawlee';
import { createServer } from 'http';

const requests = new Map();

const crawler = new CheerioCrawler({
    // keep the crawler running even when the request queue is empty
    keepAlive: true,
    requestHandler: async ({ request, $ }) => {
        const send = requests.get(request.uniqueKey);
        send(JSON.stringify({ title: $('title').text() }));
        requests.delete(request.uniqueKey);
    },
});

// start the crawler loop; without this (and keepAlive), queued requests
// would never be processed
crawler.run();

createServer(async (req, res) => {
    const url = new URL(req.url, 'http://localhost:3000').searchParams.get('url');
    if (!url) {
        res.writeHead(400).end(`{"error":"missing url"}`);
        return;
    }
    const uniqueKey = randomUUID();
    requests.set(uniqueKey, (data) => res.writeHead(200).end(data));
    await crawler.addRequests([{ url, uniqueKey }]);
}).listen(3000);
```

Can we instead create a crawler instance inside a function, run it with a single URL, and return the scraped data? Will this cause any issues?

My version:

```js
import { CheerioCrawler } from 'crawlee';

export async function fetchTitleOnce(url) {
    return new Promise(async (resolve) => {
        const crawler = new CheerioCrawler({
            requestHandler: ({ $, request }) => {
                resolve({ url: request.url, title: $('title').text() });
            },
        });
        await crawler.run([{ url }]);
    });
}
```
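One Crawlee-independent pitfall in the one-shot version: passing an `async` function to the `Promise` constructor means any rejection inside it is silently swallowed, so if `crawler.run()` throws, the returned promise never settles. A minimal stdlib-only sketch of a safer shape, where `runCrawl(url, handler)` is a hypothetical stand-in for constructing a crawler and calling `crawler.run()`:

```javascript
// A deferred-style wrapper that avoids the `new Promise(async (resolve) => ...)`
// antipattern. `runCrawl(url, handler)` is a hypothetical stand-in for creating
// a crawler and running it; `handler` plays the role of requestHandler.
async function fetchOnce(url, runCrawl) {
  let settle;
  const result = new Promise((resolve) => { settle = resolve; });
  // An error thrown by runCrawl now rejects fetchOnce's returned promise
  // instead of disappearing inside an async Promise executor.
  await runCrawl(url, (data) => settle(data));
  return result;
}

// Usage with a fake one-request "crawler" that immediately reports a title:
const fakeRun = async (url, handler) => handler({ url, title: 'Example Domain' });
fetchOnce('https://example.com', fakeRun).then((page) => console.log(page.title));
```

Note this still leaves the promise pending if the handler is never invoked (e.g. the request fails before the handler runs), which the original version shares; a production wrapper would also want a rejection path for handler errors.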
Hello and thank you for your interest in Crawlee! With some caveats, it is indeed possible to just make a fresh crawler instance for each request. That said, when you use a single crawler instance, you avoid problems with sharing the filesystem storage (extra setup would be needed to prevent clashes with the crawler-per-request approach). Also, a single instance runs all the request handlers through its `AutoscaledPool`, which automatically scales concurrency to the available system resources.
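Stripped of Crawlee entirely, the single-instance pattern from the guide boils down to a Map of pending resolvers keyed by a unique ID, drained by one long-lived consumer. A stdlib-only sketch, where `processJob`, `submit`, and `drain` are hypothetical stand-ins for the requestHandler, the HTTP handler, and the crawler loop respectively:

```javascript
// Single-consumer correlation pattern: many callers park a resolver in
// `pending` under a unique key; one shared worker drains the queue and routes
// each result back to the caller that asked for it.
const pending = new Map(); // uniqueKey -> resolve function of a waiting caller
const queue = [];          // shared work queue, like crawler.addRequests()
let nextKey = 0;

async function processJob(job) {
  // Pretend to scrape; in the real pattern this is where Cheerio parses the page.
  return { url: job.url, title: `title of ${job.url}` };
}

// One long-lived consumer, analogous to the single persistent crawler instance.
async function drain() {
  while (queue.length > 0) {
    const job = queue.shift();
    const resolve = pending.get(job.uniqueKey);
    pending.delete(job.uniqueKey);
    resolve(await processJob(job));
  }
}

// Each "HTTP request" enqueues work and waits for its own result.
function submit(url) {
  return new Promise((resolve) => {
    const uniqueKey = `key-${nextKey++}`;
    pending.set(uniqueKey, resolve);
    queue.push({ url, uniqueKey });
  });
}

// Two concurrent callers each get back exactly their own result:
Promise.all([submit('https://a.example'), submit('https://b.example')])
  .then((results) => console.log(results.map((r) => r.title)));
drain();
```

For the crawler-per-request approach, the "extra setup" would be isolating storage per instance; if memory serves, Crawlee's `Configuration` exposes options such as `persistStorage: false` for this, though you should check the configuration docs for your version.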
I wouldn't worry about that too much. With a fresh instance each time, each request will be handled with a new session, essentially a fresh "user profile". You'll lose some efficiency, since cookies and sessions won't be reused across requests, but anti-blocking performance should be mostly unaffected. You should still test it, though; this is just an educated guess.