diff --git a/examples/deployment/docker-compose.yml b/examples/deployment/docker-compose.yml
new file mode 100644
index 00000000..5b98fa66
--- /dev/null
+++ b/examples/deployment/docker-compose.yml
@@ -0,0 +1,39 @@
+services:
+  html2rss:
+    image: html2rss/web:latest
+    env_file: .env
+
+  caddy:
+    image: caddy:2-alpine
+    depends_on:
+      - html2rss
+    command:
+      - caddy
+      - reverse-proxy
+      - --from
+      - ${CADDY_HOST}
+      - --to
+      - html2rss:3000
+    ports:
+      - "80:80"
+      - "443:443"
+    volumes:
+      - caddy_data:/data
+
+  watchtower:
+    image: containrrr/watchtower
+    depends_on:
+      - html2rss
+      - caddy
+    command:
+      - --cleanup
+      - --interval
+      - "300"
+      - html2rss
+      - caddy
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock:ro
+    restart: unless-stopped
+
+volumes:
+  caddy_data:
diff --git a/src/components/docs/AutoGenerationOptional.astro b/src/components/docs/AutoGenerationOptional.astro
index 35039fc9..982682bb 100644
--- a/src/components/docs/AutoGenerationOptional.astro
+++ b/src/components/docs/AutoGenerationOptional.astro
@@ -2,7 +2,7 @@
import { Aside } from "@astrojs/starlight/components";
---
-
---
@@ -160,6 +160,21 @@ html2rss supports many configuration options:
4. **Check the output:** Make sure all items have titles, links, and descriptions
+### Useful CLI flags when a site is difficult
+
+Some sites need a little more request budget than the defaults.
+
+- Use `--max-redirects` when the site bounces through several canonicalization or tracking redirects before the real page loads.
+- Use `--max-requests` when your config needs more than one request, for example pagination or other follow-up fetches.
+
+```bash
+html2rss feed your-config.yml --max-redirects 10
+html2rss feed your-config.yml --max-requests 5
+html2rss auto https://example.com/blog --max-redirects 10 --max-requests 5
+```
+
+Keep these values as low as possible. If a site only needs one extra redirect, prefer `--max-redirects 4` over a much larger number.
+
## Add It To html2rss-web
Once the config works locally, add it to your `feeds.yml` or shared config repository and restart your
diff --git a/src/content/docs/getting-started.mdx b/src/content/docs/getting-started.mdx
index f08de5ba..30ea4175 100644
--- a/src/content/docs/getting-started.mdx
+++ b/src/content/docs/getting-started.mdx
@@ -1,6 +1,6 @@
---
title: "Getting Started"
-description: "Learn how to get RSS feeds from any website. Start with existing feeds or create your own in minutes."
+description: "Start html2rss-web locally, verify the web interface, generate your first feed URL, and decide when to move to custom configs."
sidebar:
order: 1
---
@@ -14,12 +14,30 @@ If you want the recommended path, go to [Run html2rss-web with Docker](/web-appl
That guide is the canonical setup flow for:
- running `html2rss-web` locally
-- confirming your first successful feed
-- deciding when to use included feeds, automatic generation, or custom configs
+- confirming the interface is working
+- generating a first feed URL
+- deciding when to use automatic generation or custom configs
## Quick Shortcuts
-- **[Run html2rss-web with Docker](/web-application/getting-started)** - Recommended first step
-- **[Browse working feed examples](/feed-directory/)** - See what success looks like
-- **[Create Custom Feeds](/creating-custom-feeds)** - Write configs when you need more control
-- **[Troubleshooting Guide](/troubleshooting/troubleshooting)** - Fix startup or extraction problems
+- **[Run html2rss-web with Docker](/web-application/getting-started)**: recommended first step
+- **[Use automatic feed generation](/web-application/how-to/use-automatic-feed-generation/)**: create a feed directly from a page URL
+- **[Browse working feed examples](/feed-directory/)**: see what successful outputs look like
+- **[Create Custom Feeds](/creating-custom-feeds)**: write configs when you need more control
+- **[Troubleshooting Guide](/troubleshooting/troubleshooting)**: fix startup or extraction problems
+
+## Using the Ruby CLI
+
+If you are working directly with the gem instead of `html2rss-web`, start with:
+
+```bash
+html2rss auto https://example.com/blog
+```
+
+If the target site is unusually redirect-heavy or needs extra follow-up requests, the CLI also supports:
+
+```bash
+html2rss auto https://example.com/blog --max-redirects 10 --max-requests 5
+```
+
+For config-driven runs, the same flags are available on `html2rss feed`.
diff --git a/src/content/docs/index.mdx b/src/content/docs/index.mdx
index 349bccae..75ab9d0e 100644
--- a/src/content/docs/index.mdx
+++ b/src/content/docs/index.mdx
@@ -1,101 +1,69 @@
---
-title: "Turn Any Website Into an RSS Feed - Never Miss Updates Again"
-description: "Create RSS feeds from any website - no coding required. Turn blogs, news sites, and forums into RSS feeds you can follow in your favorite reader. Free, open source, and easy to use."
+title: "Turn Any Website Into an RSS Feed"
+description: "Run html2rss-web with Docker, open the web interface, generate stable feed URLs, and move to custom configs only when you need more control."
---
-Run `html2rss-web` with Docker, start with included feeds, and add custom configs only when you need more control.
+Run `html2rss-web` with Docker, open the web interface, and generate stable feed URLs from pages you want to follow.
-## Get Started in 30 Seconds
+## Start Here
-**Start here:** [Run html2rss-web with Docker](/web-application/getting-started) | [Browse working feed examples](/feed-directory/)
+**Recommended path:** [Run html2rss-web with Docker](/web-application/getting-started)
-Need more control? [Write a custom feed config](/creating-custom-feeds)
+That guide is the canonical onboarding flow for:
----
+- starting a local instance
+- verifying the web interface
+- generating a first feed URL
+- deciding when to use automatic generation or custom configs
## How It Works
1. **Run your own local instance** with Docker
-2. **Use included feeds or add your own** website targets
-3. **Subscribe from your RSS reader** using stable feed URLs
-
----
-
-## Why RSS Still Matters Today
-
-**Real examples of what you can do:**
-
-- Follow your favorite blogs without social media algorithms
-- Get notified when your local news site posts about your neighborhood
-- Track job postings from multiple company websites
-- Monitor product updates from software vendors
-- Follow academic papers from your field
-
-**RSS vs Social Media:**
-
- **No algorithms** deciding what you see
- **No ads** or sponsored content
- **Works with any feed reader** you choose
- **Your data stays private**
- **Never miss updates** - automatic notifications
- **Save time** - no more manual checking
-
----
+2. **Open the web interface** and paste a page URL
+3. **Copy the feed URL into your reader**
## What is html2rss?
-html2rss is a toolkit for turning websites into RSS feeds. Think of it as a translator that converts website content into a format your feed reader can understand.
+html2rss is a toolkit for turning websites into feeds.
-**Most people should start with the web application:**
+Most people should start with the web application:
-- **html2rss-web** - The easiest way to run your own feed server with Docker
-- **⚙️ html2rss gem** - The underlying engine, CLI, and developer interface
+- **`html2rss-web`**: the self-hosted web interface and feed server
+- **`html2rss` gem**: the Ruby engine, CLI, and lower-level config workflow
----
-
-## 🎯 Choose Your Path
+## Choose Your Path
### I want a working instance first
-1. **[Run html2rss-web with Docker](/web-application/getting-started)** - Recommended starting path
-2. **[Browse working feed examples](/feed-directory/)** - See what success looks like
-3. **[Use the included configs](/web-application/how-to/use-included-configs/)** - Start with ready-made feeds
+1. **[Run html2rss-web with Docker](/web-application/getting-started)**: recommended starting path
+2. **[Use automatic feed generation](/web-application/how-to/use-automatic-feed-generation/)**: create a feed directly from a page URL
+3. **[Browse working feed examples](/feed-directory/)**: see what working outputs look like
### I need more control
-1. **[Creating Custom Feeds](/creating-custom-feeds)** - Write and test your own configs
-2. **[Selectors Reference](/ruby-gem/reference/selectors/)** - Learn the matching rules
-3. **[Strategy Reference](/ruby-gem/reference/strategy/)** - Use `browserless` for JS-heavy sites
+1. **[Creating Custom Feeds](/creating-custom-feeds)**: write and test your own configs
+2. **[Selectors Reference](/ruby-gem/reference/selectors/)**: learn the matching rules
+3. **[Strategy Reference](/ruby-gem/reference/strategy/)**: decide when `browserless` is justified
### I'm building or integrating
-1. **[Ruby Gem Reference](/ruby-gem/)** - Full API documentation
-2. **[Advanced Features](/ruby-gem/how-to/advanced-features/)** - Custom HTTP requests, etc.
-3. **[Contribute to Core](/get-involved/contributing/)** - Help improve the engine
-
----
-
-## What People Are Using html2rss For
-
-- **News & Blogs:** Follow your favorite writers without social media
-- **Job Hunting:** Track job postings from multiple company sites
-- **Product Updates:** Get notified when software you use gets updated
-- **Academic Research:** Follow new papers in your field
-- **Local News:** Stay updated on your neighborhood and city
-- **Hobby Communities:** Follow forums and communities you care about
-
-[Browse all examples in our Feed Directory](/feed-directory/)
-
----
-
-## 🔧 Common Issues?
+1. **[Ruby Gem Reference](/ruby-gem/)**: full API documentation
+2. **[Advanced Features](/ruby-gem/how-to/advanced-features/)**: custom HTTP requests and advanced extraction
+3. **[Contribute to Core](/get-involved/contributing/)**: help improve the engine
-**Start with Docker, not a public instance.** That gives you the most reliable path and the newest integrated behavior.
+## What People Use It For
-**Feed not working?** Check our [troubleshooting guide](/troubleshooting/troubleshooting)
+- follow blogs and news sites without social media algorithms
+- track product updates and release notes
+- monitor job postings from company websites
+- subscribe to forums and communities that do not publish feeds
+- follow local news without repeated manual checking
-**Need custom control?** Continue to [Creating Custom Feeds](/creating-custom-feeds)
+## Practical Notes
-**Need help?** Join our [community discussions](https://github.com/orgs/html2rss/discussions)
+- Start with Docker, not a public instance.
+- Use the web interface to verify the deployment first.
+- Use automatic generation for the first pass.
+- Move to custom configs when you need a stable, reviewable setup.
-**Found a bug?** [Report it on GitHub](https://github.com/html2rss/html2rss/issues)
+**Need help?** Continue to the [troubleshooting guide](/troubleshooting/troubleshooting) or join [GitHub Discussions](https://github.com/orgs/html2rss/discussions).
diff --git a/src/content/docs/ruby-gem/how-to/advanced-features.mdx b/src/content/docs/ruby-gem/how-to/advanced-features.mdx
index 703bd9e9..cf052a64 100644
--- a/src/content/docs/ruby-gem/how-to/advanced-features.mdx
+++ b/src/content/docs/ruby-gem/how-to/advanced-features.mdx
@@ -7,13 +7,13 @@ This guide covers advanced features and performance optimizations for html2rss.
## Parallel Processing
-html2rss uses parallel processing to improve performance when scraping multiple items. This happens automatically and doesn't require any configuration.
+html2rss uses parallel processing in auto-source discovery to improve performance when multiple scrapers inspect the same page. This happens automatically and doesn't require any configuration.
### How It Works
-- **Auto-source scraping:** Multiple scrapers run in parallel to analyze the page
-- **Item processing:** Each scraped item is processed in parallel
-- **Performance benefit:** Significantly faster when dealing with many items
+- **Auto-source scraping:** Multiple scrapers run in parallel to analyze the same response body
+- **Selectors and pagination:** Selector extraction and `rel="next"` pagination stay sequential and share the same request budget
+- **Performance benefit:** Faster auto-discovery without changing selector semantics
### Performance Tips
@@ -75,6 +75,8 @@ selectors:
extractor: "href"
```
+When you use the Browserless strategy, Chromium rejects transport-level headers such as `Host`, `Connection`, `Content-Length`, and `Transfer-Encoding`. html2rss filters those headers before navigation and logs the filtered header names at `info` level.
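+
+A minimal config fragment illustrating that behavior, assuming a hypothetical site and header values: `Host` is one of the transport-level headers named above, so it would be filtered before navigation, while `User-Agent` is not named in that list. Selectors are omitted for brevity.
+
+```yaml
+strategy: browserless
+headers:
+  User-Agent: "Mozilla/5.0 (compatible; html2rss/1.0)"
+  # Transport-level header: filtered before navigation under Browserless.
+  Host: "example.com"
+channel:
+  # Hypothetical URL, for illustration only.
+  url: https://example.com/app
+```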
+
## Monitoring and Debugging
### Enable Debug Logging
diff --git a/src/content/docs/ruby-gem/how-to/custom-http-requests.mdx b/src/content/docs/ruby-gem/how-to/custom-http-requests.mdx
index 33b6cca3..1a4d10bd 100644
--- a/src/content/docs/ruby-gem/how-to/custom-http-requests.mdx
+++ b/src/content/docs/ruby-gem/how-to/custom-http-requests.mdx
@@ -5,6 +5,12 @@ description: "Learn how to customize HTTP requests with custom headers, authenti
Some websites require custom HTTP headers, authentication, or other request settings to access their content. `html2rss` lets you customize requests for those cases.
+Keep this structure in mind:
+
+- `headers` stays top-level
+- `strategy` stays top-level
+- request-specific controls such as budgets and Browserless options live under `request`
+
## When You Need Custom Headers
You might need custom HTTP requests when:
@@ -35,6 +41,32 @@ selectors:
selector: "url"
```
+## Request Controls
+
+Request budgets are configured under `request`, not as top-level keys:
+
+```yaml
+headers:
+  User-Agent: "Mozilla/5.0 (compatible; html2rss/1.0)"
+request:
+  max_redirects: 5
+  max_requests: 6
+channel:
+  url: https://example.com/articles
+selectors:
+  items:
+    selector: article
+  title:
+    selector: h2
+  url:
+    selector: a
+    extractor: href
+```
+
+- `request.max_redirects` limits redirect hops
+- `request.max_requests` limits the total request budget for the feed build
+- `request.browserless.*` is reserved for Browserless-only behavior such as preload actions
+
## Common Use Cases
### API Authentication
diff --git a/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx b/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx
index c0e5e379..0905835e 100644
--- a/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx
+++ b/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx
@@ -9,6 +9,29 @@ Some websites load their content dynamically using JavaScript. The default `html
Use the [`browserless` strategy](/ruby-gem/reference/strategy) to render JavaScript-heavy websites with a headless browser.
+Keep the strategy at the top level and put request-specific options under `request`:
+
+```yaml
+strategy: browserless
+request:
+  max_redirects: 5
+  max_requests: 6
+  browserless:
+    preload:
+      wait_for_network_idle:
+        timeout_ms: 5000
+channel:
+  url: https://example.com/app
+selectors:
+  items:
+    selector: .article
+  title:
+    selector: h2
+  url:
+    selector: a
+    extractor: href
+```
+
## When to Use Browserless
The `browserless` strategy is necessary when:
@@ -18,6 +41,56 @@ The `browserless` strategy is necessary when:
- **Infinite scroll** - Content loads as you scroll
- **Dynamic forms** - Content changes based on user interaction
+## Preload Actions
+
+For dynamic sites, rendering once is often not enough. Use `request.browserless.preload` to wait, click, or scroll before the
+HTML snapshot is taken.
+
+### Wait for JavaScript Requests
+
+```yaml
+strategy: browserless
+request:
+  browserless:
+    preload:
+      wait_for_network_idle:
+        timeout_ms: 4000
+```
+
+### Click "Load More" Buttons
+
+```yaml
+strategy: browserless
+request:
+  browserless:
+    preload:
+      click_selectors:
+        - selector: ".load-more"
+          max_clicks: 3
+          delay_ms: 250
+      wait_for_network_idle:
+        timeout_ms: 3000
+```
+
+### Scroll Infinite Lists
+
+```yaml
+strategy: browserless
+request:
+  browserless:
+    preload:
+      scroll_down:
+        iterations: 5
+        delay_ms: 200
+      wait_for_network_idle:
+        timeout_ms: 2500
+```
+
+These preload steps can be combined in a single config when a site needs several interactions before all items appear.
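+
+As a sketch of such a combination, the fragment below puts the three preload steps from the examples above into one config. It uses only keys that appear in those examples. The selector, counts, and timeouts are illustrative assumptions, and the fragment does not imply any particular execution order between the steps.
+
+```yaml
+strategy: browserless
+request:
+  browserless:
+    preload:
+      # Expand the list through a "load more" button (hypothetical selector).
+      click_selectors:
+        - selector: ".load-more"
+          max_clicks: 3
+          delay_ms: 250
+      # Scroll to trigger lazily loaded items.
+      scroll_down:
+        iterations: 5
+        delay_ms: 200
+      # Wait for outstanding JavaScript requests to settle.
+      wait_for_network_idle:
+        timeout_ms: 3000
+```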
+
+If a click or scroll step causes a real navigation, html2rss returns the final document metadata, not the original page-load
+metadata. That keeps extracted relative links anchored to the rendered page.
+
## Performance Considerations
The `browserless` strategy is slower than the default `faraday` strategy because it:
diff --git a/src/content/docs/ruby-gem/reference/auto-source.mdx b/src/content/docs/ruby-gem/reference/auto-source.mdx
index 33454232..82e92df0 100644
--- a/src/content/docs/ruby-gem/reference/auto-source.mdx
+++ b/src/content/docs/ruby-gem/reference/auto-source.mdx
@@ -17,16 +17,19 @@ auto_source: {}
`auto_source` uses the following strategies to find content:
-1. **`schema`:** Parses `