
πŸ•·οΈ Scrapling Dashboard

A Docker-based control panel for Scrapling — run scrape jobs locally or on a VPS with one toggle.

Built with FastAPI + React. No build step. No dependencies beyond Docker.



What It Does

This dashboard wraps Scrapling's powerful scraping engine in a web UI you can run on your machine (or a VPS) with docker compose up. You get:

  • Task cards with live progress β€” pages scraped, items found, elapsed time, and ETA countdown
  • LOCAL ↔ VPS toggle β€” switch between your machine and a remote server in one click
  • Inline results viewer β€” JSON, CSV, and table views expand below each task
  • One-click CSV export β€” download results without leaving the Tasks tab
  • Live log streaming β€” WebSocket-powered real-time output from Scrapling
  • VPS diagnostics β€” API health, Docker status, memory, and active jobs at a glance
  • All four Scrapling fetchers β€” StealthyFetcher (anti-bot), DynamicFetcher (JS), Fetcher (HTTP), AsyncFetcher (concurrent)
  • Proxy rotation support β€” configure rotating residential proxies per-job
  • Adaptive scraping β€” Scrapling's smart element tracking survives website redesigns
┌─────────────────────────────────────────────────────┐
│  Browser (localhost:3000)                           │
│  ┌────────────┐  ┌───────────────────────────────┐  │
│  │  Sidebar   │  │                               │  │
│  │  ────────  │  │ Task cards with progress bars │  │
│  │ [LOCAL|VPS]│  │ ▸ View → inline JSON/CSV/Table│  │
│  │  Default:  │  │ ⬇ CSV one-click download      │  │
│  │  VPS IP    │  │ Live log streaming            │  │
│  │  or Custom │  │                               │  │
│  │  ────────  │  │                               │  │
│  │ Job config │  │                               │  │
│  │  [Start]   │  │                               │  │
│  └────────────┘  └───────────────────────────────┘  │
└──────────┬──────────────────────────────────────────┘
           │  REST + WebSocket
           ▼
┌───────────────────────┐     ┌───────────────────────┐
│  LOCAL Docker         │ OR  │  VPS Docker           │
│  (localhost:8000)     │     │  (your-vps-ip:8000)   │
│  FastAPI + Scrapling  │     │  FastAPI + Scrapling  │
│  StealthyFetcher      │     │  StealthyFetcher      │
│  ProxyRotator         │     │  ProxyRotator         │
└───────────────────────┘     └───────────────────────┘

Quick Start

git clone https://github.com/YOUR_USERNAME/scrapling-dashboard.git
cd scrapling-dashboard
cp .env.example .env
docker compose up --build

Open http://localhost:3000 — that's it.

The first build takes 5–10 minutes (it pulls the Scrapling Docker image and installs Chromium). After that, it starts in seconds.

For detailed step-by-step instructions including Docker installation, VPS setup, firewall config, and troubleshooting, see INSTALL.md.


How the VPS Toggle Works

The dashboard can send jobs to a remote server for long-running overnight crawls.

1. Set a token on your VPS

The API token is a password you create to protect your VPS endpoint. Pick any strong string:

# On your VPS, in /opt/scrapling-dashboard/.env
API_TOKEN=sk-scrapling-8f3a2b7c9d1e4f5a

For local-only use, no token is needed — auth is automatically skipped when the token is the default value.

2. Deploy to VPS

chmod +x deploy-vps.sh
./deploy-vps.sh root@your-vps-ip

3. Connect from the dashboard

  1. Toggle LOCAL → VPS in the header
  2. Your default VPS IP is pre-configured (or click Custom IP to enter another)
  3. Paste your token in the API bearer token field
  4. Submit jobs — they run on the VPS even if you close your laptop
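The same flow works from a script. Here is a minimal sketch of an authenticated job submission, assuming the `/api/jobs` endpoint described below; the VPS address is a placeholder and the token value is the illustrative one from step 1:

```python
import json
import urllib.request

VPS = "http://your-vps-ip:8000"          # placeholder — use your VPS address
TOKEN = "sk-scrapling-8f3a2b7c9d1e4f5a"  # the API_TOKEN you set in .env

payload = {
    "url": "https://quotes.toscrape.com",
    "selectors": ".quote",
    "fetcher": "stealthy",
    "headless": True,
}

# Build an authenticated POST to /api/jobs with a bearer token header
req = urllib.request.Request(
    f"{VPS}/api/jobs",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(req.get_header("Authorization"))  # → Bearer sk-scrapling-8f3a2b7c9d1e4f5a

# To actually submit (requires the VPS to be reachable):
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)
```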

API Reference

| Method | Endpoint                               | Description              |
|--------|----------------------------------------|--------------------------|
| GET    | /api/health                            | Health check             |
| POST   | /api/jobs                              | Submit a scrape job      |
| POST   | /api/spiders                           | Submit a multi-page crawl|
| GET    | /api/jobs                              | List all jobs            |
| GET    | /api/jobs/{id}                         | Job status + progress    |
| GET    | /api/jobs/{id}/results?format=json     | Results (json or csv)    |
| DELETE | /api/jobs/{id}                         | Delete a job             |
| WS     | /ws/jobs/{id}/logs                     | Live log stream          |

Quick example

curl -X POST http://localhost:8000/api/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://quotes.toscrape.com",
    "selectors": ".quote",
    "fetcher": "stealthy",
    "headless": true
  }'

Project Structure

scrapling-dashboard/
├── docker-compose.yml        # Orchestrates backend + frontend
├── .env.example              # Environment template (copy to .env)
├── deploy-vps.sh             # One-command VPS deploy script
├── INSTALL.md                # Detailed installation guide
├── backend/
│   ├── Dockerfile            # Based on pyd4vinci/scrapling
│   ├── app.py                # FastAPI wrapping Scrapling
│   └── requirements.txt
├── frontend/
│   └── index.html            # React SPA (no build step needed)
└── nginx/
    └── default.conf          # Reverse proxy + WebSocket support

Fetcher Guide

| Fetcher         | Use Case                               | Anti-bot | Speed |
|-----------------|----------------------------------------|----------|-------|
| StealthyFetcher | Cloudflare, Turnstile, protected sites | ★★★★★    | ★★★   |
| DynamicFetcher  | JS-heavy SPAs, client-rendered pages   | ★★★      | ★★★   |
| Fetcher         | Simple HTTP, static pages              | ★★       | ★★★★★ |
| AsyncFetcher    | High-volume concurrent scraping        | ★★       | ★★★★★ |
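Inside the backend, the `fetcher` field of a job payload presumably selects one of these four. A hypothetical dispatch sketch — the `"stealthy"` key matches the API example below, but the other key names and the fallback behavior are illustrative guesses, not the actual `app.py` logic:

```python
# Hypothetical mapping from the job payload's "fetcher" field to a fetcher
# choice. Plain strings stand in for the Scrapling classes so the sketch
# runs standalone.
FETCHERS = {
    "stealthy": "StealthyFetcher",  # anti-bot: Cloudflare, Turnstile
    "dynamic": "DynamicFetcher",    # JS-heavy SPAs
    "http": "Fetcher",              # simple static pages
    "async": "AsyncFetcher",        # high-volume concurrent scraping
}

def pick_fetcher(name: str) -> str:
    """Resolve a job's fetcher name, falling back to plain HTTP."""
    return FETCHERS.get(name, FETCHERS["http"])

print(pick_fetcher("stealthy"))  # → StealthyFetcher
print(pick_fetcher("unknown"))   # → Fetcher
```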

When to Use LOCAL vs VPS

| Scenario                         | Recommendation                |
|----------------------------------|-------------------------------|
| Development / testing selectors  | LOCAL                         |
| Quick scrapes (< 100 pages)      | LOCAL                         |
| Sites with aggressive anti-bot   | LOCAL + residential proxy     |
| Overnight crawls (1000+ pages)   | VPS                           |
| Scheduled recurring jobs         | VPS                           |
| Scraping from a specific region  | VPS in target region + proxy  |

Extending

  • Persistence β€” Replace the in-memory job store in app.py with SQLite or Redis
  • Scheduling β€” Add APScheduler or cron for recurring spiders
  • Notifications β€” POST to Slack/Discord webhooks on job completion
  • Auth UI β€” Add a login page if exposing the VPS publicly
  • HTTPS β€” Use Caddy as a reverse proxy for automatic TLS (see INSTALL.md)

Credits & Acknowledgments

Scrapling

This project is a UI wrapper around Scrapling by Karim Shoair (D4Vinci). Scrapling is an adaptive web scraping framework that handles everything from single requests to full-scale crawls, with built-in anti-bot bypass, smart element tracking, and proxy rotation.

Dashboard

This dashboard was designed and built with the assistance of Claude by Anthropic, iteratively scaffolding the FastAPI backend, React frontend, Docker configuration, and VPS deployment tooling.

Core Dependencies

  • Scrapling — the scraping engine (fetchers, adaptive element tracking, proxy rotation)
  • FastAPI — the backend API
  • React — the single-file frontend (no build step)
  • Docker & Docker Compose — packaging and orchestration
  • Nginx — reverse proxy with WebSocket support

License

This dashboard is MIT licensed. Scrapling itself is BSD-3-Clause.


Contributing

Contributions welcome! Please open an issue first to discuss what you'd like to change.

  1. Fork the repo
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Commit changes (git commit -am 'Add my feature')
  4. Push (git push origin feature/my-feature)
  5. Open a Pull Request
