rlbidder

Reinforcement learning auto-bidding library for research and production.


Overview • Who Should Use This • Installation • Quickstart • Benchmarks • API • Citation


📖 Overview

rlbidder is a comprehensive toolkit for training and deploying reinforcement learning agents in online advertising auctions. Built for both industrial scale and research agility, it provides:

  • Complete offline RL pipeline: Rust-powered data processing (Polars) → SOTA algorithms (IQL, CQL, DT, GAVE, GAS, QGA, CBD) → parallel evaluation
  • Modern ML infrastructure: PyTorch Lightning multi-GPU training, experiment tracking, automated reproducibility
  • Production insights: Interactive dashboards for campaign monitoring, market analytics, and agent behavior analysis
  • Research rigor: Statistically robust benchmarking with RLiable metrics, tuned control baselines, and round-robin evaluation

Whether you're deploying bidding systems at scale or researching novel RL methods, rlbidder bridges the gap between academic innovation and production readiness.


🎯 Who Should Use rlbidder?

Researchers looking to experiment with SOTA offline RL algorithms (IQL, CQL, DT, GAVE, GAS, QGA, CBD) on realistic auction data with rigorous benchmarking.

AdTech Practitioners comparing RL agents against classic baselines (PID, BudgetPacer) before production deployment.


🚀 Key Features & What Makes rlbidder Different

rlbidder pushes beyond conventional RL libraries by integrating cutting-edge techniques from both RL research and modern LLM/transformer architectures. Here's what sets it apart:

Rust-Powered Data Pipeline

  • Standardized workflow: Scan Parquet → RL Dataset → Feature Engineering → DT Dataset with reproducible artifacts at every stage
  • Polars Lazy API: Streaming data processing on a fast Rust engine that avoids materializing full datasets in memory
  • Scalable workflows: Process 100GB+ auction logs efficiently with lazy evaluation and zero-copy operations
  • Feature engineering: Drop-in scikit-learn-style transformers (Symlog, Winsorizer, ReturnScaledReward) for states, actions, and rewards

State-of-the-Art RL Algorithms

  • Comprehensive baselines: Classic control (BudgetPacer, PID) and learning-based methods (BC, CQL, IQL, DT, GAVE, GAS, QGA, CBD)
  • HL-Gauss Distributional RL: Smooth Gaussian-based distributional Q-learning for improved uncertainty quantification, advancing beyond standard categorical approaches
  • Efficient ensemble critics: Leverage torch.vmap for vectorized ensemble operations—much faster than traditional loop-based implementations
  • Numerically stable stochastic policies: DreamerV3-style SigmoidRangeStd and TorchRL-style BiasedSoftplus to avoid numerical instabilities from exp/log operations
  • Diffusion-based bidding (CBD): Causal auto-Bidding via Diffusion with Flow Matching scheduler and 1D UNet architecture for generative action modeling
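The vectorized-ensemble trick can be sketched with the stock `torch.func` model-ensembling recipe; the critics below are plain `nn.Sequential` stand-ins, not the library's `EnsembledQNetwork`:

```python
import copy
import torch
from torch.func import stack_module_state, functional_call

def make_critic():
    # Small Q-network stand-in (input: concatenated state-action features)
    return torch.nn.Sequential(
        torch.nn.Linear(6, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
    )

critics = [make_critic() for _ in range(5)]    # ensemble of 5 critics
params, buffers = stack_module_state(critics)  # stack weights on a new leading dim

# Stateless "template" on the meta device; vmap maps over the stacked weights.
base = copy.deepcopy(critics[0]).to("meta")

def forward_one(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(8, 6)  # batch of 8 inputs, shared across all ensemble members
q_values = torch.vmap(forward_one, in_dims=(0, 0, None))(params, buffers, x)
print(q_values.shape)  # torch.Size([5, 8, 1]): (num_critics, batch, 1)
```

A single batched matmul per layer replaces a Python loop over critics, which is where the speedup over loop-based ensembles comes from.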

Modern Transformer Stack (LLM-Grade)

  • FlashAttention (SDPA): Uses PyTorch's scaled dot-product attention API (torch.nn.functional.scaled_dot_product_attention), which dispatches to FlashAttention kernels where available
  • RoPE positional encoding: Rotary positional embeddings for improved sequence length generalization, adopted from modern LLMs
  • QK-Norm: Query-key normalization for enhanced training stability at scale
  • SwiGLU: Advanced feed-forward networks for superior expressiveness
  • Efficient inference: DTInferenceBuffer with deque-based temporal buffering for online Decision Transformer deployment
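The deque idea behind online DT inference can be sketched as follows; the class and method names here are invented for illustration (the library's own class is DTInferenceBuffer):

```python
from collections import deque

class RollingContext:
    """Toy deque-backed context window for online Decision Transformer inference."""

    def __init__(self, context_len: int):
        # maxlen makes each deque silently evict the oldest step on overflow
        self.states = deque(maxlen=context_len)
        self.actions = deque(maxlen=context_len)
        self.rtgs = deque(maxlen=context_len)

    def append(self, state, action, rtg):
        self.states.append(state)
        self.actions.append(action)
        self.rtgs.append(rtg)

    def window(self):
        # Oldest-to-newest (state, action, rtg) tuples for the transformer context
        return list(zip(self.states, self.actions, self.rtgs))

buf = RollingContext(context_len=3)
for t in range(5):
    buf.append(state=t, action=t * 0.1, rtg=10.0 - t)
print(len(buf.window()))  # 3: only the most recent context_len steps are kept
```

This keeps per-step deployment cost O(context_len) regardless of episode length.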

Simulated Online Evaluation & Visualization

  • Parallel evaluation: Multi-process evaluators with pre-loaded data per worker—much faster than sequential benchmarking
  • Robust testing: Round-robin agent rotation with multi-seed evaluation for statistically reliable comparisons
  • Tuned competitors: Classic control methods (BudgetPacer, PID) with optimized hyperparameters as baselines
  • Interactive dashboards: Production-ready Plotly visualizations with market structure metrics (HHI, Gini, volatility) and RLiable metrics
  • Industrial analytics: Campaign health monitoring, budget pacing diagnostics, auction dynamics, and score distribution analysis
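For reference, the two market-structure metrics named above have standard definitions (these are textbook formulas, not rlbidder's implementation):

```python
import numpy as np

def hhi(spend: np.ndarray) -> float:
    """Herfindahl-Hirschman index: sum of squared market shares.
    1.0 = monopoly; 1/n = n equal-sized bidders."""
    shares = spend / spend.sum()
    return float((shares ** 2).sum())

def gini(x: np.ndarray) -> float:
    """Gini coefficient via the sorted-cumulative formula.
    0 = perfect equality; approaches 1 as spend concentrates."""
    x = np.sort(x.astype(float))
    n = x.size
    cum = np.cumsum(x)
    return float((n + 1 - 2 * (cum / cum[-1]).sum()) / n)

print(hhi(np.array([50.0, 50.0])))  # 0.5: two equal bidders
```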

Modern ML Engineering Stack

  • Modular design: Enables both production readiness and rapid prototyping
  • PyTorch Lightning: Reduce boilerplate code, automatic mixed precision, gradient accumulation
  • Draccus configuration: Type-safe dataclass-to-CLI with hierarchical configs, dot-notation overrides, and zero boilerplate
  • Local experiment tracking: Trackio for experiment management without external cloud dependencies

Comparison with AuctionNet

| Feature | AuctionNet | rlbidder |
|---|---|---|
| Data Engine | Pandas | Polars Lazy (Rust) |
| Configuration | argparse | Draccus (dataclass-to-CLI) |
| Distributional RL | N/A | HL-Gauss |
| Ensemble Method | N/A | torch.vmap |
| Transformer Attention | Standard | SDPA/FlashAttn |
| Positional Encoding | Learned | RoPE |
| Policy Stability | exp(log_std) | SigmoidRangeStd/BiasedSoftplus |
| Parallel Evaluation | N/A | ProcessPool + Round-robin |
| Diffusion Models | N/A | Flow Matching + UNet1D |
| Visualization | N/A | Production Dashboards |

📊 Benchmarking Results

We evaluate all agents using rigorous statistical methods across multiple delivery periods with round-robin testing and multi-seed evaluation. The evaluation protocol follows RLiable best practices for statistically reliable algorithm comparison.
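A headline RLiable aggregate is the interquartile mean (IQM): the mean of the middle 50% of scores, more robust to outliers than the mean and more statistically efficient than the median. As a sketch (the scores below are made up for illustration, not benchmark numbers):

```python
import numpy as np
from scipy.stats import trim_mean

# Hypothetical per-seed scores for one agent, already sorted for readability
scores = np.array([30.2, 33.1, 34.0, 34.5, 35.1, 35.8, 36.4, 41.9])

# trim_mean with proportiontocut=0.25 discards the bottom and top quartiles,
# then averages what remains: exactly the IQM.
iqm = trim_mean(scores, proportiontocut=0.25)
print(round(float(iqm), 2))  # 34.85: mean of the middle four scores
```

The rliable package additionally provides stratified-bootstrap confidence intervals and performance profiles on top of aggregates like this.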

Mean Returns (per delivery period)

| Agent | P14 | P15 | P16 | P17 | P18 | P19 | P20 | Avg |
|---|---|---|---|---|---|---|---|---|
| CBD | 40.65 | 38.97 | 38.53 | 41.15 | 39.38 | 43.60 | 39.33 | 40.23 |
| IQL | 41.96 | 39.35 | 38.65 | 40.33 | 39.60 | 41.48 | 39.38 | 40.11 |
| CQL | 40.46 | 39.58 | 38.61 | 38.60 | 40.28 | 41.18 | 39.20 | 39.70 |
| QGA | 39.39 | 39.66 | 37.42 | 38.58 | 38.65 | 40.97 | 39.19 | 39.12 |
| GAS | 39.08 | 39.17 | 36.92 | 38.45 | 38.73 | 41.19 | 39.10 | 38.95 |
| BC | 37.37 | 36.22 | 36.11 | 36.54 | 38.53 | 40.49 | 36.62 | 37.41 |
| DT | 36.15 | 33.76 | 34.60 | 35.48 | 33.86 | 39.42 | 34.07 | 35.33 |
| PIDBudgetPacer | 34.37 | 34.35 | 33.92 | 34.33 | 33.39 | 37.55 | 34.26 | 34.59 |
| FTRL | 33.99 | 34.76 | 32.74 | 35.04 | 33.36 | 38.51 | 33.15 | 34.51 |
| BudgetPacer | 32.88 | 34.56 | 34.00 | 35.24 | 32.49 | 35.68 | 33.89 | 34.10 |
| ValueScaledCPA | 30.35 | 30.00 | 30.58 | 30.05 | 28.20 | 33.62 | 28.86 | 30.24 |

Mean Score (per delivery period)

| Agent | P14 | P15 | P16 | P17 | P18 | P19 | P20 | Avg |
|---|---|---|---|---|---|---|---|---|
| CBD | 35.44 | 33.48 | 33.21 | 35.99 | 34.28 | 39.81 | 33.88 | 35.16 |
| IQL | 37.10 | 33.87 | 32.77 | 34.48 | 33.91 | 36.46 | 33.22 | 34.54 |
| QGA | 35.47 | 35.21 | 32.36 | 33.45 | 34.45 | 36.86 | 33.88 | 34.52 |
| GAS | 34.60 | 34.70 | 32.47 | 33.71 | 33.54 | 36.75 | 33.63 | 34.20 |
| CQL | 35.09 | 34.12 | 33.16 | 32.68 | 34.77 | 36.20 | 33.06 | 34.15 |
| DT | 33.49 | 32.01 | 31.98 | 32.76 | 31.72 | 37.16 | 31.49 | 32.95 |
| BC | 32.83 | 32.32 | 31.68 | 31.57 | 33.33 | 36.01 | 32.36 | 32.87 |
| FTRL | 30.13 | 32.22 | 29.78 | 31.97 | 30.33 | 35.33 | 30.10 | 31.41 |
| PIDBudgetPacer | 29.12 | 30.51 | 30.05 | 30.15 | 28.93 | 33.39 | 29.57 | 30.24 |
| ValueScaledCPA | 28.56 | 28.29 | 28.61 | 28.11 | 26.32 | 31.27 | 27.05 | 28.32 |
| BudgetPacer | 26.23 | 27.86 | 27.53 | 28.82 | 25.83 | 29.53 | 27.62 | 27.63 |
Score Distribution Analysis: Violin plots showing performance distributions across agents and seeds.

Mean Performance Comparison: Aggregated performance metrics with confidence intervals.

RLiable Statistical Metrics: Performance profiles and aggregate metrics following RLiable best practices.


📈 Interactive Dashboards & Gallery

Beyond raw performance metrics, rlbidder helps you understand why agents behave the way they do. Production-grade interactive dashboards summarize policy behavior, campaign health, and auction dynamics for both research insights and production monitoring.

Auction market analysis: Market concentration, volatility, and competitiveness.

Campaign analysis (CQL): Segment-level delivery quality and conversion outcomes.

Budget pacing (CQL): Daily spend pacing and CPA stabilization diagnostics.

Auction metrics scatterplots: Spend, conversion, ROI, and win-rate trade-offs.


🚀 Getting Started

Installation

Prerequisites

  • Python 3.11 or newer
  • PyTorch 2.6 or newer (follow PyTorch install guide)
  • GPU with 8GB+ VRAM recommended for training

Local Development

git clone https://github.com/zuoxingdong/rlbidder.git
cd rlbidder
pip install -e .

Quickstart

Follow the steps below to reproduce the full offline RL workflow on processed campaign data.

Step 1: Data Preparation

# Download sample competition data (periods 7-8 and trajectory 1)
bash scripts/download_raw_data.sh -p 7-8,traj1 -d data/raw

# Convert raw CSV to Parquet (faster I/O with Polars)
python scripts/convert_csv_to_parquet.py --raw_data_dir=data/raw

# Build evaluation-period parquet files
python scripts/build_eval_dataset.py --data_dir=data

# Create training transitions (choose ONE of the two modes below)
# Option A – directly from raw trajectory (pre-aggregated state)
python scripts/build_transition_dataset.py --data_dir=data --mode=trajectory
# Option B – from recovered tick-level data (recommended, extensible)
python scripts/recover_tick_features.py --data_dir=data
python scripts/build_transition_dataset.py --data_dir=data --mode=recovered_tick

# Fit scalers for state, action, and reward normalization
python scripts/scale_transitions.py --data_dir=data --output_dir=scaled_transitions

# Generate Decision Transformer trajectories with return-to-go
python scripts/build_dt_dataset.py \
  --build.data_dir=data \
  --build.reward_type=reward_dense \
  --build.use_scaled_reward=true

What you'll have: Preprocessed datasets in data/processed/ and fitted scalers in data/scaled_transitions/ ready for training.
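The return-to-go targets that build_dt_dataset.py produces can be sketched as a reversed cumulative sum over rewards (a minimal sketch, not the script's actual implementation, which also applies the configured reward type and scaling):

```python
import numpy as np

def returns_to_go(rewards: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Return-to-go at step t: sum of (discounted) future rewards from t onward,
    the conditioning target a Decision Transformer is trained on."""
    rtg = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):  # accumulate from the episode's end
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

print(returns_to_go(np.array([1.0, 2.0, 3.0])))  # [6. 5. 3.]
```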

Step 2: Train Agents

# Train IQL (Implicit Q-Learning) - value-based offline RL
python examples/train_iql.py \
  --model_cfg.lr_actor 3e-4 \
  --model_cfg.lr_critic 3e-4 \
  --model_cfg.num_q_models 5 \
  --model_cfg.bc_alpha 0.01 \
  --train_cfg.enable_trackio_logger=False

# Train DT (Decision Transformer) - sequence modeling for RL
python examples/train_dt.py \
  --model_cfg.embedding_dim 512 \
  --model_cfg.num_layers 6 \
  --model_cfg.lr 1e-4 \
  --model_cfg.rtg_scale 98 \
  --model_cfg.target_rtg 2.0 \
  --train_cfg.enable_trackio_logger=False

What you'll have: Trained model checkpoints in examples/checkpoints/ with scalers and hyperparameters.

💡 Configuration powered by draccus: All training scripts use type-safe dataclass configs with automatic CLI generation. Override any nested config with dot-notation (e.g., --model_cfg.lr 1e-4) or pass config files directly.
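For intuition, the dot-notation override mechanism can be sketched with stdlib dataclasses alone (a toy version; draccus itself parses these from the CLI with full type safety):

```python
from dataclasses import dataclass, field, fields

@dataclass
class ModelCfg:
    lr: float = 3e-4
    num_q_models: int = 5

@dataclass
class TrainCfg:
    model_cfg: ModelCfg = field(default_factory=ModelCfg)

def apply_override(cfg, dotted_key: str, value: str):
    """Walk a 'model_cfg.lr'-style path and set the leaf field,
    casting the string value to the field's declared type."""
    *path, leaf = dotted_key.split(".")
    node = cfg
    for part in path:
        node = getattr(node, part)
    field_type = {f.name: f.type for f in fields(node)}[leaf]
    setattr(node, leaf, field_type(value))

cfg = TrainCfg()
apply_override(cfg, "model_cfg.lr", "1e-4")  # mirrors --model_cfg.lr 1e-4
print(cfg.model_cfg.lr)  # 0.0001
```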

💡 Track experiments with Trackio: All training scripts automatically log metrics, hyperparameters, and model artifacts to Trackio (a local experiment tracker). Launch the Trackio dashboard to visualize training progress:

trackio show

Then open the displayed URL in your browser to explore training curves, compare runs, and analyze hyperparameter configurations.

Step 3: Evaluate in Simulated Auctions

# Evaluate IQL agent with parallel multi-seed evaluation
python examples/evaluate_agents.py \
  --evaluation.data_dir=data \
  --evaluation.evaluator_type=OnlineCampaignEvaluator \
  --evaluation.delivery_period_indices=[7,8] \
  --evaluation.num_seeds=5 \
  --evaluation.num_workers=8 \
  --evaluation.output_dir=examples/eval \
  --agent.agent_class=IQLBiddingAgent \
  --agent.model_dir=examples/checkpoints/iql \
  --agent.checkpoint_file=best.ckpt

What you'll have: Evaluation reports, campaign summaries, and auction histories in examples/eval/ ready for visualization.

Batch evaluation: To evaluate multiple agents in one run, use examples/evaluate_all_agents.py (edit the AGENT_CLASSES list in the script to select which agents to run).

Next steps: Generate dashboards with examples/performance_visualization.ipynb or explore the evaluation results with Polars DataFrames.


📦 Module Guide

Each module handles a specific aspect of the RL bidding pipeline:

| Module | Description | Key Classes/Functions |
|---|---|---|
| 📚 rlbidder.agents | Offline RL agents and control baselines | IQLModel, CQLModel, DTModel, GAVEModel, GASModel, QGAModel, CBDModel, BudgetPacerBiddingAgent |
| 🔧 rlbidder.data | Data processing, scalers, and datasets | OfflineDataModule, OfflineDTDataModule, OfflineDDDataModule, TrajDataset, SymlogTransformer, WinsorizerTransformer |
| 🏪 rlbidder.envs | Auction simulation and value sampling | AuctionSimulator, ValueSampler, sample_pValues_and_conversions_scipy |
| 🎯 rlbidder.evaluation | Multi-agent evaluation and metrics | ParallelOnlineCampaignEvaluator, OnlineCampaignEvaluator, OfflineCampaignEvaluator |
| 🧠 rlbidder.models | Neural network building blocks | StochasticActor, EnsembledQNetwork, NormalHead, HLGaussLoss, CBDDiffuser, FlowMatchingScheduler |
| 📊 rlbidder.viz | Interactive dashboards and analytics | plot_campaign_simulation_metrics, plot_auction_market_overview, plot_interval_estimates |
| 🛠️ rlbidder.utils | Utilities and helpers | generate_seeds, regression_report, _metric_to_float |

🏗️ Architecture

The library follows a modular design with clear separation of concerns. Data flows from raw logs through preprocessing, training, and evaluation to final visualization:

flowchart TD
    subgraph Data["📦 Data Pipeline"]
        direction TB
        raw["Raw Campaign Data<br/><i>CSV/Parquet logs</i>"]
        scripts["Build Scripts<br/>convert • build_eval<br/>build_transition • scale"]
        artifacts["📁 Preprocessed Artifacts<br/>processed/ • scaled_transitions/<br/><i>Parquet + Scalers</i>"]
        
        raw -->|transform| scripts
        scripts -->|generate| artifacts
    end

    subgraph Core["⚙️ Core Library Modules"]
        direction TB
        data_mod["<b>rlbidder.data</b><br/>OfflineDataModule<br/>TrajDataset • ReplayBuffer<br/>🔧 <i>Handles batching & scaling</i>"]
        models["<b>rlbidder.models</b><br/>StochasticActor • EnsembledQNetwork<br/>ValueNetwork • Losses • Optimizers<br/>🧠 <i>Agent building blocks</i>"]
        agents["<b>rlbidder.agents</b><br/>IQLModel • CQLModel • DTModel<br/>📚 <i>LightningModule implementations</i>"]
        
        agents -->|composes| models
    end

    subgraph Training["🔥 Training Pipeline"]
        direction TB
        train["<b>examples/train_iql.py</b><br/>🎛️ Config + CLI<br/><i>Orchestration script</i>"]
        trainer["⚡ Lightning Trainer<br/>fit() • validate()<br/><i>Multi-GPU support</i>"]
        ckpt["💾 Model Checkpoints<br/>best.ckpt • last.ckpt<br/><i>+ scalers + hparams</i>"]
        
        train -->|instantiates| data_mod
        train -->|instantiates| agents
        train -->|launches| trainer
        trainer -->|saves| ckpt
    end

    subgraph Eval["🎯 Online Evaluation"]
        direction TB
        evaluator["<b>rlbidder.evaluation</b><br/>OnlineCampaignEvaluator<br/>ParallelEvaluator<br/>🔄 <i>Multi-seed, round-robin</i>"]
        env["<b>rlbidder.envs</b><br/>Auction Simulator<br/>🏪 <i>Multi-agent market</i>"]
        results["📈 Evaluation Results<br/>Campaign Reports • Agent Summaries<br/>Auction Histories<br/><i>Polars DataFrames</i>"]
        
        evaluator -->|simulates| env
        env -->|produces| results
    end

    subgraph Viz["📊 Visualization & Analysis"]
        direction TB
        viz["<b>rlbidder.viz</b><br/>Plotly Dashboards<br/>Market Metrics<br/>🎨 <i>Interactive HTML</i>"]
        plots["📉 Production Dashboards<br/>Campaign Health • Market Structure<br/>Budget Pacing • Scatter Analysis"]
        
        viz -->|renders| plots
    end

    artifacts ==>|loads| data_mod
    artifacts -.->|eval data| evaluator
    ckpt ==>|load_from_checkpoint| evaluator
    results ==>|consumes| viz

    classDef dataStyle fill:#1565c0,stroke:#0d47a1,stroke-width:3px,color:#fff,font-weight:bold
    classDef coreStyle fill:#ef6c00,stroke:#e65100,stroke-width:3px,color:#fff,font-weight:bold
    classDef trainStyle fill:#6a1b9a,stroke:#4a148c,stroke-width:3px,color:#fff,font-weight:bold
    classDef evalStyle fill:#2e7d32,stroke:#1b5e20,stroke-width:3px,color:#fff,font-weight:bold
    classDef vizStyle fill:#c2185b,stroke:#880e4f,stroke-width:3px,color:#fff,font-weight:bold
    
    class Data,raw,scripts,artifacts dataStyle
    class Core,data_mod,models,agents coreStyle
    class Training,train,trainer,ckpt trainStyle
    class Eval,evaluator,env,results evalStyle
    class Viz,viz,plots vizStyle

Design Principles:

  • 🔌 Modular - Each component is independently usable and testable
  • ⚡ Scalable - Polars + Lightning enable massive datasets and efficient training
  • 🔄 Reproducible - Deterministic seeding, configuration management, and evaluation
  • 🚀 Production-ready - Type hints, error handling, logging, and monitoring built-in

🤝 Contributing

  • 🌟 Star the repo if you find it useful
  • 🔀 Fork and submit PRs for bug fixes or new features
  • 📝 Improve documentation and add examples
  • 🧪 Add tests for new functionality

🌟 Acknowledgments

rlbidder builds upon ideas from:


📝 Citation

If you use rlbidder in your work, please cite it using the BibTeX entry below.

@misc{zuo2025rlbidder,
  author = {Zuo, Xingdong},
  title = {RLBidder: Reinforcement learning auto-bidding library for research and production},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zuoxingdong/rlbidder}}
}

License

MIT License. See LICENSE.