
Conversation

@dcw-data dcw-data commented Dec 3, 2025

Add a new --init-batch-size option to pgbench initialization mode that enables batched commits and automatic reconnection/retry on errors.

The --init-batch-size parameter accepts a value in scale units (same as --scale). When specified, the initialization process commits after each batch instead of using one large transaction. This allows pgbench to recover from network disconnections or other transient errors during long-running initializations that can take hours for multi-TB datasets.

Key features:

  • Batched commits: Each batch commits independently, so work completed before a failure is preserved
  • Automatic retry: On error, pgbench reconnects and retries the current batch, giving up after 5 consecutive failed attempts on the same batch
  • Progress tracking: The per-batch retry counter resets once a batch succeeds, so dozens of retries can be absorbed over the entire run as long as progress is made between failures
  • Error reporting: Total errors and retries are reported at completion
  • COPY FREEZE disabled: When batching is enabled, COPY FREEZE is automatically disabled, since it requires the table to have been created or truncated in the same transaction

The implementation:

  • Modifies initPopulateTable() to accept a connection pointer and handle batching/retry logic
  • Updates initGenerateDataClientSide() to support both batched and non-batched modes
  • Maintains cancel handler (SetCancelConn/ResetCancelConn) across reconnections
  • Adds global counters for num_init_errors and num_init_retries

Also adds TAP tests to verify the retry behavior works correctly when connections are terminated during initialization.

Example usage:
pgbench -i --scale=1000 --init-batch-size=10 dbname

This commits every 10 scale units (1 million pgbench_accounts rows per batch).
