# Proposal: Store Checkpoints In Per-Checkpoint Git Refs

## Problem

Entire currently stores committed checkpoints in one append-only Git branch, `entire/checkpoints/v1`. That has a few problems:

- The Git repository keeps growing without clear cleanup boundaries.
- GitHub treats `entire/checkpoints/v1` like a normal branch and may show “open a pull request” prompts after checkpoint pushes.
- In new repositories, GitHub has sometimes selected `entire/checkpoints/v1` as the default branch because no other branch existed yet.
- Pushes are broader than necessary because one branch represents all checkpoint history.

The proposed direction is to store each checkpoint under its own Git ref.

## Related Work

This should build on the checkpoint-store refactor tracked in https://github.com/entireio/cli/issues/1433.

Ideally the pluggable store boundary lands before the ref-store work starts. In practice, this work may happen in parallel across 2-4 people. The ref-store design should follow the direction of that issue and avoid introducing a second abstraction that would need to be reconciled later.

The migration approach can reuse ideas from https://github.com/entireio/cli/pull/1397, but production checkpoint refs should point to commits, not raw tree objects.

## Storage Model

Each checkpoint gets one stable ref: `refs/entire/checkpoints/<shard>/<checkpoint-id>`.

The ref points to a checkpoint commit containing the checkpoint tree. The first commit for a migrated checkpoint can be an orphan commit that reuses the existing `entire/checkpoints/v1` subtree. Later updates can advance the same ref and preserve per-checkpoint history.

The ref should not point directly to a raw tree object. Commits work better with existing Git tooling and let us keep using Git commit signing.

For new CLI versions, the ref store is authoritative. If a checkpoint is written to refs, the CLI should read it from refs so ref-store issues are visible during rollout.

## Metadata Versioning

Root checkpoint metadata should include both `cli_version` and `checkpoints_version`.

`cli_version` records the CLI that wrote the checkpoint. `checkpoints_version` records the checkpoint storage format, for example `refs-1`. Session metadata does not need a separate checkpoint storage version.
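As a rough illustration, root metadata carrying both fields could be built and inspected as below. Only `cli_version`, `checkpoints_version`, and the example value `refs-1` come from this proposal. The helper names and the `"unversioned"` fallback for metadata without the field are assumptions made for the sketch.

```python
import json
from typing import Any, Dict, Mapping

# Example storage-format value taken from the proposal text.
CHECKPOINTS_VERSION = "refs-1"


def root_metadata(cli_version: str) -> Dict[str, Any]:
    """Build the two version fields for root checkpoint metadata."""
    return {
        "cli_version": cli_version,
        "checkpoints_version": CHECKPOINTS_VERSION,
    }


def storage_format(metadata: Mapping[str, Any]) -> str:
    """Return the storage format recorded in root metadata.

    Assumption: metadata without the field is reported as "unversioned";
    the proposal does not say how such metadata should be treated.
    """
    return metadata.get("checkpoints_version", "unversioned")


def to_json(metadata: Mapping[str, Any]) -> str:
    """Serialize metadata with stable key order."""
    return json.dumps(metadata, indent=2, sort_keys=True)
```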

## Checkpoint IDs

Existing checkpoint IDs cannot change because they are already referenced from Git history. Legacy IDs should keep the current prefix-shard layout. A commit-to-checkpoint mapping could instead be encoded in the ref structure, but that would add complexity.

Future checkpoint IDs are expected to be [ULIDs](https://github.com/ulid/spec). New ULID refs should use the last two ULID characters as the shard, for example `refs/entire/checkpoints/ZN/01KVBJCWYA4YW6J5M9GP655HZN`. A ULID starts with its timestamp and ends with its random component, so sharding on the trailing characters spreads checkpoints evenly across shards, while the IDs themselves stay lexicographically sortable.

The ref resolver should support both legacy IDs and ULIDs for the time being.
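A minimal sketch of such a resolver follows. The ULID pattern follows the ULID spec (26 Crockford base32 characters). The legacy pattern is an assumption: this proposal does not define the legacy ID format, so the sketch treats a legacy ID as 12 lowercase hex characters sharded by its first two characters. The function and constant names are hypothetical.

```python
import re

REF_PREFIX = "refs/entire/checkpoints"

# ULID: 26 Crockford base32 characters; the first is at most 7.
ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")

# Assumption: legacy IDs are 12 lowercase hex characters.
LEGACY_RE = re.compile(r"[0-9a-f]{12}")


def shard_for(checkpoint_id: str) -> str:
    """Return the shard directory for a checkpoint ID."""
    if ULID_RE.fullmatch(checkpoint_id):
        # Trailing characters are random, so they spread evenly.
        return checkpoint_id[-2:]
    if LEGACY_RE.fullmatch(checkpoint_id):
        # Assumed legacy layout: shard on the two-character prefix.
        return checkpoint_id[:2]
    raise ValueError(f"unrecognized checkpoint ID: {checkpoint_id!r}")


def checkpoint_ref(checkpoint_id: str) -> str:
    """Resolve a checkpoint ID to its full ref name."""
    return f"{REF_PREFIX}/{shard_for(checkpoint_id)}/{checkpoint_id}"
```

With these assumptions, `checkpoint_ref("01KVBJCWYA4YW6J5M9GP655HZN")` yields the example ref shown above, and an unrecognized ID fails loudly rather than being written to an arbitrary shard.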

## Rollout Control

Rollout should use `strategy_options`, consistent with existing settings in `.entire/settings.json`.

A possible setting is:

```json
{
  "strategy_options": {
    "checkpoint_store": "refs-v1-mirror"
  }
}
```

Expected modes:

- unset: current `entire/checkpoints/v1` behavior
- `refs-v1-mirror`: refs are authoritative, and `entire/checkpoints/v1` is still written as a temporary compatibility mirror
- `refs-v1`: refs only, after the ref store has proven stable

The exact names can change, but rollout mode should stay separate from checkpoint metadata versioning.
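The three modes could map to store behavior roughly as follows. This is a sketch only: `StorePlan`, `store_plan`, and the `read_from` labels are hypothetical names, and the setting key and mode names are the tentative ones from this section.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class StorePlan:
    write_refs: bool        # write per-checkpoint refs
    write_v1_branch: bool   # write entire/checkpoints/v1
    read_from: str          # "refs" or "v1-branch"


_PLANS = {
    None: StorePlan(write_refs=False, write_v1_branch=True, read_from="v1-branch"),
    "refs-v1-mirror": StorePlan(write_refs=True, write_v1_branch=True, read_from="refs"),
    "refs-v1": StorePlan(write_refs=True, write_v1_branch=False, read_from="refs"),
}


def store_plan(settings: Mapping[str, Any]) -> StorePlan:
    """Map parsed .entire/settings.json content to a store plan."""
    options = settings.get("strategy_options") or {}
    mode = options.get("checkpoint_store")
    if mode is not None and not isinstance(mode, str):
        raise ValueError(f"checkpoint_store must be a string, got {mode!r}")
    if mode not in _PLANS:
        raise ValueError(f"unknown checkpoint_store mode: {mode!r}")
    return _PLANS[mode]
```

Rejecting unknown modes, rather than silently falling back to the branch store, keeps a typo in the setting from hiding the fact that refs are not being written.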

## Initial Rollout

During the initial rollout, the CLI should dual-write:

- authoritative per-checkpoint refs
- a temporary `entire/checkpoints/v1` compatibility mirror

This is based on prior rollout experience where some repositories moved to a new storage mode too early and reverse-migration tooling had to be built afterwards to avoid data loss.

The `entire/checkpoints/v1` mirror should be removed once the ref store has proven stable.

## Migration And Backfill


The first version should create one commit per existing checkpoint by reusing the checkpoint subtree from `entire/checkpoints/v1`. It does not need to reconstruct the full `entire/checkpoints/v1` update history.

Migration should preserve useful ordering and timestamps where possible, because some listing flows sort by checkpoint or commit time.

Migration commits should use normal checkpoint signing behavior by default, with the operator’s Git identity as the default identity. The migration command may also allow an explicit author, such as `Entire Checkpoint Migration <checkpoints@entire.io>`, and an explicit signing key via `--signing-key`. To simplify the migration command, skipping checkpoint signing entirely is also an option.

Migration should signal refs for push through the same mechanism as normal checkpoint writes.
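The one-commit-per-checkpoint step could look roughly like the sketch below, which uses Git plumbing commands (`rev-parse`, `commit-tree`, `update-ref`). The function names, the commit message, and the subtree path passed by the caller are assumptions. Signing and the explicit-author option are omitted for brevity; `git commit-tree` accepts `-S` for signing. The command runner is injected so the logic can be exercised without a repository.

```python
import os
import subprocess
from typing import Callable, Dict, List

# A runner takes an argv list plus extra environment variables
# and returns the command's stdout.
Runner = Callable[[List[str], Dict[str, str]], str]

SOURCE_BRANCH = "entire/checkpoints/v1"


def git_runner(argv: List[str], extra_env: Dict[str, str]) -> str:
    """Run a Git command in the current directory and return stdout."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        argv, env=env, check=True, capture_output=True, text=True
    )
    return result.stdout


def migrate_checkpoint(
    run: Runner, subtree_path: str, ref: str, unix_time: int
) -> str:
    """Create an orphan commit for one checkpoint and point `ref` at it.

    `subtree_path` is the checkpoint's directory inside the source
    branch; its exact layout is an assumption of the caller.
    """
    # Reuse the existing subtree object instead of rewriting files.
    tree = run(
        ["git", "rev-parse", "--verify", f"{SOURCE_BRANCH}:{subtree_path}"], {}
    ).strip()

    # Preserve the original time so time-sorted listings keep their order.
    date = f"{unix_time} +0000"
    dates = {"GIT_AUTHOR_DATE": date, "GIT_COMMITTER_DATE": date}

    # No -p argument: the commit has no parent (an orphan commit).
    commit = run(
        ["git", "commit-tree", tree, "-m", f"Migrate checkpoint to {ref}"],
        dates,
    ).strip()

    # An empty old value makes update-ref fail if the ref already exists.
    run(["git", "update-ref", ref, commit, ""], {})
    return commit
```

In a real repository the call would pass `git_runner`; a backfill would then hand the returned ref to the same push-signalling mechanism as normal checkpoint writes.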

## Push Discovery

The CLI should not push every local checkpoint ref. Doing so would also push refs that were only fetched for local reads, and working around that by deleting local refs after each push would make normal local workflows worse.

Instead, checkpoint writes should enqueue refs that need to be pushed. A simple flock-protected JSONL queue in the Git common directory is enough for the first version. Batch pushes should be used so migrated or newly written refs are not pushed one by one.
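A minimal sketch of the queue and the batch push, assuming a Unix system (`fcntl.flock`) and a hypothetical one-object-per-line record shape `{"ref": ...}`. The function names are illustrative. Removing entries after a successful push is deliberately left out, since retry behavior and queue compaction are out of scope for this proposal.

```python
import fcntl
import json
import os
from typing import List


def enqueue_ref(queue_path: str, ref: str) -> None:
    """Append one pending ref to the JSONL queue under an exclusive lock."""
    with open(queue_path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(json.dumps({"ref": ref}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def pending_refs(queue_path: str) -> List[str]:
    """Return queued refs in first-seen order, without duplicates."""
    if not os.path.exists(queue_path):
        return []
    refs: List[str] = []
    seen = set()
    with open(queue_path, "r", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                ref = json.loads(line)["ref"]
                if ref not in seen:
                    seen.add(ref)
                    refs.append(ref)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return refs


def batch_push_command(remote: str, refs: List[str]) -> List[str]:
    """Build a single `git push` invocation covering every queued ref."""
    return ["git", "push", remote] + [f"{ref}:{ref}" for ref in refs]
```

Passing all refspecs to one `git push` keeps a large migration to a single connection instead of one push per ref.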

## Reads And Listings

Branch-scoped flows should continue to use code commit history (as opposed to checkpoint commit history) to find relevant checkpoint IDs, then resolve those IDs to checkpoint refs.

If needed refs are missing locally, the CLI should fetch those refs directly and efficiently rather than asking users to run Git commands.
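One way to fetch only what is missing is to compare the refs a flow needs against the refs that exist locally, then issue a single `git fetch` with explicit refspecs. The sketch below only builds the command; the function names are hypothetical, and how the local ref set is obtained (for example via `git for-each-ref`) is left to the caller.

```python
from typing import Iterable, List, Set


def missing_refs(wanted: Iterable[str], local: Set[str]) -> List[str]:
    """Return wanted refs that are not present locally, in order."""
    result: List[str] = []
    seen: Set[str] = set()
    for ref in wanted:
        if ref not in local and ref not in seen:
            seen.add(ref)
            result.append(ref)
    return result


def fetch_command(remote: str, refs: List[str]) -> List[str]:
    """Build one `git fetch` that retrieves exactly the given refs.

    Refspecs have no leading '+', so an existing local ref would only
    be updated by fast-forward.
    """
    if not refs:
        return []
    return ["git", "fetch", "--no-tags", remote] + [f"{r}:{r}" for r in refs]
```

An empty result from `fetch_command` means everything needed is already local, so the CLI can skip the network round trip.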

For now, storage-level cleanup and maintenance operations can list only local checkpoint refs. No local checkpoint index is needed in the first version.

## Out Of Scope

This proposal does not decide:

- exact retry or partial-failure behavior
- queue compaction or repair mechanics
- remote pruning behavior
- local indexing
- detailed implementation sequencing
- session-level refs
- imported-transcript association refs

## Future Extensions

Later work may add refs such as `refs/entire/sessions/...` and `refs/entire/commits/...`.

Session refs could expose session-level checkpoint data directly. Commit association refs could link imported transcripts to existing code commits without rewriting Git history.

Those are useful future directions, but they should not complicate the checkpoint-ref rollout.

## Review Checklist

Before moving this proposal into implementation planning, we should confirm that it addresses the problems with the current monolithic branch design:

- Checkpoint storage has cleanup boundaries instead of one ever-growing branch.
- GitHub no longer treats `entire/checkpoints/v1` as a normal branch that invites pull requests.
- New repositories cannot accidentally get `entire/checkpoints/v1` selected as the default branch.
- Ref storage is authoritative during rollout, so new-storage bugs are visible.
- The temporary `entire/checkpoints/v1` mirror exists only for rollback and downgrade safety.

