Re-evaluating stable row ID implementation #6933

jackye1995 · 2026-05-25T08:10:51Z

Maintainer

Stable row IDs are becoming an important Lance primitive. They provide a durable logical row identifier that can survive physical rewrites such as compaction and updates.

Today, stable row IDs are implemented as a dedicated Lance system feature. The manifest tracks the stable-row-id feature flag and next_row_id. Each fragment carries row_id_meta, which points to that fragment's row ID sequence, either inline in metadata or through an external file reference. At runtime, Lance materializes a row-id-to-row-address lookup from these per-fragment sequences and caches it.
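As a rough illustration of that runtime lookup, the Python sketch below materializes a row-id-to-row-address map from per-fragment sequences. It assumes Lance's row address layout (fragment ID in the upper 32 bits, row offset in the lower 32 bits) and models each fragment's sequence as a plain list. `build_lookup` and `row_address` are hypothetical helpers, not Lance APIs, and a real implementation would be far more compact than a Python dict.

```python
from typing import Dict, List

def row_address(fragment_id: int, offset: int) -> int:
    """Pack a row address: fragment ID in the upper 32 bits, row offset in the lower 32."""
    return (fragment_id << 32) | offset

def build_lookup(fragment_row_ids: Dict[int, List[int]]) -> Dict[int, int]:
    """Materialize a stable-row-id -> row-address map from per-fragment sequences.

    `fragment_row_ids` maps each fragment ID to the stable row IDs stored in that
    fragment, in physical row order (a hypothetical stand-in for row_id_meta).
    """
    lookup: Dict[int, int] = {}
    for fragment_id, row_ids in fragment_row_ids.items():
        for offset, row_id in enumerate(row_ids):
            lookup[row_id] = row_address(fragment_id, offset)
    return lookup

# Illustrative data only (deletion files are ignored): fragment 1 holds ID 2,
# for example because an update rewrote that row into a new fragment.
fragments = {0: [0, 1, 3], 1: [4, 5, 2]}
lookup = build_lookup(fragments)
print(hex(lookup[2]))  # 0x100000002 -> fragment 1, offset 2
```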

This raises an implementation question:

Could stable row IDs instead be implemented as a system-managed identity column backed by a maintained index?

This would preserve _rowid as an internal Lance system column. The question is mostly about implementation: should stable row ID remain a purpose-built storage path, or should it become a special case of a more general identity-column and write-maintained-index mechanism?

Related prior discussion: #6466

Requirements

Stable row IDs should continue to support:

stable logical row identifiers across compaction and row rewrites
efficient lookup from stable row ID to current row address
update-by-id style workflows
returning generated IDs from write operations
possibly reserving or allocating IDs before append, as discussed in #6466 ("Insert data with reserved stable row ids")
incremental refresh workflows that need durable row identity across dataset versions

The updated design should preserve these capabilities even if the internal implementation changes.

Current approach: dedicated row ID metadata

The current design has a purpose-built row ID storage path:

next_row_id is tracked in the manifest
per-fragment row ID sequences are stored through row_id_meta
those sequences may be embedded in fragment metadata or stored as external row ID files
the row ID lookup structure is custom-built from those sequences
the mechanism is specific to stable row IDs

Pros

The current model is compact and tailored to the row ID use case.

In many cases, row ID metadata can be encoded in or near the manifest, which means lookup metadata may be available without extra index IO. The row ID index is also guaranteed to be updated on insert because it is part of the write path today.

There may also be an algorithmic advantage. Stable row IDs are usually ordered and mostly sequential. A specialized row ID structure can exploit this and may support cheaper maintenance than a generic BTree. For example, some updates may cost closer to O(N) than O(N log N).
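The compactness argument can be illustrated with a simple run encoding: a mostly sequential ID sequence collapses into a few `(start, length)` ranges in a single linear pass, with no per-key tree insertion. This is a simplified stand-in for whatever encoding row_id_meta actually uses; `to_runs` and `from_runs` are hypothetical helpers.

```python
from typing import List, Tuple

def to_runs(row_ids: List[int]) -> List[Tuple[int, int]]:
    """Encode row IDs as (start, length) runs of consecutive values in one O(N) pass."""
    runs: List[Tuple[int, int]] = []
    for row_id in row_ids:
        if runs and runs[-1][0] + runs[-1][1] == row_id:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current run
        else:
            runs.append((row_id, 1))  # start a new run
    return runs

def from_runs(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand runs back into the original ID sequence."""
    return [start + i for start, length in runs for i in range(length)]

# ~200k IDs with a single gap compress to two runs.
ids = list(range(0, 100_000)) + list(range(100_050, 200_000))
runs = to_runs(ids)
print(len(ids), len(runs))  # 199950 2
```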

Cons

The current model is a separate indexing path from Lance's general scalar index system. That adds long-term format and implementation surface area:

separate metadata encoding
separate lookup construction
separate caching/loading behavior
separate maintenance semantics
stable-row-id-specific operational knowledge

It may also create many fragmented row ID metadata file references over time and potentially inflate the manifest.

Another concern is behavior under frequent row-wide updates. As rows are rewritten, the row ID mapping may become less compact and less efficient. If the specialized structure degrades enough, a maintained scalar index could become more attractive despite higher baseline overhead.
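A small simulation shows the degradation mode. If every 10th row of a fragment is updated and rewritten elsewhere, then once the rewritten rows are dropped from the original fragment, its single run splits into many short runs, and the new fragment receives scattered, non-consecutive IDs. The workload is an arbitrary assumption for illustration, and `count_runs` is a hypothetical helper.

```python
from typing import List

def count_runs(row_ids: List[int]) -> int:
    """Number of maximal runs of consecutive integers in row_ids."""
    if not row_ids:
        return 0
    return 1 + sum(1 for a, b in zip(row_ids, row_ids[1:]) if b != a + 1)

original = list(range(10_000))
print(count_runs(original))  # 1

# Hypothetical workload: every 10th row is updated and rewritten elsewhere.
updated = original[::10]
remaining = [r for r in original if r % 10 != 0]

print(count_runs(remaining))  # 1000 runs left in the original fragment
print(count_runs(updated))    # 1000 single-ID runs in the new fragment
```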

Alternative approach: identity column + maintained index

The alternative is to model stable row ID as:

a system-managed identity column
automatically assigned at write time
indexed by a maintained scalar index
updated together with data writes

In this framing, stable row ID properties come from identity-column and write-maintained-index semantics:

generated values provide stable logical identifiers
update paths preserve or carry forward the identity value
the maintained index maps identity values to current row addresses
data and index updates can be committed together
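These semantics can be sketched with a toy in-memory table: a counter (playing the role of `next_row_id`) assigns IDs on append, a reservation call hands out a block before any data exists, and an update carries the existing ID forward while the index entry moves to the new address. `IdentityTable`, `reserve`, and integer addresses are hypothetical; the sketch illustrates the semantics only, not Lance's transaction model.

```python
from typing import Dict, List, Optional, Tuple

class IdentityTable:
    """Toy table with a system-managed identity column and a maintained index."""

    def __init__(self) -> None:
        self.next_row_id = 0                          # plays the role of next_row_id
        self.next_address = 0                         # next free physical slot
        self.rows: Dict[int, Tuple[int, dict]] = {}   # address -> (row_id, values)
        self.index: Dict[int, int] = {}               # row_id -> current address

    def reserve(self, count: int) -> range:
        """Allocate a block of IDs before the corresponding data is appended."""
        ids = range(self.next_row_id, self.next_row_id + count)
        self.next_row_id += count
        return ids

    def append(self, values: List[dict], ids: Optional[range] = None) -> List[int]:
        """Append rows, generating IDs unless previously reserved ones are supplied."""
        if ids is None:
            ids = self.reserve(len(values))
        if len(ids) != len(values):
            raise ValueError("one ID is required per appended row")
        for row_id, row in zip(ids, values):
            self._write(row_id, row)
        return list(ids)  # generated IDs are returned to the caller

    def update(self, row_id: int, changes: dict) -> None:
        """Rewrite a row at a new address, carrying its identity value forward."""
        _, values = self.rows.pop(self.index[row_id])
        self._write(row_id, {**values, **changes})

    def _write(self, row_id: int, values: dict) -> None:
        # Data and index change together; a real system would commit both atomically.
        address = self.next_address
        self.next_address += 1
        self.rows[address] = (row_id, values)
        self.index[row_id] = address

t = IdentityTable()
ids = t.append([{"x": 1}, {"x": 2}])                      # generated IDs
later = t.append([{"x": 3}, {"x": 4}], ids=t.reserve(2))  # reserved IDs
t.update(0, {"x": 10})                                    # ID 0 keeps its identity
print(ids, later, t.rows[t.index[0]])  # [0, 1] [2, 3] (0, {'x': 10})
```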

Pros

This would reuse Lance's existing index infrastructure instead of maintaining a one-off row ID lookup system.

Stable row ID lookup could benefit from existing or shared mechanisms for:

index storage outside the manifest
index caching
index loading
query planning
fragment coverage
index metadata
index maintenance
index compaction/merging
existing scalar index performance knowledge

It also moves row ID metadata out of the manifest and into index files. With segmented indexes, write-time deltas could be committed as new index segments and merged later. With transaction support that can atomically commit data files and index updates, data and stable row ID index maintenance could become part of one composite write.
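The segmented-index idea can be sketched as a list of immutable segments: each write commits a small delta segment, lookups consult segments from newest to oldest so later entries shadow earlier ones, and a background merge folds the segments together. `SegmentedIndex` and its methods are hypothetical, real segments would be index files rather than in-memory dicts, and deletions are ignored here.

```python
from typing import Dict, List, Optional

class SegmentedIndex:
    """Row-id -> row-address index built from immutable, append-only segments."""

    def __init__(self) -> None:
        self.segments: List[Dict[int, int]] = []

    def commit_delta(self, entries: Dict[int, int]) -> None:
        """Commit a write-time delta as a new segment (never mutates old ones)."""
        self.segments.append(dict(entries))

    def lookup(self, row_id: int) -> Optional[int]:
        """Newest segment wins, so a rewritten row resolves to its latest address."""
        for segment in reversed(self.segments):
            if row_id in segment:
                return segment[row_id]
        return None

    def merge(self) -> None:
        """Compact all segments into one, keeping the newest entry per row ID."""
        merged: Dict[int, int] = {}
        for segment in self.segments:
            merged.update(segment)  # later segments overwrite earlier ones
        self.segments = [merged]

idx = SegmentedIndex()
idx.commit_delta({0: 100, 1: 101, 2: 102})  # initial append
idx.commit_delta({1: 205})                  # an update moved row 1
print(idx.lookup(1), len(idx.segments))     # 205 2
idx.merge()
print(idx.lookup(1), len(idx.segments))     # 205 1
```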

This design also connects stable row ID to two broader Lance features:

Identity columns

Stable row ID could be the internal version of a more general identity-column feature. That could later support user-defined identity columns, generated columns, optional start/step configuration, and explicit semantics around uniqueness, immutability, and gaps.

Write-time index maintenance

Stable row ID needs its lookup structure to remain current as data changes. Other lightweight indexes may want similar treatment. If an index is cheap enough to maintain during writes, users could choose to pay write-time cost in exchange for immediately available lookup/index behavior.

Cons

A maintained BTree may require more IO than the current design, especially for simple append-heavy workloads where row ID metadata is already available from the manifest or fragment metadata.

A generic BTree may also be less efficient than a row-id-specific structure for mostly ordered, sequential IDs. If stable row IDs have predictable ranges, a specialized range-like index may be smaller and cheaper to update than a BTree.

There is also implementation risk. Lance needs a robust mechanism for keeping indexes up to date during writes. Segmented indexes and transaction improvements make this more feasible, but the correctness story needs to be clear before stable row ID depends on it.

Dedicated sequence-like index type

The choice may not be limited to the current custom row ID metadata versus a generic BTree.

If there is an index structure that is better than BTree for mostly ordered, sequence-like data, and that structure is useful beyond _rowid, then it may be worth supporting as a dedicated Lance index type.

That could matter if Lance eventually supports general identity columns. Identity columns often produce ordered or mostly ordered values, and they may need efficient point lookup, range lookup, uniqueness checks, and write-time maintenance. A sequence-optimized index could serve both stable row IDs and future identity columns without making stable row ID a one-off special case.
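One operation such an index would need for identity columns is a cheap uniqueness check. If allocated values are stored as sorted, disjoint ranges, checking a new block costs a binary search rather than a probe per key. `RangeSet` is a hypothetical illustration, not a proposed Lance index format.

```python
import bisect
from typing import List

class RangeSet:
    """Sorted, disjoint half-open ranges [start, end) of allocated identity values."""

    def __init__(self) -> None:
        self.starts: List[int] = []
        self.ends: List[int] = []

    def overlaps(self, start: int, end: int) -> bool:
        """True if [start, end) intersects any stored range (O(log n))."""
        i = bisect.bisect_right(self.starts, start)
        # The range beginning at or before `start` overlaps if it ends after `start`.
        if i > 0 and self.ends[i - 1] > start:
            return True
        # The next range overlaps if it begins before `end`.
        return i < len(self.starts) and self.starts[i] < end

    def contains(self, value: int) -> bool:
        """Point lookup: has `value` been allocated?"""
        return self.overlaps(value, value + 1)

    def insert(self, start: int, end: int) -> None:
        """Record [start, end), rejecting any overlap to enforce uniqueness."""
        if start >= end:
            raise ValueError("empty range")
        if self.overlaps(start, end):
            raise ValueError(f"values in [{start}, {end}) are already allocated")
        i = bisect.bisect_right(self.starts, start)
        self.starts.insert(i, start)
        self.ends.insert(i, end)

allocated = RangeSet()
allocated.insert(0, 1000)     # e.g. one append of 1000 rows
allocated.insert(5000, 6000)  # a later block of IDs
print(allocated.contains(999), allocated.contains(1000))  # True False
print(allocated.overlaps(900, 1100))                       # True
```

A production structure would also coalesce adjacent ranges so that a long append history stays a handful of entries.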

In that model:

stable row ID could be a system-managed identity column
the lookup could participate in the standard index lifecycle
the physical index type could be BTree or a sequence-optimized index
write-time maintenance could use index segments and later compaction
future user-defined identity columns could reuse the same index type

This would preserve the possibility of a row-id-optimized structure while still moving toward a more general indexing abstraction.

Core questions

Should stable row ID remain a purpose-built metadata path, or should it become a system-managed identity column backed by the maintained index framework?
If it becomes index-backed, should the physical index be a generic BTree, or should Lance support a dedicated sequence-optimized index type that can also serve future identity columns?
Can write-time index maintenance provide the correctness guarantees needed for stable row ID use cases, including update-by-id, returning generated IDs, reserving IDs before append, and incremental refresh workflows?
What are the practical IO, storage, and lookup tradeoffs between the current design and an index-backed design for append-heavy workloads versus workloads with frequent row-wide updates?

majin1102 · 2026-05-28T15:35:51Z

Collaborator

Current State

I agree with the framing in this discussion that stable row IDs are becoming an important Lance primitive, and that the question is less about whether row identity is needed and more about whether it should remain a "purpose-built storage path" or move toward a more general backend.
The current implementation also has the risks called out here: it is a "separate indexing path from Lance's general scalar index system", with dedicated metadata encoding, lookup construction, caching/loading behavior, and maintenance semantics.
In practice, this makes stable row IDs a heavy feature. Write paths that append, update, rewrite, compact, or remap rows need to preserve row_id_meta correctly.
Today, row_id_meta lives on fragments in the manifest and may point to external row ID files. However, those external files are not managed through the same systematic lifecycle as normal indexes.
This can inflate manifest/index-adjacent state over time, and it also overlaps conceptually with FRI (the fragment reuse index). Both mechanisms are involved in mapping durable row identity back to current physical row locations after rewrites.
So I think the main issue is not whether stable row identity is valuable. It is whether the manifest-based row_id_meta path should remain the mechanism we continue to expand.

Proposed Direction

My proposal is a concrete version of the direction suggested by the question: "Could stable row IDs instead be implemented as a system-managed identity column backed by a maintained index?"
I would answer yes, but with a row-id-specific backend first instead of immediately requiring a generic BTree.
The discussion says one benefit of the alternative is that it "moves row ID metadata out of the manifest and into index files". I think we should do exactly that for the new implementation, while still preserving the specialized sequence-like structure that is already efficient for mostly ordered row IDs.
To keep compatibility clear, I would not change the existing _rowid behavior. The new backend can expose _row_id as the new system-managed row ID column, while old enable_stable_row_ids datasets continue to use the existing _rowid / row_id_meta path.
Keep the existing enable_stable_row_ids behavior unchanged for compatibility.
Introduce a mutually exclusive enable_row_ids mode. In this new mode:
- new appends do not write fragment.row_id_meta;
- ordinary scans can read or materialize _row_id from data-file-local information, without loading the system index;
- rewrite and compaction paths preserve _row_id as hidden physical data when needed;
- a __lance_row_id system index owns lookup/remap metadata, using the current row-id sequence structure rather than forcing a BTree up front.
This keeps data-file scan semantics separate from lookup/remap semantics:
- ordinary projection of _row_id is handled by the data file itself;
- row-id lookup/remap is handled by __lance_row_id;
- secondary indexes can continue to store row addresses internally;
- FRI continues to handle row-address remapping for secondary indexes.
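This separation can be sketched as follows: compaction copies the hidden `_row_id` values along with the row data, so a plain scan still projects `_row_id` from the rewritten file, and the system index only needs the new `(row_id, address)` pairs to remap lookups. Fragments as Python lists, the `compact` function, and the dict-based index are hypothetical simplifications of the proposal, not Lance code; the address layout assumes the fragment ID in the upper 32 bits.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[int, str]  # (_row_id, payload) as stored in a data file

def address(fragment_id: int, offset: int) -> int:
    return (fragment_id << 32) | offset

def compact(fragments: Dict[int, List[Row]], deleted: Set[int],
            new_fragment_id: int) -> Tuple[List[Row], Dict[int, int]]:
    """Rewrite live rows into one fragment, keeping _row_id as physical data.

    Returns the new fragment's rows and the remap entries for the row-id index.
    """
    new_rows: List[Row] = []
    remap: Dict[int, int] = {}
    for frag_id in sorted(fragments):
        for row_id, payload in fragments[frag_id]:
            if row_id in deleted:
                continue
            remap[row_id] = address(new_fragment_id, len(new_rows))
            new_rows.append((row_id, payload))  # _row_id travels with the row
    return new_rows, remap

fragments = {0: [(0, "a"), (1, "b")], 1: [(2, "c"), (3, "d")]}
new_rows, remap = compact(fragments, deleted={1}, new_fragment_id=2)

# Lookup/remap lives in the system index (here just a dict).
row_id_index: Dict[int, int] = dict(remap)

# A plain scan projects _row_id from the data file without touching the index.
print([row_id for row_id, _ in new_rows])  # [0, 2, 3]
print(hex(row_id_index[3]))                # 0x200000002
```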
The compatibility story is explicit:
- existing datasets with enable_stable_row_ids continue to use _rowid and row_id_meta;
- new datasets can opt into enable_row_ids;
- the two modes are mutually exclusive, so readers do not need to infer which semantics apply;
- old data remains readable, while new data avoids growing the manifest-level row_id_meta path further.

Roadmap

Step 1: Implement a new row ID feature independent from enable_stable_row_ids.
- This responds to the discussion's concern that stable row ID should not remain a one-off mechanism forever, while avoiding a hard cutover from the existing implementation.
- This corresponds to the index-backed direction discussed here, but it should not make secondary indexes depend on row IDs.
- Secondary indexes should continue to store row addresses.
- FRI should continue to solve row-address remapping.
- The new row ID backend should provide row ID projection and row-id lookup/remap without overlapping with FRI's responsibility.
- This gives us forward compatibility because old stable-row-id datasets keep using row_id_meta, while new datasets can opt into the new backend.
Step 2: Adapt features that currently depend on stable row IDs to the new row ID backend.
- This maps to the requirements listed in the discussion: stable logical identifiers across compaction and rewrites, efficient lookup from row ID to row address, update-by-id workflows, returning generated IDs, reserved ID allocation, row lineage, and incremental refresh workflows.
- This includes paths such as update-by-id, generated ID return, reserved ID allocation, row lineage, and any scan/take/filter behavior that currently assumes stable row IDs are backed by row_id_meta.
- The goal is to make these features depend on the row ID abstraction/backend rather than the old manifest metadata path.
Step 3: Consider gradually deprecating the old stable row ID implementation.
- This is the long-term cleanup path if the new backend proves that it can preserve the capabilities the discussion requires.
- This should only happen after the new backend covers the same required behavior and has enough compatibility coverage.
- Until then, enable_stable_row_ids should remain readable and supported.
- Deprecation can be staged because the two modes are mutually exclusive and explicitly identified by feature flags.


jackye1995 · 2026-06-06T06:22:17Z

Maintainer Author

The "index-backed vs. inline metadata" choice is a false dichotomy

The framing in the community discussions so far has largely been: "inline row-id metadata is usually more efficient than a BTree, so an index-backed design is a regression." That's true as stated — but it frames a generic BTree as the alternative, and a BTree is the worst case for our data, not the target.

Stable row IDs are mostly-ordered and largely sequential. The current inline sequence already exploits that: it's effectively a piecewise-linear model with ε=0 (integer runs of (id_start, addr_start, len)). Choosing a generic BTree means throwing away the exact structure the inline path was built to exploit, just to gain a managed lifecycle. So "bespoke-but-optimal-for-order" vs. "generic-but-worst-for-order" is the wrong axis — we'd be falling back to the worst choice for our data shape on purpose.
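The "piecewise-linear with ε=0" view can be made concrete: each segment `(id_start, addr_start, len)` predicts `addr = addr_start + (id - id_start)` exactly, and lookup is a binary search over segment starts. This sketch assumes addresses are plain integers that advance by one within a run; `Segment`, `build_segments`, and `lookup` are hypothetical names.

```python
import bisect
from typing import List, NamedTuple, Optional, Sequence, Tuple

class Segment(NamedTuple):
    id_start: int
    addr_start: int
    length: int

def build_segments(pairs: Sequence[Tuple[int, int]]) -> List[Segment]:
    """Greedy one-pass segmentation of (row_id, addr) pairs sorted by row_id.

    A pair extends the current segment only if both the ID and the address are
    exactly one past the segment's last entry, i.e. the linear model has zero error.
    """
    segments: List[Segment] = []
    for row_id, addr in pairs:
        if segments:
            s = segments[-1]
            if row_id == s.id_start + s.length and addr == s.addr_start + s.length:
                segments[-1] = s._replace(length=s.length + 1)
                continue
        segments.append(Segment(row_id, addr, 1))
    return segments

def lookup(segments: List[Segment], starts: List[int], row_id: int) -> Optional[int]:
    """Find the segment covering row_id and evaluate its linear model."""
    i = bisect.bisect_right(starts, row_id) - 1
    if i < 0:
        return None
    s = segments[i]
    if row_id >= s.id_start + s.length:
        return None  # falls in a gap between segments
    return s.addr_start + (row_id - s.id_start)

# IDs 0..999 live at addresses 0..999; IDs 1000..1499 were rewritten elsewhere.
pairs = [(i, i) for i in range(1000)] + [(i, 50_000 + i) for i in range(1000, 1500)]
segments = build_segments(pairs)
starts = [s.id_start for s in segments]
print(len(segments), lookup(segments, starts, 1200))  # 2 51200
```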

The axis I'd rather evaluate: is there a general, maintained index type that keeps the best-case-when-ordered behavior of the inline sequence, but degrades to no worse than a BTree? That's a well-studied class — piecewise-linear / learned indexes — and two are directly relevant:

PGM-index (VLDB'20): linear segments under bounded error ε, recursively indexed. Provably never asymptotically worse than a B+-tree (2ε-tree bound), but on near-linear data it collapses to a handful of segments — orders of magnitude smaller. It's also the one learned index that stays smaller than a B+-tree on disk, and its dynamic variant uses an LSM/segment-merge maintenance scheme.
FITing-tree (SIGMOD'19): the intuitive version — a B+-tree whose leaves are PLA segments instead of individual keys, with per-segment insert buffers. When every key becomes its own segment, it simply is a BTree. So the "worst case == BTree" floor is structural.

Both generalize what row_id_meta already does (linear runs), but as a real index type: a self-tuning structure over the segments, a bounded search fallback, and a maintenance story — with a BTree floor rather than a BTree baseline.
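For a feel of the bounded-error variant, here is a sketch of greedy segmentation in the style of FITing-tree's ShrinkingCone algorithm (not PGM's optimal segmentation). Each segment keeps a cone of feasible slopes so that every key's position is predicted within ±ε, and lookup finishes with a binary search restricted to that window. The positions would index parallel arrays of row IDs and row addresses; all names and the in-memory layout are hypothetical.

```python
import bisect
import math
from typing import List, NamedTuple, Optional

class Seg(NamedTuple):
    key: int      # first key covered by the segment
    pos: int      # position of that key in the sorted key array
    slope: float  # predicted positions per unit of key

def shrinking_cone(keys: List[int], eps: int) -> List[Seg]:
    """Greedy PLA over strictly increasing keys with max position error eps."""
    segs: List[Seg] = []
    i = 0
    while i < len(keys):
        x0, y0 = keys[i], i
        lo, hi = 0.0, math.inf  # feasible slope cone
        j = i + 1
        while j < len(keys):
            dx = keys[j] - x0
            slope = (j - y0) / dx
            if not (lo <= slope <= hi):
                break  # point leaves the cone: start a new segment here
            lo = max(lo, (j - eps - y0) / dx)
            hi = min(hi, (j + eps - y0) / dx)
            j += 1
        segs.append(Seg(x0, y0, 0.0 if math.isinf(hi) else (lo + hi) / 2))
        i = j
    return segs

def find(keys: List[int], segs: List[Seg], seg_keys: List[int],
         eps: int, key: int) -> Optional[int]:
    """Return the position of key, searching only within +/- eps of the prediction."""
    k = bisect.bisect_right(seg_keys, key) - 1
    if k < 0:
        return None
    s = segs[k]
    pred = s.pos + s.slope * (key - s.key)
    lo = max(0, math.floor(pred) - eps - 1)         # +/-1 absorbs float rounding
    hi = min(len(keys), math.ceil(pred) + eps + 2)  # exclusive upper bound
    if lo >= hi:
        return None
    p = bisect.bisect_left(keys, key, lo, hi)
    return p if p < hi and keys[p] == key else None

keys = list(range(0, 1000)) + list(range(5000, 5100))  # two near-linear runs
eps = 2
segs = shrinking_cone(keys, eps)
seg_keys = [s.key for s in segs]
print(len(segs), find(keys, segs, seg_keys, eps, 5050))  # 2 1050
```

On near-linear keys the segment count stays tiny; on erratic keys the cone closes quickly and segments get shorter, which is where a BTree-like upper structure over the segments takes over.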

I want to be honest about the risk: rigorous on-disk benchmarks (SIGMOD'23) show learned indexes are not a universal win over a tuned B+-tree for arbitrary data (B+-tree wins on p99/robustness; most learned indexes are larger on disk, PGM excepted). The advantage shows up for genuinely near-linear data — i.e. exactly the row-id / identity-column regime, not the general case. So this is a "measure it" claim, not a "trust me" claim.

Proposed next step — and why it isn't blocked on infra. Two parallel workstreams intersect here, but neither needs to gate the experiment:

Write-time index commit. Today we can't commit an index update atomically with a write — that's coming with transaction v2 (multi-statement transactions), which is an area people interested in this could help land. The benchmark doesn't need it: we can write data first, then build/retrain the index as a separate offline step, and compare that build/retrain cost directly against constructing today's row_id_meta (alongside space and lookup latency). That isolates exactly the question we care about — is an order-exploiting index cheaper to produce and query than the inline sequence — without waiting on transactional plumbing.
Pluggable secondary indexes (@Xuanwo). If that timeline lines up, the cleanest path is to implement PGM/FITing-tree as a plugin index type rather than patching core. But it also doesn't block us — the prototype doesn't need to be merged to main to benchmark; a throwaway branch is enough to get numbers.

So the concrete experiment: prototype a PGM/FITing-tree-style segment index for the row-id map (Rust crates like pgm-extra exist), build it offline over written data, and compare head-to-head against (1) today's inline metadata and (2) a plain BTree on append-heavy and update-degraded workloads — measuring build/retrain cost, space, and point-lookup latency (incl. p99). If it beats BTree on ordered data and stays ≤ BTree as the mapping degrades, it's a candidate index type for the row-id map (and anything else order-heavy). If not, we've learned the inline sequence is hard to beat and we keep it — either way we stop guessing.
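A minimal harness for this comparison might look like the following: each candidate supplies build, lookup, and size functions, and the harness records build time, an entry count as a rough space proxy, and p50/p99 point-lookup latency. It produces no results by itself. The `dict` baseline and both workloads are placeholders; a real run would plug in today's inline sequence, a BTree, and a PGM/FITing-tree prototype, and would measure on-disk size rather than entry count.

```python
import random
import statistics
import time
from typing import Callable, Dict, List, Tuple

Pairs = List[Tuple[int, int]]  # sorted (row_id, row_address) pairs

def benchmark(name: str, build: Callable[[Pairs], object],
              lookup: Callable[[object, int], int],
              size: Callable[[object], int], pairs: Pairs,
              probes: int = 10_000, seed: int = 0) -> Dict[str, float]:
    """Measure build time, size proxy, and point-lookup latency for one index."""
    t0 = time.perf_counter()
    index = build(pairs)
    build_s = time.perf_counter() - t0

    rng = random.Random(seed)
    latencies: List[float] = []
    for _ in range(probes):
        row_id, expected = pairs[rng.randrange(len(pairs))]
        t = time.perf_counter()
        got = lookup(index, row_id)
        latencies.append(time.perf_counter() - t)
        assert got == expected, f"{name}: wrong address for {row_id}"
    q = statistics.quantiles(latencies, n=100)  # 99 cut points
    return {"build_s": build_s, "entries": size(index),
            "p50_s": q[49], "p99_s": q[98]}

def append_heavy(n: int) -> Pairs:
    return [(i, i) for i in range(n)]

def update_degraded(n: int, every: int = 10) -> Pairs:
    # Hypothetical workload: every `every`-th row was rewritten to a far address.
    return [(i, (1 << 32) + i if i % every == 0 else i) for i in range(n)]

if __name__ == "__main__":
    for workload in (append_heavy(100_000), update_degraded(100_000)):
        print(benchmark("dict", dict, lambda d, k: d[k], len, workload))
```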

On identity columns: I'd decouple this entirely. A user-facing identity / generated column is a valuable feature on its own, and whoever's interested should feel free to pursue it in parallel — it shouldn't be blocked on, or framed as, "replacing stable row ID." The shared payoff is just that if the experiment above gives us a good order-exploiting index type, both features can reuse it.

