Skip to content

Generalize table storage and scans#614

Open
aheev wants to merge 8 commits into
LadybugDB:mainfrom
aheev:std-storage
Open

Generalize table storage and scans#614
aheev wants to merge 8 commits into
LadybugDB:mainfrom
aheev:std-storage

Conversation

@aheev

@aheev aheev commented Jun 23, 2026

Copy link
Copy Markdown
Member
  • add createScanState
  • get rid of arrowTable::getNumBatches
  • rm arrow/icebug-disk dynamic_cast checks wherever possible

@aheev aheev marked this pull request as ready for review June 23, 2026 14:45
@aheev

aheev commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

@adsharma could you PTAL?

@aheev

aheev commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

failing test is unrelated:

EXPECT OK BUT GOT ERROR: IO exception: Failed to create directory /home/runner/.lbdb/extension/0.17.0/linux_amd64/ due to: IO exception: Directory /home/runner/.lbdb/extension/0.17.0/linux_amd64 cannot be created. Check if it exists and remove it.

@aheev

aheev commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

This refactor has been carried out based on the assumption that any new external table formats will strictly conform to ColumnarTableBase. Otherwise common functions like getNumScanMorsels and createScanState would need to be moved to NodeTable / RelTable classes

@adsharma

Copy link
Copy Markdown
Contributor

Flaky github infra. Rerunning makes the tests pass.

@adsharma

Copy link
Copy Markdown
Contributor

How about Generalize instead of Standardize in the PR summary? A few comments about interface design. It has evolved organically and this is probably the first time we're being intentional about it.

On node_group_idx_t for getNumScanMorsels

It's a semantic mismatch, though it "works" numerically. node_group_idx_t is uint64_t (types.h:55), so it compiles and conveniently slots into the existing numCommittedNodeGroups field. But the type name lies about what's being counted:

  • IceDisk (ice_disk_node_table.cpp:104): returns parquetReader->getNumRowGroups() — here a "morsel" is a row group, which in Kùzu-style systems often maps to a node group. Defensible.
  • Arrow (arrow_node_table.cpp:160-168): returns Σ ceil(batchLength / scanMorselSize) — these are genuinely morsels, not node groups. Not defensible to call them node_group_idx_t.

The function is named getNumScanMorsels (good) but typed node_group_idx_t (bad) and its result is stored into numCommittedNodeGroups (scan_node_table.cpp:57, a field that is now misnamed for the columnar path — it holds morsel counts, not node groups). So the type and the consuming field both pre-date the generalization and weren't updated to match the new concept.

Recommendation: return a neutral uint64_t (or introduce a small typedef like morsel_count_t), and either rename the shared-state field that receives it or add a clearly-named alias. Keeping node_group_idx_t propagates a name that only fits one of the three formats into the common base.

On passing memoryManager to createScanState()

No — it's redundant. Table already stores MemoryManager* memoryManager as a protected member (table.h:219), and ColumnarNodeTableBase inherits it. Every implementation just forwards the parameter straight into the scan-state constructor:

// arrow_node_table.cpp:53-57 — memoryManager only ever passed through
return std::make_unique<ArrowNodeTableScanState>(*memoryManager, nodeIDVector, outVectors,
    nodeIDVector->state);

Same in ice_disk_node_table.cpp:43, arrow_rel_table.cpp:41, ice_disk_rel_table.cpp:25. The table object already has the only MemoryManager that makes sense to use, so the parameter adds a caller burden and a chance to pass the wrong one.

Recommendation: drop the memoryManager parameter; implementations use this->memoryManager. This also removes the inconsistency where the rel signature takes an extra outChunkState the node signature doesn't (see below).

Other observations

1. The operators still type-switch on the base class instead of dispatching virtually. This is the biggest interface smell. scan_node_table.cpp does dynamic_cast<ColumnarNodeTableBase*>(table) in three separate places (createNodeTableScanState :20, initialize :56, nextMorsel :75), and scan_rel_table.cpp/scan_multi_rel_tables.cpp repeat the pattern. The PR built a base class but then kept the external branching. Contrast with initializeScanCoordination, which is a virtual on the base and is called unconditionally (scan_node_table.cpp:54) — that's the pattern done right. createScanState and getNumScanMorsels should follow it: add them as virtuals on NodeTable/Table with default implementations for native tables, eliminating the casts. Right now the design is half-polymorphic.

2. Returning unique_ptr<TableScanState> and immediately downcasting at the call site. Both rel callers downcast the returned base pointer:

// scan_rel_table.cpp:95-96, scan_multi_rel_tables.cpp:89-90
scanState = std::unique_ptr<RelTableScanState>(
    dynamic_cast<RelTableScanState*>(tableScanState.release()));

This signals the return type is too abstract. ColumnarRelTableBase::createScanState should return unique_ptr<RelTableScanState>; ColumnarNodeTableBase::createScanState should return unique_ptr<NodeTableScanState> (or ColumnarNodeTableScanState). The dynamic_cast here is purely working around an over-broad return type.

3. Asymmetric signatures between the node and rel base. Node: createScanState(nodeIDVector, outVectors, mm). Rel: createScanState(nodeIDVector, outVectors, mm, outChunkState). The node version secretly hardcodes nodeIDVector->state as the outChunkState; the rel version needs nbrNodeIDVector->state. If you take outChunkState as an explicit param on both (or derive it consistently), the two bases mirror each other — desirable for a "common columnar base." Right now a reader has to notice the hidden assumption.

4. The two base classes aren't symmetric in their interface. ColumnarNodeTableBase exposes getNumScanMorsels + createScanState; ColumnarRelTableBase exposes only createScanState. That's defensible (rel scans are bound-node-driven, not full-table morsel scans), but it's worth a one-line comment on the rel base explaining why getNumScanMorsels is absent, so a future lance/parquet author doesn't assume the omission is a bug.

5. getNumScanMorsels(transaction) hides wildly different costs. Arrow's is an in-memory arithmetic loop; IceDisk's opens the parquet file and constructs a throwaway ParquetReader every call (ice_disk_node_table.cpp:103) — and getTotalRowCount does it again (:272). The interface looks like a cheap accessor but is I/O-bound for IceDisk. Since it's called during initializeScanCoordination and again from the shared-state init, consider caching the row-group count once (the reader is already constructed per scan-state anyway). Not an interface-shape issue per se, but the const-accessor framing sets the wrong expectation.

6. ColumnarNodeTableScanState takes MemoryManager& mm marked [[maybe_unused]] (columnar_node_table_base.h:21). The [[maybe_unused]] is the tell that the base constructor shouldn't take it — either store it (the subclasses that need it can reach it) or drop it from the base signature. As-is it's dead weight that every subclass forwards.

7. createScanState is const but builds state holding mutable resources (e.g. IceDiskNodeTableScanState owns a ParquetReader that performs writes/caching). Fine in practice, but slightly surprising for a const method; worth a comment if intentional.

Summary of the highest-value changes

  1. Drop memoryManager from createScanState (redundant with Table::memoryManager).
  2. Replace node_group_idx_t with a neutral count type on getNumScanMorsels, and fix the numCommittedNodeGroups field naming on the columnar path.
  3. Push createScanState/getNumScanMorsels down as virtuals with native defaults so the scan operators stop dynamic_cast-ing — that's what completes the generalization the PR is aiming for.
  4. Narrow the return types (RelTableScanState / NodeTableScanState) to kill the call-site downcasts.

@aheev aheev changed the title standardize table storage and scans Generalize table storage and scans Jun 24, 2026
@aheev

aheev commented Jun 24, 2026

Copy link
Copy Markdown
Member Author

This refactor has been carried out based on the assumption that any new external table formats will strictly conform to ColumnarTableBase. Otherwise common functions like getNumScanMorsels and createScanState would need to be moved to NodeTable / RelTable classes

The operators still type-switch on the base class instead of dispatching virtually was my concern here. It will break the ABI. Do you still want me to proceed?

@aheev

aheev commented Jun 24, 2026

Copy link
Copy Markdown
Member Author

On node_group_idx_t for getNumScanMorsels

It's a semantic mismatch, though it "works" numerically. node_group_idx_t is uint64_t (types.h:55), so it compiles and conveniently slots into the existing numCommittedNodeGroups field. But the type name lies about what's being counted:

  • IceDisk (ice_disk_node_table.cpp:104): returns parquetReader->getNumRowGroups() — here a "morsel" is a row group, which in Kùzu-style systems often maps to a node group. Defensible.
  • Arrow (arrow_node_table.cpp:160-168): returns Σ ceil(batchLength / scanMorselSize) — these are genuinely morsels, not node groups. Not defensible to call them node_group_idx_t.

The function is named getNumScanMorsels (good) but typed node_group_idx_t (bad) and its result is stored into numCommittedNodeGroups (scan_node_table.cpp:57, a field that is now misnamed for the columnar path — it holds morsel counts, not node groups). So the type and the consuming field both pre-date the generalization and weren't updated to match the new concept.

Recommendation: return a neutral uint64_t (or introduce a small typedef like morsel_count_t), and either rename the shared-state field that receives it or add a clearly-named alias. Keeping node_group_idx_t propagates a name that only fits one of the three formats into the common base.

ColumnarTables use NodeTable's functions which expect node_group_idx_t

Other observations

1. The operators still type-switch on the base class instead of dispatching virtually. This is the biggest interface smell. scan_node_table.cpp does dynamic_cast<ColumnarNodeTableBase*>(table) in three separate places (createNodeTableScanState :20, initialize :56, nextMorsel :75), and scan_rel_table.cpp/scan_multi_rel_tables.cpp repeat the pattern. The PR built a base class but then kept the external branching. Contrast with initializeScanCoordination, which is a virtual on the base and is called unconditionally (scan_node_table.cpp:54) — that's the pattern done right. createScanState and getNumScanMorsels should follow it: add them as virtuals on NodeTable/Table with default implementations for native tables, eliminating the casts. Right now the design is half-polymorphic.

#614 (comment)

2. Returning unique_ptr<TableScanState> and immediately downcasting at the call site. Both rel callers downcast the returned base pointer:

// scan_rel_table.cpp:95-96, scan_multi_rel_tables.cpp:89-90
scanState = std::unique_ptr<RelTableScanState>(
    dynamic_cast<RelTableScanState*>(tableScanState.release()));

This signals the return type is too abstract. ColumnarRelTableBase::createScanState should return unique_ptr<RelTableScanState>; ColumnarNodeTableBase::createScanState should return unique_ptr<NodeTableScanState> (or ColumnarNodeTableScanState). The dynamic_cast here is purely working around an over-broad return type.

The motivation behind returning a TableScanState is scanState is used in functions such as scanInternal, initScanState, etc which expect a TableScanState

3. Asymmetric signatures between the node and rel base. Node: createScanState(nodeIDVector, outVectors, mm). Rel: createScanState(nodeIDVector, outVectors, mm, outChunkState). The node version secretly hardcodes nodeIDVector->state as the outChunkState; the rel version needs nbrNodeIDVector->state. If you take outChunkState as an explicit param on both (or derive it consistently), the two bases mirror each other — desirable for a "common columnar base." Right now a reader has to notice the hidden assumption.

Node and rel tables are meant to be different. Moreover we don't have a common ColumnarTableBase

4. The two base classes aren't symmetric in their interface. ColumnarNodeTableBase exposes getNumScanMorsels + createScanState; ColumnarRelTableBase exposes only createScanState. That's defensible (rel scans are bound-node-driven, not full-table morsel scans), but it's worth a one-line comment on the rel base explaining why getNumScanMorsels is absent, so a future lance/parquet author doesn't assume the omission is a bug.

could be done as part of #614 (comment)

7. createScanState is const but builds state holding mutable resources (e.g. IceDiskNodeTableScanState owns a ParquetReader that performs writes/caching). Fine in practice, but slightly surprising for a const method; worth a comment if intentional.

It doesn't modify members of the class 🤔

@adsharma

Copy link
Copy Markdown
Contributor

The larger question is:

Table  
  └── NodeTable  
        └── ColumnarNodeTableBase  
              ├── IceDiskNodeTable  
              └── ArrowNodeTable  

vs

Table  
  └── ColumnarNodeTableBase (base)  
        ├── NodeTable (specializes for node groups)  
                   └── IceDiskNodeTable
                               └── Parquet Based
                                └── Foreign (subclassed in an extension)
        └── ArrowNodeTable (uses Arrow batches)  

We arrived at the current design with the idea that NodeTable contains a lot of well tested code. Let's not break anything and do the least disruptive thing. That's the reason for some of the interface smell we have.

But you bring up a good point about breaking ABI. Is it the extension ABI you're talking about or something else?

@aheev

aheev commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

The larger question is:

Table  
  └── NodeTable  
        └── ColumnarNodeTableBase  
              ├── IceDiskNodeTable  
              └── ArrowNodeTable  

vs

Table  
  └── ColumnarNodeTableBase (base)  
        ├── NodeTable (specializes for node groups)  
                   └── IceDiskNodeTable
                               └── Parquet Based
                                └── Foreign (subclassed in an extension)
        └── ArrowNodeTable (uses Arrow batches)  

We arrived at the current design with the idea that NodeTable contains a lot of well tested code. Let's not break anything and do the least disruptive thing. That's the reason for some of the interface smell we have.

But you bring up a good point about breaking ABI. Is it the extension ABI you're talking about or something else?

I will get back to it by tmrw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants