Support external writes to Iceberg tables using Postgres catalog#247
Draft
sfc-gh-mslot wants to merge 16 commits into
Draft
Support external writes to Iceberg tables using Postgres catalog#247sfc-gh-mslot wants to merge 16 commits into
sfc-gh-mslot wants to merge 16 commits into
Conversation
sfc-gh-mslot
commented
Mar 4, 2026
| * Queue old files that are no longer referenced for deletion. Compare | ||
| * oldFileHash with newFileHash to find unreferenced files. | ||
| */ | ||
| if (oldFileHash != NULL && newFileHash != NULL) |
Collaborator
Author
There was a problem hiding this comment.
this looks questionable, since we need to consider which snapshots are still held in the new metadata
Implements functionality to allow external Iceberg clients (Spark, PyIceberg) to write data and metadata, then update the pg_lake catalog by modifying iceberg_tables.metadata_location. Key changes: - Modified external_catalog_modification trigger to handle UPDATE on internal catalog tables, enabling external writes with optimistic concurrency control via previous_metadata_location - Added sync_iceberg_metadata_from_external_write() function to sync pg_lake catalog state (schema, partition specs, data files) from externally-written metadata - Introduced SkipIcebergDDLProcessing flag to prevent duplicate field_id registration when syncing external schema changes - Made AddDataFileToTable() non-static for use by sync logic - Added comprehensive test suite covering basic writes, schema changes, and concurrency control External clients can now: 1. Write new data/metadata files to object storage 2. UPDATE iceberg_tables SET metadata_location = <new>, previous_metadata_location = <old> 3. pg_lake automatically syncs internal catalog from the new metadata Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marco Slot <marco.slot@snowflake.com>
6468828 to
c8b21a1
Compare
When external Iceberg clients write new metadata, old data files that are no longer referenced need to be queued for deletion, similar to how compaction handles file cleanup. Key changes: - Track old file paths before clearing the catalog during sync - Track new file paths as we repopulate from external metadata - Compare old vs new file sets to identify unreferenced files - Queue unreferenced files for deletion using InsertDeletionQueueRecord - Handle empty table case where all old files become unreferenced This ensures external writes properly clean up replaced files through the deletion queue, which VACUUM will eventually remove from object storage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marco Slot <marco.slot@snowflake.com>
Use GetTransactionSnapshot() instead of GetActiveSnapshot() when retrieving old data files for deletion queue processing. GetActiveSnapshot() can fail if called outside of a snapshot context, which was causing server crashes when the sync function was called from the trigger. GetTransactionSnapshot() is safer as it creates a snapshot if one doesn't exist. This fixes server crashes during external writes from PyIceberg/Spark. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marco Slot <marco.slot@snowflake.com>
Call sync_iceberg_metadata_from_external_write directly as a C function instead of via SPI_EXECUTE to avoid nested SPI connections. The sync function uses SPI internally (via GetTableDataFilesFromCatalog and other catalog functions), so calling it through SPI_EXECUTE creates nested SPI calls which can cause crashes or undefined behavior. Using DirectFunctionCall1 instead allows proper SPI nesting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marco Slot <marco.slot@snowflake.com>
Fixed two critical issues: 1. Hash table creation: Added HASH_STRINGS flag to hash_create calls in sync_external_metadata.c. Hash tables using string keys must specify HASH_STRINGS or they will fail with assertion errors. 2. Cross-extension function calls: Use OidFunctionCall1 with LookupFuncName to call sync_iceberg_metadata_from_external_write from pg_lake_iceberg. This avoids undefined symbol errors since the function is in a different shared library (pg_lake_table.so). All 6 external write tests now pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marco Slot <marco.slot@snowflake.com>
Added 3 comprehensive tests to verify that unreferenced data files are properly queued for deletion when external Iceberg clients write new metadata: 1. test_external_write_deletion_queue_on_overwrite: Verifies that when PyIceberg overwrites a table, all old data files are added to the deletion queue with proper orphaned_at timestamps. 2. test_external_write_deletion_queue_on_empty: Verifies that when PyIceberg overwrites a table with empty data, ALL old data files are queued for deletion. 3. test_external_write_deletion_queue_only_old_files: Verifies that only old files (not new files) are queued, and that append operations don't queue any files for deletion. All tests query lake_engine.deletion_queue to confirm proper file lifecycle management. Tests verify: - Correct file paths are queued - orphaned_at timestamps are set - New files are NOT queued - Append operations don't trigger deletions All 9 external write tests now pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marco Slot <marco.slot@snowflake.com>
Changed LookupFuncName call to use a separate argTypes array variable
instead of an inline compound literal (Oid[]){REGCLASSOID}.
This improves code readability and makes the indentation tool happy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marco Slot <marco.slot@snowflake.com>
Include utils/snapmgr.h to declare GetTransactionSnapshot() function. This fixes the CI build error: implicit declaration of function 'GetTransactionSnapshot' Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marco Slot <marco.slot@snowflake.com>
Fixed two issues with the external_catalog_modification trigger: 1. Changed error messages to match test expectations: - "modifying the internal catalog is currently only supported via pg_lake_iceberg tables" - This matches the expected error format in test_iceberg_catalog.py 2. Added validation to detect catalog_name changes during UPDATE: - Prevents changing a table from external catalog to internal catalog - Prevents changing a table from internal catalog to external catalog - These operations should only be done via CREATE/DROP TABLE Fixes CI test failures in: - test_create_in_internal_catalog - test_create_in_external_catalog All external write tests still pass (9/9). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Marco Slot <marco.slot@snowflake.com>
GetIcebergCatalogMetadataLocation is static in catalog.c; the trigger must use the exported sibling, which dispatches to the same internal helper for same-database internal-catalog UPDATEs. Signed-off-by: Marco Slot <marco.slot@snowflake.com>
The sync_iceberg_metadata_from_external_write CREATE FUNCTION was landed in pg_lake_table--3.2--3.3.sql, but 3.3 has already shipped (latest tag is v3.3.4). 3.4 is unreleased so its upgrade script is fair game; move the CREATE FUNCTION there. Signed-off-by: Marco Slot <marco.slot@snowflake.com>
SyncDataFilesFromMetadata used to walk only the current snapshot's manifests and queue every other previously-cataloged file for deletion. Iceberg metadata typically retains historic snapshots, and external clients may rely on them for time-travel reads — physically deleting those files (which the deletion queue does after a 10-day retention window) would corrupt those reads. Instead, build the set of "still referenced" files from ALL retained snapshots in the new metadata via IcebergFindAllReferencedFiles, and queue old cataloged files only when they're absent from that set. The catalog itself is still rebuilt from the current snapshot only, since pg_lake doesn't expose time-travel reads — only the deletion-queue half of the sync changes. Signed-off-by: Marco Slot <marco.slot@snowflake.com>
ProcessAlterTable already short-circuits when SkipIcebergDDLProcessing is set, but rename goes through a different post-process path. Without this guard, the external-write sync's RENAME COLUMN would trigger a duplicate metadata write on top of the externally-supplied metadata. Signed-off-by: Marco Slot <marco.slot@snowflake.com>
…eue tests Three new tests: - test_external_write_preserves_history_files: PyIceberg append after pg_lake insert. Old data files must NOT be queued for deletion (still referenced by the retained pre-append snapshot). - test_external_write_partition_spec_evolution: PyIceberg adds an identity partition; new spec must register and reads must work across both specs. - test_external_write_rename_column: PyIceberg renames a column; field IDs are stable so the sync must rename in pg_class, not silently drop+add. The three deletion-queue tests are updated to assert the corrected semantic: overwrite/empty-overwrite/append+overwrite all retain the prior snapshot(s), so files stay referenced and are NOT queued for deletion. (Physical-deletion-of-orphans coverage requires snapshot expiration, which pyiceberg's Python API does not expose; that path is exercised in pg_lake's internal tests already.) Signed-off-by: Marco Slot <marco.slot@snowflake.com>
c8b21a1 to
0958ea0
Compare
CI's lint-check-18 caught comment-reflow and continuation-indent issues in the trigger handler, the sync, and the new pytest. Run `make reindent` to bring them into line with the project's pgindent/black formatting. No behavior change. Signed-off-by: Marco Slot <marco.slot@snowflake.com>
…rnal-write tests Adds 8 tests around the external-write path: INSERT/DELETE/catalog-flip/ unknown-table rejection in the iceberg_tables trigger, column stats populated by sync, PyIceberg delete() through the position-deletes branch, list-typed columns surviving the data-file resync, and a schema-then-data sequence on the same table.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background project I've had going on, needs a deeper look.
We expose Iceberg tables via the
iceberg_tablesview, which is supported by Spark and pyiceberg. However, we currently only support reads from that view. This PR adds support for writes to existing tables (i.e. updates on the view), by reading the metadata file being inserted, updating our internal metadata tables from it, and adjusting the schema if needed.