[flink] Lake-batch fallback should use KV snapshot instead of reading log from earliest #3327

@fresh-borzoni

Search before asking

  • I searched in the issues and found nothing similar.

Description

Follow-up to #3296.

In batch mode on a lake-enabled PK table that has no lake snapshot yet, the fallback reads every bucket's log from EARLIEST and ignores any existing Fluss KV snapshots. This is OOM-prone on large tables that have never been tiered.

Spark fixed the equivalent in #3317 with per-bucket dispatch (snapshot+tail where KV snapshot exists, log-only otherwise).
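To make the per-bucket dispatch concrete, here is a minimal sketch of the planning step. It is not the code from #3317 or any Fluss API: `FallbackSplitPlanner`, `Split`, `KvSnapshot`, and the `EARLIEST` sentinel value are all hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical planner; none of these types exist in Fluss under these names.
class FallbackSplitPlanner {
    // Assumed sentinel meaning "start from the earliest log offset".
    static final long EARLIEST = -2L;

    // Latest KV snapshot of a bucket and the log offset it covers up to.
    record KvSnapshot(long snapshotId, long logOffset) {}

    // snapshotId == null means a log-only split.
    record Split(int bucket, Long snapshotId, long logStartOffset) {
        boolean isHybrid() {
            return snapshotId != null;
        }
    }

    // One split per bucket: snapshot + log tail where a KV snapshot exists,
    // log-only from EARLIEST otherwise.
    static List<Split> plan(int bucketCount, Map<Integer, KvSnapshot> snapshots) {
        List<Split> splits = new ArrayList<>();
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            KvSnapshot snapshot = snapshots.get(bucket);
            if (snapshot != null) {
                splits.add(new Split(bucket, snapshot.snapshotId(), snapshot.logOffset()));
            } else {
                splits.add(new Split(bucket, null, EARLIEST));
            }
        }
        return splits;
    }
}
```

With three buckets and a snapshot only for bucket 1, `plan` would return a hybrid split for bucket 1 starting at the snapshot's log offset, and log-only splits from `EARLIEST` for buckets 0 and 2.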

Flink can't port this one-to-one because FlinkSourceSplitReader has no sort-merge on the Fluss-only path. We could reuse LakeSnapshotAndLogSplitScanner (already used by Flink's lake reader, and it already sort-merges per PK) for the fallback, or do something similar.
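For reference, this is the result a per-PK merge of snapshot rows and log tail has to produce, as a simplified sketch. It is a plain keyed overlay, not a streaming sort-merge, and it is not how LakeSnapshotAndLogSplitScanner is implemented. `SnapshotLogMerge` and `Change` are hypothetical names, and a `null` value standing for a delete is an assumption of this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the merge semantics only.
class SnapshotLogMerge {
    // One log-tail record; value == null is assumed to mean "delete this key".
    record Change(String key, String value) {}

    // Applies the log tail, in offset order, on top of the snapshot rows.
    // The result is sorted by primary key and holds the latest value per key.
    static TreeMap<String, String> merge(Map<String, String> snapshotRows, List<Change> logTail) {
        TreeMap<String, String> result = new TreeMap<>(snapshotRows);
        for (Change change : logTail) {
            if (change.value() == null) {
                result.remove(change.key());
            } else {
                result.put(change.key(), change.value());
            }
        }
        return result;
    }
}
```

The point is that only the log records after the snapshot's offset need to be replayed, rather than the whole log from EARLIEST.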

Willingness to contribute

  • I'm willing to submit a PR!
