[lake/hudi] Introduce fluss-lake-hudi module and HudiLakeStorage #3256
fhan688 wants to merge 6 commits into
Pull request overview
This PR introduces an initial Apache Hudi lake-storage plugin module (fluss-lake-hudi) and wires it into the Fluss build and distribution so that “hudi” can be selected as a lake format and the plugin can be discovered via ServiceLoader.
Changes:
- Adds `HUDI("hudi")` to `DataLakeFormat` and introduces a new `fluss-lake-hudi` module with stub `HudiLakeStorage` + `HudiLakeStoragePlugin`.
- Wires the new module into the Maven reactor (`fluss-lake/pom.xml`), the dist plugin assembly (`fluss-dist`), and quickstart build preparation.
- Adds a Hudi Flink bundle dependency (provided) and a new `hudi.version` Maven property.
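For context on the first change, here is a minimal sketch of a string-backed format enum that resolves a user-supplied name such as `hudi`. Only the `HUDI("hudi")` constant comes from this PR; the enum name `DataLakeFormatSketch`, the `fromIdentifier` helper, and the other constants are hypothetical stand-ins, not the actual contents of `DataLakeFormat.java`.

```java
import java.util.Locale;

/** Hypothetical stand-in for DataLakeFormat; only HUDI("hudi") is taken from the PR. */
enum DataLakeFormatSketch {
    PAIMON("paimon"),
    ICEBERG("iceberg"),
    HUDI("hudi");

    private final String value;

    DataLakeFormatSketch(String value) {
        this.value = value;
    }

    String value() {
        return value;
    }

    /** Resolves a format name such as "hudi", ignoring case and surrounding whitespace. */
    static DataLakeFormatSketch fromIdentifier(String identifier) {
        String normalized = identifier.trim().toLowerCase(Locale.ROOT);
        for (DataLakeFormatSketch format : values()) {
            if (format.value.equals(normalized)) {
                return format;
            }
        }
        // Unknown names fail fast instead of silently falling back to a default format.
        throw new IllegalArgumentException("Unsupported data lake format: " + identifier);
    }
}
```

Failing fast on unknown names keeps a typo in a table option from being treated as a different lake format.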
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| pom.xml | Adds hudi.version property for dependency management. |
| fluss-lake/pom.xml | Adds fluss-lake-hudi to the lake parent reactor modules. |
| fluss-lake/fluss-lake-hudi/pom.xml | New module POM for the Hudi lake plugin. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/HudiLakeStorage.java | Adds initial (stub) LakeStorage implementation for Hudi. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/HudiLakeStoragePlugin.java | Adds LakeStoragePlugin implementation for ServiceLoader discovery. |
| fluss-lake/fluss-lake-hudi/src/main/resources/META-INF/services/org.apache.fluss.lake.lakestorage.LakeStoragePlugin | Registers HudiLakeStoragePlugin via ServiceLoader metadata. |
| fluss-lake/fluss-lake-hudi/src/main/resources/META-INF/NOTICE | Adds NOTICE file for the new module. |
| fluss-lake/fluss-lake-hudi/src/test/resources/log4j2-test.properties | Adds test logging configuration for the new module. |
| fluss-lake/fluss-lake-hudi/src/test/resources/org.junit.jupiter.api.extension.Extension | Adds JUnit extension auto-registration for tests. |
| fluss-flink/fluss-flink-common/pom.xml | Adds provided Hudi Flink bundle dependency. |
| fluss-dist/src/main/assemblies/plugins.xml | Copies the built fluss-lake-hudi jar into plugins/hudi/ in the dist. |
| fluss-dist/pom.xml | Adds fluss-lake-hudi as a (provided) dependency for build ordering/wiring. |
| fluss-common/src/main/java/org/apache/fluss/metadata/DataLakeFormat.java | Adds HUDI to supported DataLakeFormat enum values. |
| docker/quickstart-flink/prepare_build.sh | Includes Hudi plugin build output directory in quickstart build preparation. |
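The `META-INF/services/org.apache.fluss.lake.lakestorage.LakeStoragePlugin` entry in the table is what makes the plugin discoverable: the file lists the implementation's fully qualified name (here `org.apache.fluss.lake.hudi.HudiLakeStoragePlugin`), and `java.util.ServiceLoader` instantiates each listed class. The sketch below shows how a caller could then pick a plugin by identifier. `PluginSketch`, the two implementations, and `PluginLookup` are hypothetical stand-ins, not Fluss's actual `LakeStoragePlugin` API. `findPlugin` accepts any `Iterable`, so both a `ServiceLoader` and a plain list work.

```java
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical plugin contract; Fluss's real LakeStoragePlugin may differ. */
interface PluginSketch {
    String identifier();
}

/** Hypothetical implementation standing in for the Paimon plugin. */
final class PaimonPluginSketch implements PluginSketch {
    public String identifier() {
        return "paimon";
    }
}

/** Hypothetical implementation standing in for HudiLakeStoragePlugin. */
final class HudiPluginSketch implements PluginSketch {
    public String identifier() {
        return "hudi";
    }
}

final class PluginLookup {
    private PluginLookup() {}

    /** Returns the first plugin whose identifier matches, e.g. "hudi". */
    static <P extends PluginSketch> Optional<P> findPlugin(Iterable<P> plugins, String identifier) {
        for (P plugin : plugins) {
            if (plugin.identifier().equals(identifier)) {
                return Optional.of(plugin);
            }
        }
        return Optional.empty();
    }

    /** Discovers implementations listed under META-INF/services visible to the given loader. */
    static Optional<PluginSketch> discover(String identifier, ClassLoader loader) {
        return findPlugin(ServiceLoader.load(PluginSketch.class, loader), identifier);
    }
}
```

Because providers are only instantiated while iterating, a missing services file simply yields an empty result rather than an error.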
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…ording to AI reviewed suggestions.
…https://github.com/fhan688/fluss into Introduce-fluss-lake-hudi-module-and-HudiLakeStorage
```java
/** Implementation of Hudi lake storage. */
public class HudiLakeStorage implements LakeStorage {

    protected final Configuration hudiConfig;
```
nit (visibility): Please make this private to match IcebergLakeStorage#icebergConfig and PaimonLakeStorage#paimonConfig. There's no subclass that needs protected access.
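A minimal sketch of the requested change, assuming the constructor only stores the configuration. `ConfigurationSketch` is a hypothetical stand-in for Fluss's `Configuration` type so the snippet compiles on its own, and `HudiLakeStorageSketch` is a trimmed illustration, not the PR's class; the only point is the `private final` modifier.

```java
/** Hypothetical stand-in for Fluss's Configuration type so this sketch compiles on its own. */
final class ConfigurationSketch {}

/** Implementation of Hudi lake storage (sketch: only the field visibility matters here). */
class HudiLakeStorageSketch {

    // private rather than protected: no subclass needs direct access to the Hudi config.
    private final ConfigurationSketch hudiConfig;

    HudiLakeStorageSketch(ConfigurationSketch hudiConfig) {
        this.hudiConfig = hudiConfig;
    }
}
```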
```java
import org.apache.fluss.lake.writer.LakeTieringFactory;
import org.apache.fluss.metadata.TablePath;

/** Implementation of Hudi lake storage. */
```
```xml
    <scope>provided</scope>
</dependency>

<!-- Hudi -->
```
Looking at the existing modules:
- Paimon is `import`-ed directly in `LakeFlinkCatalog` / `LakeTableFactory`, so it requires a pom dependency.
- Iceberg is loaded via `Class.forName("org.apache.iceberg.flink.FlinkDynamicTableFactory")` in `LakeTableFactory#getIcebergFactory()`. There is no Iceberg dependency in `fluss-flink-common/pom.xml`: `fluss-flink-common` stays decoupled from Iceberg, and the actual jars come from `plugins/iceberg/*` at runtime.
- Hudi in this PR has no Java references at all (`git grep "org.apache.hudi" -- '*.java'` returns 0 hits), yet adds a hard pom dependency on `hudi-flink1.20-bundle`, a shaded uber-bundle that re-bundles hadoop/parquet/avro/jackson/guava. This is the worst of both worlds: a dead dependency plus classpath pollution for every downstream module that consumes `fluss-flink-common`.

Suggestion: when the real Hudi-Flink integration lands, follow the Iceberg pattern (`Class.forName`) and keep the concrete Hudi artifacts only in `fluss-lake-hudi/pom.xml`.
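A minimal sketch of the general reflective-loading pattern the reviewer points to, assuming the caller only knows the factory's class name as a string. `ReflectiveFactoryLoader` and `tryCreate` are hypothetical names; this is not a copy of `LakeTableFactory#getIcebergFactory()`.

```java
import java.util.Optional;

/** Hypothetical helper: creates an optional factory by class name, without a compile-time dependency. */
final class ReflectiveFactoryLoader {

    private ReflectiveFactoryLoader() {}

    /**
     * Loads {@code className} through {@code loader} and returns a new instance as {@code expectedType},
     * or Optional.empty() when the class is not on the classpath. Throws ClassCastException if the
     * class does not implement {@code expectedType}.
     */
    static <T> Optional<T> tryCreate(String className, Class<T> expectedType, ClassLoader loader) {
        final Class<?> raw;
        try {
            // initialize = false: only resolve the class here; it is initialized on instantiation.
            raw = Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            // The optional lake jars are not present, so the caller can degrade gracefully.
            return Optional.empty();
        }
        Class<? extends T> typed = raw.asSubclass(expectedType);
        try {
            return Optional.of(typed.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With this shape, `fluss-flink-common` only references the factory by name, and the Hudi jars would be supplied at runtime from a plugin directory, analogous to `plugins/iceberg/*`. The Hudi factory class name used in a real call (for example `org.apache.hudi.table.HoodieTableFactory`) is an assumption to verify against the Hudi release in use.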
…e style in HudiLakeStorage.
+1
luoyuxia left a comment
@fhan688 Thanks for the PR. @XuQianJin-Stars Thanks for reviewing. LGTM
@fhan688 The CI fails.
Purpose
Linked issue: #3258
Introduce an initial Hudi LakeStorage plugin module for Fluss and wire it into build/distribution, so Hudi can be recognized as a supported data lake format.
Brief change log
Tests
This commit mainly introduces module/plugin scaffolding and build wiring.
Suggested verification:
```bash
mvn -pl fluss-lake/fluss-lake-hudi -am clean test
mvn -pl fluss-dist -am clean package -DskipTests
```
API and Format
Documentation