nikolay-voskresenskiy-db commented Oct 8, 2025

Thank you for submitting a pull request!

Please make sure you have signed our Contributor License Agreement (CLA).
We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code.
You only need to sign the CLA once.

Merge in PDS-GTB/elasticsearch-hadoop from spark-35 to db-feature/spark-35

* commit 'fc4f33b6c3d609f18b820b25cc1435a2c4c5ead8':
  Spark 3.5 support

cla-checker-service bot commented Oct 8, 2025

💚 CLA has been signed

@nikolay-voskresenskiy-db
Author

A few notes on the implementation:

  1. The only change needed to make the runtime code compatible with Spark 3.5.x is a single line in EsStreamQueryWriter.scala. It is not a functional change, only an adaptation to a signature change in an internal Spark API. The rest of the changes relate to building and testing:
  2. We made the 3.5 artifact a separate build variant so that the test suite proves the same code is compatible with both 3.4 and 3.5. The downside is a longer build, so the 3.4 variant could be removed in the future, leaving 3.5 as the primary variant.
  3. The StreamingQueryLifecycleListener helper class in the tests now needs two versions, because Spark 3.5 splits the stream idleness notification out into a separate method.
  4. 3.5 has a slightly different set of dependencies, since Spark now ships a separate spark-sql-api module.
  5. We had to make a few fixes to the build scaffolding so that it correctly handles 4 build variants instead of 2.
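
To make the idleness split in note 3 concrete: Spark 3.5 added an `onQueryIdle` callback to `StreamingQueryListener`, with a matching `QueryIdleEvent` class nested in its companion object. The PR itself handles the difference with two per-variant helper classes. The sketch below is not that approach; it only illustrates, under stated assumptions, how code could detect which listener API is on the classpath. `SparkListenerApi`, `classAvailable`, and `hasIdleCallback` are hypothetical names introduced here, not part of elasticsearch-hadoop or Spark.

```scala
import scala.util.Try

// Hypothetical helper, for illustration only; not part of the PR.
object SparkListenerApi {
  // JVM name of the idle event class added in Spark 3.5. It is nested in the
  // StreamingQueryListener companion object, hence the '$' separator.
  val IdleEventClass: String =
    "org.apache.spark.sql.streaming.StreamingQueryListener$QueryIdleEvent"

  /** True if the named class can be loaded (without initializing it). */
  def classAvailable(name: String): Boolean =
    Try(Class.forName(name, false, getClass.getClassLoader)).isSuccess

  /** True when the Spark on the classpath exposes the separate idle event. */
  def hasIdleCallback: Boolean = classAvailable(IdleEventClass)
}
```

A test helper could branch on `SparkListenerApi.hasIdleCallback` to decide whether to wait for an idle notification or for a progress event. Separate source sets per build variant, as done in the PR, avoid reflection entirely and keep each variant compiled against its own Spark version.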

nikolay-voskresenskiy-db changed the title from "Db contrib/spark 35" to "Spark 3.5 support" on Oct 8, 2025
@masseyke
Member

Hi @nikolay-voskresenskiy-db. Thanks for the PR! It will be a little while before someone has the bandwidth to review this.

@Kimahriman

If you make one more update, it will also be compatible with Spark 4.0:

https://github.com/elastic/elasticsearch-hadoop/blob/main/spark/sql-30/src/main/scala/org/elasticsearch/spark/sql/EsSparkSQL.scala#L92

On that line, change sqlContext -> sparkSession.

@nikolay-voskresenskiy-db
Author

I did try to build for Spark 4.0, but it requires Java 17 and there are multiple places in the project that are still on Java 11. I might revisit that in the future.

@Kimahriman

> I did try to build for spark 4.0 but it requires java 17 and there are multiple places in the project which are still on java11. Might revisit that in future.

Yeah, I've also tried to go that route, and ended up just building a custom JAR to use once I finally hacked my way to getting it working. The main issue is actually the "runtime" Java, which is forcing Java 8. That said, I added another variant on top of your PR pretty simply and it seems to work. The only thing I had to do was disable javadoc for the 4.0 variant, because that is also hard-coded to use the Java 8 javadoc.
