
Columnar shuffle with nested types is slower than Spark #2904

@andygrove


Describe the bug

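Based on the new benchmarks in #2902, Comet's JVM columnar shuffle appears to be extremely slow with complex (nested) types. It is roughly 10x slower than Spark, while Comet native shuffle and Comet with Spark shuffle perform on par with Spark.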

Shuffle with nested schema (maxDepth=2, partitionNum=201):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------
Spark                                                                 85             93           4          0.0       84642.1       1.0X
Comet (Spark Shuffle)                                                 84             92          11          0.0       83718.9       1.0X
Comet (jvm Shuffle)                                                  836            838           3          0.0      836109.3       0.1X
Comet (native Shuffle)                                                83             88           6          0.0       82666.6       1.0X
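As a quick sanity check on the table above, the following Python sketch parses the four result rows and divides each case's Per Row(ns) value by Spark's. It is a hypothetical helper written for this issue, not part of Comet or of the benchmark added in #2902. It assumes the Per Row(ns) column is comparable across cases, i.e. every case processed the same number of rows.

```python
# Hypothetical helper (not part of Comet): compute each case's slowdown
# relative to the Spark baseline from the benchmark rows quoted in this issue.
import re

BENCHMARK_ROWS = """\
Spark                                                                 85             93           4          0.0       84642.1       1.0X
Comet (Spark Shuffle)                                                 84             92          11          0.0       83718.9       1.0X
Comet (jvm Shuffle)                                                  836            838           3          0.0      836109.3       0.1X
Comet (native Shuffle)                                                83             88           6          0.0       82666.6       1.0X
"""

# A result row: case name, then Best, Avg, Stdev, Rate, Per Row, Relative ("...X").
ROW_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<best>\d+)\s+(?P<avg>\d+)\s+(?P<stdev>\d+)"
    r"\s+(?P<rate>[\d.]+)\s+(?P<per_row>[\d.]+)\s+(?P<relative>[\d.]+)X$"
)


def parse_rows(text):
    """Map each case name to its Per Row(ns) value."""
    rows = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            rows[m.group("name")] = float(m.group("per_row"))
    return rows


def slowdowns(rows, baseline="Spark"):
    """Per-row time of each case divided by the baseline's per-row time."""
    base = rows[baseline]
    return {name: per_row / base for name, per_row in rows.items()}


if __name__ == "__main__":
    for name, factor in slowdowns(parse_rows(BENCHMARK_ROWS)).items():
        print(f"{name:<24} {factor:5.2f}x Spark's per-row time")
```

For the jvm Shuffle row this gives about 9.88, i.e. roughly 10x slower than Spark, which agrees with the 0.1X Relative column; the other Comet rows come out slightly below 1.0.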

Steps to reproduce

SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometShuffleBenchmark

Expected behavior

No response

Additional context

No response

Metadata

Labels: bug (Something isn't working)
