Commit 7d66bc0

[opt](nereids) adjust Left join cost factor:probeShortcutFactor (#60183)
### What problem does this PR solve?

probeShortcutFactor is not reasonable when the right row count is much larger than the left row count, so it is now applied only when rightRowCount < 10 * leftRowCount.

TPC-H affected query (1T)
before:
22.sql,49.008,2.729,2.582,2.666,2.582
after:
22.sql,1.466,1.439,1.384,1.379,1.379

TPC-DS affected queries (1T)
before:
query33.sql,3.252,0.490,0.454,0.412,0.412
query35.sql,12.897,1.059,1.054,1.039,1.039
query64.sql,139.298,2.788,20.533,8.234,2.788
query69.sql,3.004,0.835,0.773,0.772,0.772
after:
query33.sql,1.114,0.512,0.540,0.468,0.468
query35.sql,1.398,1.146,1.194,1.180,1.146
query64.sql,3.562,1.939,2.054,1.985,1.939
query69.sql,0.826,0.848,0.806,0.804,0.804
1 parent 8356a36 commit 7d66bc0

17 files changed

Lines changed: 434 additions & 376 deletions

fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostModel.java

Lines changed: 2 additions & 1 deletion
@@ -452,7 +452,8 @@ public Cost visitPhysicalHashJoin(
 );
 }
 double probeShortcutFactor = 1.0;
-if (ConnectContext.get() != null && ConnectContext.get().getStatementContext() != null
+if (rightRowCount < 10 * leftRowCount
+&& ConnectContext.get() != null && ConnectContext.get().getStatementContext() != null
 && !ConnectContext.get().getStatementContext().isHasUnknownColStats()
 && physicalHashJoin.getJoinType().isLeftSemiOrAntiJoin()
 && physicalHashJoin.getOtherJoinConjuncts().isEmpty()
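The effect of the new guard can be sketched in isolation. This is a minimal illustration, not the code in CostModel.java: the class name, the method names, the `otherConditionsHold` flag (standing in for the ConnectContext, column-stats, join-type and other-conjunct checks), and the value `0.5` are all hypothetical. The body of the `if` is not shown in this diff, so the sketch only assumes that it assigns some factor other than the default `1.0`.

```java
public final class ProbeShortcutGuardSketch {
    // Hypothetical value for illustration only; the factor actually assigned
    // inside the `if` body is not visible in this diff.
    static final double ILLUSTRATIVE_SHORTCUT_FACTOR = 0.5;

    // The row-count guard added by this commit: the shortcut is considered
    // only when the right side has fewer than 10x the rows of the left side.
    static boolean rowCountGuard(double leftRowCount, double rightRowCount) {
        return rightRowCount < 10 * leftRowCount;
    }

    // otherConditionsHold is a stand-in for the remaining checks in the
    // original condition (statement context, known column stats,
    // left semi/anti join type, empty other-join conjuncts).
    static double probeShortcutFactor(double leftRowCount, double rightRowCount,
                                      boolean otherConditionsHold) {
        double factor = 1.0; // default, as in the diff
        if (rowCountGuard(leftRowCount, rightRowCount) && otherConditionsHold) {
            factor = ILLUSTRATIVE_SHORTCUT_FACTOR;
        }
        return factor;
    }

    public static void main(String[] args) {
        // Right side is 5x the left side: guard passes, shortcut factor applies.
        System.out.println(probeShortcutFactor(1_000_000, 5_000_000, true));
        // Right side is 5000x the left side: guard fails, factor stays 1.0.
        System.out.println(probeShortcutFactor(1_000, 5_000_000, true));
    }
}
```

Under these assumptions, a left semi/anti join whose right input is far larger than its left input no longer receives the shortcut factor, which is consistent with several plans in the regression outputs switching from LEFT_SEMI_JOIN to RIGHT_SEMI_JOIN.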

regression-test/data/shape_check/tpcds_sf100/constraints/query23.out

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -53,29 +53,29 @@ PhysicalCteAnchor ( cteId=CTEId#0 )
 --------------hashAgg[LOCAL]
 ----------------PhysicalUnion
 ------------------PhysicalProject
---------------------hashJoin[LEFT_SEMI_JOIN shuffle] hashCondition=((catalog_sales.cs_item_sk = frequent_ss_items.item_sk)) otherCondition=() build RFs:RF5 item_sk->[cs_item_sk]
+--------------------hashJoin[RIGHT_SEMI_JOIN shuffle] hashCondition=((catalog_sales.cs_item_sk = frequent_ss_items.item_sk)) otherCondition=() build RFs:RF5 cs_item_sk->[item_sk]
+----------------------PhysicalCteConsumer ( cteId=CTEId#0 ) apply RFs: RF5
 ----------------------PhysicalProject
 ------------------------hashJoin[LEFT_SEMI_JOIN broadcast] hashCondition=((catalog_sales.cs_bill_customer_sk = best_ss_customer.c_customer_sk)) otherCondition=() build RFs:RF4 c_customer_sk->[cs_bill_customer_sk]
 --------------------------PhysicalProject
 ----------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((catalog_sales.cs_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF3 d_date_sk->[cs_sold_date_sk]
 ------------------------------PhysicalProject
---------------------------------PhysicalOlapScan[catalog_sales] apply RFs: RF3 RF4 RF5
+--------------------------------PhysicalOlapScan[catalog_sales] apply RFs: RF3 RF4
 ------------------------------PhysicalProject
 --------------------------------filter((date_dim.d_moy = 5) and (date_dim.d_year = 2000))
 ----------------------------------PhysicalOlapScan[date_dim]
 --------------------------PhysicalCteConsumer ( cteId=CTEId#2 )
-----------------------PhysicalCteConsumer ( cteId=CTEId#0 )
 ------------------PhysicalProject
---------------------hashJoin[LEFT_SEMI_JOIN shuffle] hashCondition=((web_sales.ws_item_sk = frequent_ss_items.item_sk)) otherCondition=() build RFs:RF8 item_sk->[ws_item_sk]
+--------------------hashJoin[RIGHT_SEMI_JOIN shuffle] hashCondition=((web_sales.ws_item_sk = frequent_ss_items.item_sk)) otherCondition=() build RFs:RF8 ws_item_sk->[item_sk]
+----------------------PhysicalCteConsumer ( cteId=CTEId#0 ) apply RFs: RF8
 ----------------------PhysicalProject
 ------------------------hashJoin[LEFT_SEMI_JOIN broadcast] hashCondition=((web_sales.ws_bill_customer_sk = best_ss_customer.c_customer_sk)) otherCondition=() build RFs:RF7 c_customer_sk->[ws_bill_customer_sk]
 --------------------------PhysicalProject
 ----------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((web_sales.ws_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF6 d_date_sk->[ws_sold_date_sk]
 ------------------------------PhysicalProject
---------------------------------PhysicalOlapScan[web_sales] apply RFs: RF6 RF7 RF8
+--------------------------------PhysicalOlapScan[web_sales] apply RFs: RF6 RF7
 ------------------------------PhysicalProject
 --------------------------------filter((date_dim.d_moy = 5) and (date_dim.d_year = 2000))
 ----------------------------------PhysicalOlapScan[date_dim]
 --------------------------PhysicalCteConsumer ( cteId=CTEId#2 )
-----------------------PhysicalCteConsumer ( cteId=CTEId#0 )

regression-test/data/shape_check/tpcds_sf100/rf_prune/query33.out

Lines changed: 17 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -9,62 +9,65 @@ PhysicalResultSink
 ------------hashAgg[LOCAL]
 --------------PhysicalUnion
 ----------------PhysicalProject
-------------------hashJoin[LEFT_SEMI_JOIN broadcast] hashCondition=((item.i_manufact_id = item.i_manufact_id)) otherCondition=() build RFs:RF3 i_manufact_id->[i_manufact_id]
+------------------hashJoin[RIGHT_SEMI_JOIN shuffleBucket] hashCondition=((item.i_manufact_id = item.i_manufact_id)) otherCondition=() build RFs:RF3 i_manufact_id->[i_manufact_id]
+--------------------PhysicalProject
+----------------------filter((item.i_category = 'Home'))
+------------------------PhysicalOlapScan[item] apply RFs: RF3
 --------------------hashAgg[GLOBAL]
 ----------------------PhysicalDistribute[DistributionSpecHash]
 ------------------------hashAgg[LOCAL]
 --------------------------PhysicalProject
-----------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((store_sales.ss_item_sk = item.i_item_sk)) otherCondition=() build RFs:RF2 i_item_sk->[ss_item_sk]
+----------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((store_sales.ss_item_sk = item.i_item_sk)) otherCondition=()
 ------------------------------PhysicalProject
 --------------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((store_sales.ss_addr_sk = customer_address.ca_address_sk)) otherCondition=() build RFs:RF1 ca_address_sk->[ss_addr_sk]
 ----------------------------------PhysicalProject
 ------------------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((store_sales.ss_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF0 d_date_sk->[ss_sold_date_sk]
 --------------------------------------PhysicalProject
-----------------------------------------PhysicalOlapScan[store_sales] apply RFs: RF0 RF1 RF2
+----------------------------------------PhysicalOlapScan[store_sales] apply RFs: RF0 RF1
 --------------------------------------PhysicalProject
 ----------------------------------------filter((date_dim.d_moy = 1) and (date_dim.d_year = 2002))
 ------------------------------------------PhysicalOlapScan[date_dim]
 ----------------------------------PhysicalProject
 ------------------------------------filter((customer_address.ca_gmt_offset = -5.00))
 --------------------------------------PhysicalOlapScan[customer_address]
 ------------------------------PhysicalProject
---------------------------------PhysicalOlapScan[item] apply RFs: RF3
+--------------------------------PhysicalOlapScan[item]
+----------------PhysicalProject
+------------------hashJoin[RIGHT_SEMI_JOIN shuffleBucket] hashCondition=((item.i_manufact_id = item.i_manufact_id)) otherCondition=() build RFs:RF7 i_manufact_id->[i_manufact_id]
 --------------------PhysicalProject
 ----------------------filter((item.i_category = 'Home'))
-------------------------PhysicalOlapScan[item]
-----------------PhysicalProject
-------------------hashJoin[LEFT_SEMI_JOIN broadcast] hashCondition=((item.i_manufact_id = item.i_manufact_id)) otherCondition=() build RFs:RF7 i_manufact_id->[i_manufact_id]
+------------------------PhysicalOlapScan[item] apply RFs: RF7
 --------------------hashAgg[GLOBAL]
 ----------------------PhysicalDistribute[DistributionSpecHash]
 ------------------------hashAgg[LOCAL]
 --------------------------PhysicalProject
-----------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((catalog_sales.cs_item_sk = item.i_item_sk)) otherCondition=() build RFs:RF6 i_item_sk->[cs_item_sk]
+----------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((catalog_sales.cs_item_sk = item.i_item_sk)) otherCondition=()
 ------------------------------PhysicalProject
 --------------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((catalog_sales.cs_bill_addr_sk = customer_address.ca_address_sk)) otherCondition=() build RFs:RF5 ca_address_sk->[cs_bill_addr_sk]
 ----------------------------------PhysicalProject
 ------------------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((catalog_sales.cs_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF4 d_date_sk->[cs_sold_date_sk]
 --------------------------------------PhysicalProject
-----------------------------------------PhysicalOlapScan[catalog_sales] apply RFs: RF4 RF5 RF6
+----------------------------------------PhysicalOlapScan[catalog_sales] apply RFs: RF4 RF5
 --------------------------------------PhysicalProject
 ----------------------------------------filter((date_dim.d_moy = 1) and (date_dim.d_year = 2002))
 ------------------------------------------PhysicalOlapScan[date_dim]
 ----------------------------------PhysicalProject
 ------------------------------------filter((customer_address.ca_gmt_offset = -5.00))
 --------------------------------------PhysicalOlapScan[customer_address]
 ------------------------------PhysicalProject
---------------------------------PhysicalOlapScan[item] apply RFs: RF7
+--------------------------------PhysicalOlapScan[item]
+----------------PhysicalProject
+------------------hashJoin[RIGHT_SEMI_JOIN shuffleBucket] hashCondition=((item.i_manufact_id = item.i_manufact_id)) otherCondition=() build RFs:RF11 i_manufact_id->[i_manufact_id]
 --------------------PhysicalProject
 ----------------------filter((item.i_category = 'Home'))
-------------------------PhysicalOlapScan[item]
-----------------PhysicalProject
-------------------hashJoin[LEFT_SEMI_JOIN broadcast] hashCondition=((item.i_manufact_id = item.i_manufact_id)) otherCondition=() build RFs:RF11 i_manufact_id->[i_manufact_id]
+------------------------PhysicalOlapScan[item] apply RFs: RF11
 --------------------hashAgg[GLOBAL]
 ----------------------PhysicalDistribute[DistributionSpecHash]
 ------------------------hashAgg[LOCAL]
 --------------------------PhysicalProject
 ----------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((web_sales.ws_item_sk = item.i_item_sk)) otherCondition=() build RFs:RF10 ws_item_sk->[i_item_sk]
 ------------------------------PhysicalProject
---------------------------------PhysicalOlapScan[item] apply RFs: RF10 RF11
+--------------------------------PhysicalOlapScan[item] apply RFs: RF10
 ------------------------------PhysicalProject
 --------------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((web_sales.ws_bill_addr_sk = customer_address.ca_address_sk)) otherCondition=() build RFs:RF9 ca_address_sk->[ws_bill_addr_sk]
 ----------------------------------PhysicalProject
@@ -77,7 +80,4 @@ PhysicalResultSink
 ----------------------------------PhysicalProject
 ------------------------------------filter((customer_address.ca_gmt_offset = -5.00))
 --------------------------------------PhysicalOlapScan[customer_address]
---------------------PhysicalProject
-----------------------filter((item.i_category = 'Home'))
-------------------------PhysicalOlapScan[item]

regression-test/data/shape_check/tpcds_sf100/rf_prune/query35.out

Lines changed: 29 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -10,38 +10,38 @@ PhysicalResultSink
 --------------hashAgg[LOCAL]
 ----------------PhysicalProject
 ------------------filter(OR[ifnull($c$1, FALSE),ifnull($c$2, FALSE)])
---------------------hashJoin[LEFT_SEMI_JOIN broadcast] hashCondition=((c.c_customer_sk = catalog_sales.cs_ship_customer_sk)) otherCondition=()
+--------------------hashJoin[RIGHT_SEMI_JOIN shuffleBucket] hashCondition=((c.c_customer_sk = catalog_sales.cs_ship_customer_sk)) otherCondition=()
 ----------------------PhysicalProject
-------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((customer_demographics.cd_demo_sk = c.c_current_cdemo_sk)) otherCondition=()
+------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((catalog_sales.cs_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF5 d_date_sk->[cs_sold_date_sk]
 --------------------------PhysicalProject
-----------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((c.c_current_addr_sk = ca.ca_address_sk)) otherCondition=()
-------------------------------hashJoin[LEFT_SEMI_JOIN bucketShuffle] hashCondition=((c.c_customer_sk = store_sales.ss_customer_sk)) otherCondition=() build RFs:RF3 ss_customer_sk->[c_customer_sk,ws_bill_customer_sk]
---------------------------------hashJoin[LEFT_SEMI_JOIN broadcast] hashCondition=((c.c_customer_sk = web_sales.ws_bill_customer_sk)) otherCondition=()
-----------------------------------PhysicalProject
-------------------------------------PhysicalOlapScan[customer(c)] apply RFs: RF3
-----------------------------------PhysicalProject
-------------------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((web_sales.ws_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF2 d_date_sk->[ws_sold_date_sk]
---------------------------------------PhysicalProject
-----------------------------------------PhysicalOlapScan[web_sales] apply RFs: RF2 RF3
---------------------------------------PhysicalProject
-----------------------------------------filter((date_dim.d_qoy < 4) and (date_dim.d_year = 2001))
-------------------------------------------PhysicalOlapScan[date_dim]
---------------------------------PhysicalProject
-----------------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((store_sales.ss_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF1 d_date_sk->[ss_sold_date_sk]
-------------------------------------PhysicalProject
---------------------------------------PhysicalOlapScan[store_sales] apply RFs: RF1
-------------------------------------PhysicalProject
---------------------------------------filter((date_dim.d_qoy < 4) and (date_dim.d_year = 2001))
-----------------------------------------PhysicalOlapScan[date_dim]
-------------------------------PhysicalProject
---------------------------------PhysicalOlapScan[customer_address(ca)]
---------------------------PhysicalProject
-----------------------------PhysicalOlapScan[customer_demographics]
-----------------------PhysicalProject
-------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((catalog_sales.cs_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF0 d_date_sk->[cs_sold_date_sk]
---------------------------PhysicalProject
-----------------------------PhysicalOlapScan[catalog_sales] apply RFs: RF0
+----------------------------PhysicalOlapScan[catalog_sales] apply RFs: RF5
 --------------------------PhysicalProject
 ----------------------------filter((date_dim.d_qoy < 4) and (date_dim.d_year = 2001))
 ------------------------------PhysicalOlapScan[date_dim]
+----------------------hashJoin[LEFT_SEMI_JOIN bucketShuffle] hashCondition=((c.c_customer_sk = web_sales.ws_bill_customer_sk)) otherCondition=()
+------------------------hashJoin[RIGHT_SEMI_JOIN shuffle] hashCondition=((c.c_customer_sk = store_sales.ss_customer_sk)) otherCondition=()
+--------------------------PhysicalProject
+----------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((store_sales.ss_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF3 d_date_sk->[ss_sold_date_sk]
+------------------------------PhysicalProject
+--------------------------------PhysicalOlapScan[store_sales] apply RFs: RF3
+------------------------------PhysicalProject
+--------------------------------filter((date_dim.d_qoy < 4) and (date_dim.d_year = 2001))
+----------------------------------PhysicalOlapScan[date_dim]
+--------------------------PhysicalProject
+----------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((customer_demographics.cd_demo_sk = c.c_current_cdemo_sk)) otherCondition=()
+------------------------------PhysicalProject
+--------------------------------hashJoin[INNER_JOIN shuffle] hashCondition=((c.c_current_addr_sk = ca.ca_address_sk)) otherCondition=()
+----------------------------------PhysicalProject
+------------------------------------PhysicalOlapScan[customer(c)]
+----------------------------------PhysicalProject
+------------------------------------PhysicalOlapScan[customer_address(ca)]
+------------------------------PhysicalProject
+--------------------------------PhysicalOlapScan[customer_demographics]
+------------------------PhysicalProject
+--------------------------hashJoin[INNER_JOIN broadcast] hashCondition=((web_sales.ws_sold_date_sk = date_dim.d_date_sk)) otherCondition=() build RFs:RF0 d_date_sk->[ws_sold_date_sk]
+----------------------------PhysicalProject
+------------------------------PhysicalOlapScan[web_sales] apply RFs: RF0
+----------------------------PhysicalProject
+------------------------------filter((date_dim.d_qoy < 4) and (date_dim.d_year = 2001))
+--------------------------------PhysicalOlapScan[date_dim]
