[move compiler] [CSE Step 2] common subexpression elimination #17989

junxzm1990 · 2025-10-29T15:45:10Z

Description

This PR implements the "common subexpression elimination" (CSE) transformation. This is the second PR on the stack to introduce CSE optimizations to Move.

Motivating Example:

1. fun test(data: S, a: u64, b: u64): u64 {
2.       if (data.x != 0) {
3.           a / data.x
4.       } else {
5.           data.x + 1
6.       }
7.   }

At the stackless bytecode level, data.x is translated into a sequence of BorrowLoc + BorrowField + ReadRef instructions.
Without CSE, every occurrence of the expression data.x (line 2, line 3, line 5) is translated into the sequence above, even though the occurrences at line 3 and line 5 yield the same result as the one at line 2, so recomputing them is unnecessary.

CSE aims to eliminate such redundant computations by reusing the result of previous computations.
Specifically, in the example above, assuming the BorrowLoc + BorrowField + ReadRef sequence at line 2 is assigned to temp t1,
then the occurrences at line 3 and line 5 can both be replaced by t1, eliminating the redundant computations.
The optimized bytecode would look like:

0: $t6 := borrow_local($t0)
1: $t7 := borrow_field<0x8675::M::S>.x($t6)
2: $t5 := read_ref($t7) // `data.x` at line 2 assigned to $t5
3: $t8 := 0
4: $t4 := !=($t5, $t8)
5: if ($t4) goto 6 else goto 12
6: label L0
7: $t9 := move($t1)
8: $t10 := copy($t5) 
9: $t3 := /($t9, $t10) // line 3 reuses $t5
10: label L2
11: return $t3
12: label L1
13: $t16 := 1
14: $t11 := copy($t5)
15: $t3 := +($t11, $t16) // line 5 reuses $t5
16: goto 10

============================ Implementation Details ============================

Step 1: Build the Control Flow Graph (CFG) and dominator tree of a target function.
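
As a rough illustration of Step 1 and of the preorder walk used afterwards, the sketch below computes dominator sets with the textbook iterative data-flow algorithm and then walks the resulting dominator tree in preorder. It is not the PR's implementation (which builds on the compiler's existing graph utilities); the function names, the `usize` block ids, and the diamond-shaped CFG are assumptions made for the example, and every block is assumed reachable from block 0.

```rust
use std::collections::BTreeSet;

/// Successor lists of a tiny, hypothetical CFG; block 0 is the entry.
/// Shape: 0 -> {1, 2}, 1 -> {3}, 2 -> {3} (an if/else diamond).
pub fn diamond_cfg() -> Vec<Vec<usize>> {
    vec![vec![1, 2], vec![3], vec![3], vec![]]
}

/// Iterative dominator computation:
/// dom(entry) = {entry}; dom(b) = {b} plus the intersection of dom(p) over all predecessors p.
pub fn dominators(succs: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = succs.len();
    let mut preds = vec![Vec::new(); n];
    for (b, ss) in succs.iter().enumerate() {
        for &s in ss {
            preds[s].push(b);
        }
    }
    let all: BTreeSet<usize> = (0..n).collect();
    let mut dom = vec![all; n];
    dom[0] = BTreeSet::from([0]);
    let mut changed = true;
    while changed {
        changed = false;
        for b in 1..n {
            let mut new_dom: Option<BTreeSet<usize>> = None;
            for &p in &preds[b] {
                new_dom = Some(match new_dom {
                    None => dom[p].clone(),
                    Some(acc) => acc.intersection(&dom[p]).copied().collect(),
                });
            }
            let mut new_dom = new_dom.unwrap_or_default();
            new_dom.insert(b);
            if new_dom != dom[b] {
                dom[b] = new_dom;
                changed = true;
            }
        }
    }
    dom
}

/// Preorder walk of the dominator tree: a block is visited only after all of its dominators.
pub fn dom_tree_preorder(succs: &[Vec<usize>]) -> Vec<usize> {
    let dom = dominators(succs);
    let n = succs.len();
    let mut children = vec![Vec::new(); n];
    for b in 1..n {
        // The immediate dominator is the strict dominator whose own set is dom(b) minus b.
        let idom = dom[b]
            .iter()
            .copied()
            .find(|&d| d != b && dom[d].len() + 1 == dom[b].len())
            .expect("a reachable block has an immediate dominator");
        children[idom].push(b);
    }
    let mut order = Vec::new();
    let mut stack = vec![0];
    while let Some(b) = stack.pop() {
        order.push(b);
        // Push in reverse so that children are visited in ascending order.
        for &c in children[b].iter().rev() {
            stack.push(c);
        }
    }
    order
}
```

In the diamond, block 0 dominates both branches, which matches the motivating example: the value of data.x computed in the condition block is available in both the then and the else block.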

Step 2: Traverse the dominator tree in preorder, and for each instruction in each basic block:

If the instruction is PURE, canonicalize the expression represented by the instruction into an ExprKey structure
- ExprKey contains the operation and its arguments, represented as ExpArg
- ExpArg can be a constant, a variable (temp), or another ExprKey, so that expressions nest recursively
  - Motivation for nesting: given the expression ReadRef(BorrowField(BorrowLoc(x))), we want to
    represent it as a single expression rather than three separate ones, so that we can eliminate
    the entire sequence at once.
  - Conditions to nest t1 = Op1(t0); t2 = Op2(t1); as Op2(Op1(t0)):
    - The definition at Op1 is the only definition of t1 that can reach the instruction of Op2
    - t1 is used exactly once, by Op2
  - For commutative operations, the arguments are sorted to get a canonical order
Why preorder traversal: it ensures that all dominating blocks are processed before the blocks they dominate,
so no replacement opportunity is missed.
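
The canonicalization in Step 2 can be pictured with the minimal sketch below. The `ExprKey` and `ExpArg` types here are simplified stand-ins written for this example, not the definitions in the PR; operations are plain strings, and the set of commutative operators is an assumption.

```rust
/// Hypothetical, simplified stand-in for the PR's `ExpArg`.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum ExpArg {
    Const(u64),
    Temp(usize),
    /// A nested expression, e.g. the BorrowField inside a ReadRef.
    Expr(Box<ExprKey>),
}

/// Hypothetical, simplified stand-in for the PR's `ExprKey`.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ExprKey {
    pub op: &'static str,
    pub args: Vec<ExpArg>,
}

/// Assumed list of commutative operators, for illustration only.
fn is_commutative(op: &str) -> bool {
    matches!(op, "+" | "*" | "==" | "!=" | "&" | "|" | "^")
}

impl ExprKey {
    /// Build a key; arguments of commutative operations are sorted so that
    /// `a + b` and `b + a` produce the same key.
    pub fn new(op: &'static str, mut args: Vec<ExpArg>) -> Self {
        if is_commutative(op) {
            args.sort();
        }
        ExprKey { op, args }
    }
}

/// The nested key for `data.x`: ReadRef(BorrowField(BorrowLoc(local))).
pub fn data_x_key(local: usize) -> ExprKey {
    let borrow_loc = ExprKey::new("borrow_local", vec![ExpArg::Temp(local)]);
    let borrow_field = ExprKey::new("borrow_field.x", vec![ExpArg::Expr(Box::new(borrow_loc))]);
    ExprKey::new("read_ref", vec![ExpArg::Expr(Box::new(borrow_field))])
}
```

Because the key derives `Eq` and `Hash`, it can serve directly as a hash-map key for looking up previously seen expressions.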

Step 3: Check if the ExprKey from Step 2 has been seen before in a dominating block.

Given a seen-before ExprKey (annotated as src_expr) for the current expression (annotated as dest_expr),
and assuming the two expressions have the following formats:

src_expr: (src_temp1, src_temp2, ...) = src_op(src_ope1, src_ope2, ...) defined at src_inst, where src_ope1 and src_ope2 can be nested expressions.
dest_expr: (dest_temp1, dest_temp2, ...) = dest_op(dest_ope1, dest_ope2, ...) defined at dest_inst, where dest_ope1 and dest_ope2 can be nested expressions.

we apply a set of conservative conditions to check that the replacement is safe:

Condition 1: src_expr dominates dest_expr
- This ensures that src_expr is always executed before dest_expr
Condition 2: type safety
- src_temps and dest_temps share the same types
  - Otherwise, we may encounter a type conflict when copying src_temp to dest_temp
- src_temp is not mutably borrowed
  - Otherwise, we may create a conflicting use while src_temp is mutably borrowed
Condition 3: src_temps are copyable
- This ensures that copying src_temps to dest_temps does not violate ability constraints
Condition 4: src_temps at src_expr are the only definitions of src_temps that can reach dest_expr:
- This ensures that we are not copying wrong values to dest_temps
Condition 5: Resources used in src_expr are not changed at dest_expr:
- This ensures that BorrowGlobal and Exists operations are safe to reuse at dest_expr
- This only applies when BorrowGlobal and Exists are involved in src_expr and dest_expr
Condition 6: Operands used in src_expr are safe to reuse at dest_expr:
- Operands used in src_expr are identical to those used in dest_expr
- None of the operands used in src_expr are possibly re-defined in a path between src_expr and dest_expr (without going through src_expr again)
  - This ensures that the values of the operands used in src_inst remain unchanged when reaching dest_inst
- None of the operands used in src_expr are mutably borrowed elsewhere
  - This ensures that we are not creating conflicting uses while the operands are mutably borrowed
Condition 7: The replacement must yield a performance gain. See the comments above gain_perf for details
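
To make the "no redefinition in between" part of Conditions 4 and 6 concrete, here is a deliberately simplified sketch restricted to straight-line code. The PR relies on a reaching-definition analysis over the whole CFG; this example only scans the instructions between the two offsets. The `Inst` type and the `inst` and `safe_to_reuse` names are hypothetical.

```rust
/// Hypothetical, simplified instruction: `dest := op(operands)`.
#[derive(Clone, Debug)]
pub struct Inst {
    pub dest: usize,
    pub op: &'static str,
    pub operands: Vec<usize>,
}

/// Small constructor to keep the examples short.
pub fn inst(dest: usize, op: &'static str, operands: &[usize]) -> Inst {
    Inst { dest, op, operands: operands.to_vec() }
}

/// Straight-line approximation of Conditions 4 and 6: the two instructions
/// compute the same operation on identical operands, and nothing between them
/// redefines the source temp or any operand of the source.
pub fn safe_to_reuse(code: &[Inst], src_idx: usize, dest_idx: usize) -> bool {
    if src_idx >= dest_idx || dest_idx >= code.len() {
        return false;
    }
    let src = &code[src_idx];
    let dest = &code[dest_idx];
    if src.op != dest.op || src.operands != dest.operands {
        return false;
    }
    // If src overwrites one of its own operands, the operand no longer holds
    // the value that src was computed from.
    if src.operands.contains(&src.dest) {
        return false;
    }
    code[src_idx + 1..dest_idx]
        .iter()
        .all(|i| i.dest != src.dest && !src.operands.contains(&i.dest))
}
```

For instance, `t2 := add(t0, t1); t3 := mul(t2, t2); t4 := add(t0, t1)` passes the check, while the same sequence with the middle instruction writing to `t0` does not.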

Step 4: For each src_expr that passes the conditions in Step 3 to replace dest_expr, we gather the information needed to perform the replacement, as illustrated below:

Example:

1. src_temp = pure_computation_1(t0)      // src_inst
2. ...
3. use(src_temp)
4. dest_temp = pure_computation_1(t0)      // dest_inst
5. ...
6. use(dest_temp)

==>

1. src_temp = pure_computation_1(t0)      // src_inst
2. ...
3. use(src_temp)
4. dest_temp = copy(src_temp)      // inserted copy
5. ...
6. use(dest_temp)

Step 5: After processing all basic blocks, we perform the recorded replacements and eliminate the marked code.
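
A minimal sketch of the deferred rewrite in Step 5, assuming the traversal has recorded a map from the offset of each dest_inst to the src_temp that should be copied. The `Bytecode` enum, `apply_replacements`, and `step4_example` are hypothetical names introduced for this example; the actual pass also removes code that becomes dead, which is not modelled here.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified bytecode.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Bytecode {
    /// dest := op(operands)
    Call { dest: usize, op: &'static str, operands: Vec<usize> },
    /// dest := copy(src)
    Copy { dest: usize, src: usize },
}

/// Rewrite every recorded dest_inst into `dest_temp := copy(src_temp)`.
/// `replacements` maps the offset of a dest_inst to the src_temp to reuse.
pub fn apply_replacements(code: &mut [Bytecode], replacements: &HashMap<usize, usize>) {
    for (&offset, &src_temp) in replacements {
        // Only computations are rewritten; anything else is left untouched.
        let dest = match code.get(offset) {
            Some(Bytecode::Call { dest, .. }) => *dest,
            _ => continue,
        };
        code[offset] = Bytecode::Copy { dest, src: src_temp };
    }
}

/// The Step 4 example: offset 2 recomputes what offset 0 already computed into t1.
pub fn step4_example() -> Vec<Bytecode> {
    let mut code = vec![
        Bytecode::Call { dest: 1, op: "pure_computation_1", operands: vec![0] },
        Bytecode::Call { dest: 2, op: "use", operands: vec![1] },
        Bytecode::Call { dest: 3, op: "pure_computation_1", operands: vec![0] },
    ];
    let replacements = HashMap::from([(2, 1)]);
    apply_replacements(&mut code, &replacements);
    code
}
```

Recording first and rewriting afterwards keeps instruction offsets stable while the dominator tree is still being traversed.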

============================ Extensions ============================

In principle, the algorithm above is designed to handle PURE instructions, defined as below:

- the results depend only on the operands
- the instruction has no side effects on memory (including writes via references), control flow (including aborts), or external state (global storage)
- recomputing it multiple times has no semantic effect

Yet, we found that some non-pure instructions can be safely handled under certain conditions.

Group 1: operations that are pure if no arithmetic errors like overflows happen (+, -, *, /, %, etc):

such operations are treated as pure in aggressive mode
their side effects are safe because, if they occur, they are guaranteed to have occurred earlier, at src_inst
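
The argument for Group 1 can be checked on a small model. In the sketch below, `None` stands in for a Move abort and `checked_div` models an aborting division; the function names are made up for the example. Since dest_inst is only reached when src_inst succeeded on the same operands, replacing the second division with a copy does not change whether, or where, the program aborts.

```rust
/// Models a Move arithmetic operation that may abort: `None` stands for an abort.
fn div(a: u64, b: u64) -> Option<u64> {
    a.checked_div(b)
}

/// Before CSE: `a / b` is computed at src_inst and again at dest_inst.
pub fn before_cse(a: u64, b: u64) -> Option<u64> {
    let t1 = div(a, b)?; // src_inst: aborts here if b == 0
    let t2 = div(a, b)?; // dest_inst: only reached if src_inst succeeded
    t1.checked_add(t2)
}

/// After CSE: dest_inst is replaced by a copy of the earlier result.
pub fn after_cse(a: u64, b: u64) -> Option<u64> {
    let t1 = div(a, b)?; // src_inst: the only place the division can abort
    let t2 = t1; // dest_temp := copy(src_temp)
    t1.checked_add(t2)
}
```

Both versions return the same value for every input, including the aborting case `b == 0`.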

Group 2: operations that are pure if no type errors happen (UnpackVariant):

such operations are treated as pure in aggressive mode
their side effects are safe because, if they occur, they are guaranteed to have occurred earlier, at src_inst

Group 3: BorrowLoc, BorrowField, BorrowVariantField

In principle, borrow operations are not pure as they depend on memory states.
However, if we guarantee that the memory states are not changed between src_inst and dest_inst, we can treat them as pure.

Group 4: Assign

It can be treated as pure when the assign kind is Copy or Inferred (TODO(#18203): reason more about Inferred)

Group 5: ReadRef

In principle, ReadRef is not pure as it depends on memory states.
However, if we guarantee that the memory states are not changed between src_inst and dest_inst, we can treat it as pure.

Group 6: Function calls

A function call can be treated as pure if the callee
- Does not modify any memory via mutable references
- Does not access global resources

Group 7: BorrowGlobal and Exists

They can be treated as pure if we guarantee that the resources involved are not modified between src_inst and dest_inst

Stats by applying CSE on `framework` code

The evaluation was run under the most restrictive configuration with respect to safety and bytecode reduction.
The CSE implementation counts Call Operation::Function as 3 bytecodes, yet the stats count it as only one. As a result, the stats understate the reduction actually achieved.

`aptos-experimental`

order_book_types.move: +14
- why increase:
  - Call Operation::Function is only counted as one bytecode
  - CSE unexpectedly interfered with a later-phase control flow simplification optimization. This is non-trivial to reason about and account for.
bulk_order_book_types.move: +4 (why increase: same as above)
single_order_book.move: -3
veiled_coin.move: -4
confidential_proof.move: -17

`aptos-framework`

storage_gas.move: +4 (why increase: same as aptos-experimental::order_book_types.move)
primary_fungible_store.move: +1 (why increase: Call Operation::Function is only counted as one bytecode in stats)
coin.move: +1 (why increase: Call Operation::Function is only counted as one bytecode in stats)
stake.move: -2
ordered_map.move: -2
account.move: -2
multisig_account.move: -5

`aptos-stdlib`

smart_table.move: +4 (why increase: same as aptos-experimental::order_book_types.move)

`move-stdlib`

None

`aptos-tokens`

None

TODOs

All TODO items are marked with TODO(#18203).

How Has This Been Tested?

Existing compiler tests and transactional tests
New test cases will be added in the next PR

Expected Result Changes

Expensive recomputation is eliminated by reusing the result of the first computation.
Expected outputs now include the intermediate analysis results introduced by CSE.

Type of Change

Which Components or Systems Does This Change Impact?

Note

Introduce a CSE optimization pass to the stackless bytecode pipeline, gated by an experiment flag, with supporting analysis/plumbing updates and test baselines.

Optimizer (Pipeline):
- Add CommonSubexpElimination pass, integrated before ability checks; wired with LiveVar, FlushWrites, ReferenceSafety (v2/v3), and ReachingDef analyses.
- New experiment flag common-subexp-elimination; register reaching-def formatter.
Core Utilities:
- Graph: expose DomRelation, add preorder traversal; CFG: add successor_insts() and edges().
- Bytecode/Operation helpers: purity classifiers (is_pure, pure_if_no_*), is_commutative, borrowing detectors, and commutativity handling; extend function target to get loc by offset.
- Types: add estimate_size heuristic; well-known vector names extended; minor cost model scaffolding in CSE.
Tests:
- Update numerous expected outputs to reflect added analysis stages/annotations and control-flow/flush changes introduced by the new pass.

Written by Cursor Bugbot for commit 8679d38.

junxzm1990 · 2025-10-29T15:45:30Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

cursor

This PR is being reviewed by Cursor Bugbot

cursor · 2025-12-08T22:39:12Z

third_party/move/move-compiler-v2/src/pipeline/common_subexp_elimination.rs

+            }
+        }
+        globals
+    }


Bug: collect_resources misses Exists operation for safety check

The collect_resources function only collects resources from Operation::BorrowGlobal but not from Operation::Exists, despite the documentation stating that Condition 5 ensures both BorrowGlobal and Exists operations are safe to reuse. Since BytecodeSanitizer::Exists allows Exists operations to be CSE candidates in aggressive mode, but collect_resources doesn't include them, the resources_safe_to_reuse check will operate on empty collections for Exists expressions. This could allow incorrect CSE replacement when an Exists check is repeated but the underlying resource state changed between the two calls (via MoveTo or MoveFrom).
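
One way to picture the fix suggested by this report is the sketch below. It is not the code from common_subexp_elimination.rs: the `Operation` enum is a cut-down stand-in, resources are identified by plain strings, and the signatures of `collect_resources` and `resources_safe_to_reuse` are assumptions. The point it illustrates is only that `Exists` has to contribute to the collected set alongside `BorrowGlobal`, so that an intervening MoveTo or MoveFrom on the same resource blocks the reuse.

```rust
use std::collections::BTreeSet;

/// Hypothetical, cut-down operation set; resources are named by strings here.
#[derive(Clone, Debug)]
pub enum Operation {
    BorrowGlobal(&'static str),
    Exists(&'static str),
    MoveTo(&'static str),
    MoveFrom(&'static str),
    Other,
}

/// Collect the resources an expression depends on, covering both
/// BorrowGlobal and Exists.
pub fn collect_resources(ops: &[Operation]) -> BTreeSet<&'static str> {
    let mut globals = BTreeSet::new();
    for op in ops {
        match op {
            Operation::BorrowGlobal(r) | Operation::Exists(r) => {
                globals.insert(*r);
            }
            _ => {}
        }
    }
    globals
}

/// Reuse is rejected if any operation between src_inst and dest_inst may
/// create or remove one of the resources the expression depends on.
pub fn resources_safe_to_reuse(expr_ops: &[Operation], between: &[Operation]) -> bool {
    let used = collect_resources(expr_ops);
    !between.iter().any(|op| {
        matches!(op, Operation::MoveTo(r) | Operation::MoveFrom(r) if used.contains(r))
    })
}
```

With `Exists` left out of `collect_resources`, the set for an `exists<R>` expression would be empty and the second function would accept every case, which is the failure mode described above.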

junxzm1990 mentioned this pull request Oct 29, 2025

[move linter] fix issues due to missing handling function values #17990

Merged

16 tasks

junxzm1990 marked this pull request as ready for review October 29, 2025 15:46

junxzm1990 marked this pull request as draft October 29, 2025 15:47

junxzm1990 force-pushed the jun/cse-opt branch 10 times, most recently from b6f8d5d to 97a2b06 Compare November 12, 2025 20:28

junxzm1990 changed the base branch from main to graphite-base/17989 November 24, 2025 16:13

junxzm1990 force-pushed the jun/cse-opt branch from 97a2b06 to e009ba3 Compare November 24, 2025 16:13

junxzm1990 changed the base branch from graphite-base/17989 to jun/reach-def November 24, 2025 16:13

junxzm1990 mentioned this pull request Nov 24, 2025

[move compiler] [CSE Step 1] add reaching-def analysis #18201

Open

16 tasks

junxzm1990 force-pushed the jun/reach-def branch from ca16b0f to 6dd56f1 Compare November 24, 2025 16:22

junxzm1990 force-pushed the jun/cse-opt branch from e009ba3 to 6eb7704 Compare November 24, 2025 16:22

junxzm1990 force-pushed the jun/reach-def branch from 6dd56f1 to d9b6d27 Compare November 24, 2025 16:37

junxzm1990 force-pushed the jun/cse-opt branch 2 times, most recently from 8b88a87 to 3066301 Compare November 24, 2025 17:04

junxzm1990 force-pushed the jun/reach-def branch 2 times, most recently from a6827da to 69c4e71 Compare November 24, 2025 20:56

junxzm1990 force-pushed the jun/cse-opt branch 3 times, most recently from 894fa74 to 4e29fd9 Compare November 25, 2025 03:40

junxzm1990 force-pushed the jun/reach-def branch from 69c4e71 to 54a1873 Compare November 25, 2025 16:04

junxzm1990 force-pushed the jun/cse-opt branch from 4e29fd9 to 947e5c9 Compare November 25, 2025 16:04

junxzm1990 force-pushed the jun/cse-opt branch from 285eb08 to 054714b Compare November 25, 2025 21:26

junxzm1990 force-pushed the jun/reach-def branch from cde52be to 88ac450 Compare November 25, 2025 21:26

junxzm1990 force-pushed the jun/cse-opt branch 4 times, most recently from 8bbbf38 to 4f00c2e Compare November 26, 2025 01:01

junxzm1990 force-pushed the jun/reach-def branch from 88ac450 to 2d543b1 Compare November 26, 2025 01:01

junxzm1990 force-pushed the jun/cse-opt branch 6 times, most recently from 9e565bb to 50f2a44 Compare November 27, 2025 00:06

junxzm1990 requested a review from vineethk December 1, 2025 16:06

junxzm1990 force-pushed the jun/cse-opt branch from 50f2a44 to d43320c Compare December 2, 2025 17:37

junxzm1990 force-pushed the jun/reach-def branch from 2d543b1 to 22fffeb Compare December 2, 2025 17:37

junxzm1990 force-pushed the jun/cse-opt branch 4 times, most recently from e920739 to 3752618 Compare December 5, 2025 16:38

junxzm1990 force-pushed the jun/reach-def branch from 22fffeb to cadaebb Compare December 5, 2025 16:38

junxzm1990 force-pushed the jun/cse-opt branch from 3752618 to 116067d Compare December 5, 2025 18:04

junxzm1990 force-pushed the jun/reach-def branch from cadaebb to 68e406f Compare December 5, 2025 23:35

junxzm1990 force-pushed the jun/cse-opt branch from 116067d to 5b4d51b Compare December 5, 2025 23:35

junxzm1990 force-pushed the jun/reach-def branch from 68e406f to 27dfb30 Compare December 8, 2025 16:37

junxzm1990 force-pushed the jun/cse-opt branch from 5b4d51b to 2b2c67f Compare December 8, 2025 16:37

cursor bot reviewed Dec 8, 2025

[move compiler] common subexpression elimination

8679d38

junxzm1990 force-pushed the jun/cse-opt branch from 2b2c67f to 8679d38 Compare December 8, 2025 22:20

cursor bot reviewed Dec 8, 2025
