Skip to content

Enable autovectorization of bbox intersect and fix logical all#2423

Merged
sethrj merged 7 commits into
celeritas-project:developfrom
sethrj:bbox-isect
Jun 15, 2026
Merged

Enable autovectorization of bbox intersect and fix logical all#2423
sethrj merged 7 commits into
celeritas-project:developfrom
sethrj:bbox-isect

Conversation

@sethrj

@sethrj sethrj commented Jun 15, 2026

Copy link
Copy Markdown
Member

I think my incorrectly entered suggestion for logical_all in #2405 got committed by mistake. I've added a test that actually checks short circuiting, and added a logical_any operator as well. This is used in an updated implementation of intersect_segment (following on to #2422) that actually allows clang to autovectorize with -O3 (see https://github.com/sethrj/testsnippets/blob/master/_celeritas_code/bbox-intersect-segment/newer.s ) .

I also updated the Plane intersect implementation to use the logical_all, and added some methods to make its calculation look identical to AlignedPlane. (In a separate branch I tried unifying the intersect test for the plane to no effect: it's limited during the memory load.)

Anyway it looks like this gives another 1% boost on GPU, no effect seen on CPU, and the code is cleaner and it fixes a bug.

@sethrj sethrj requested a review from elliottbiondo as a code owner June 15, 2026 00:00
@sethrj sethrj added orange Work on ORANGE geometry engine performance Changes for performance optimization labels Jun 15, 2026
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

Test summary

 5 701 files   9 289 suites   18m 26s ⏱️
 2 321 tests  2 278 ✅  43 💤 0 ❌
32 806 runs  32 635 ✅ 171 💤 0 ❌

Results for commit af278e7.

♻️ This comment has been updated with latest results.

@elliottbiondo

elliottbiondo commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

@sethrj I didn't use & within logical_all intentionally for the following reason (quoted from #2405):

I tried the function body both ways: & vs &&. Interestedly && is very slightly faster. Of course all args have already been evaluated at true/false when this line is executed.

Basically, I don't think values in the parameter pack are lazily evaluated so & and && are nearly same. This is supported by that fact that, (as noted in #2405) that using logical_all actually gave a 0.5% slowdown over using & directly throughout the code. I figured a 0.5% speedup was not worth having both & and && peppered throughout the code code, but maybe it is.

Here is what I am seeing for this MR:

Screenshot 2026-06-15 at 7 34 16 AM

Here, & wins over && very slightly unlike what I saw previously. I am guessing the 1% you were seeing was on a Milan?

@sethrj

sethrj commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

Basically, I don't think values in the parameter pack are lazily evaluated so & and && are nearly same. This is supported by that fact that, (as noted in #2405) that using logical_all actually gave a 0.5% slowdown over using & directly throughout the code. I figured a 0.5% speedup was not worth having both & and && peppered throughout the code code, but maybe it is.

Sort of: the argument in whatever form is passed to the function, and then the evaluation of operator bool is done inside with/without short circuiting. That's why my test dfcba9e showed failures for logical_all. But it looks like that may not be the whole story since changing | to || in logical_all does increase the number of instructions in the resulting assembly: maybe it has to do this for something obscure like preventing signaling NaNs inside a short circuit evaluation. Not sure.

Anyway, the point is that this seems to be a win for instruction count but due to the memory-bound nature of the geometry, we don't see much benefit.

Here, & wins over && very slightly unlike what I saw previously. I am guessing the 1% you were seeing was on a Milan?

I think a next step would be to diff the CUDA assembly output from a kernel that just calls this method from existing arrays. For these kind of microoptimizations runtime may not be the best indicator. But the fact that it's not changing much indicates we probably shouldn't worry about it.

@elliottbiondo

Copy link
Copy Markdown
Contributor

Thinking about it more, and now that I see the logical_* tests you added here, yes it makes sense that lazy evaluation should work; the parameter packs are expanded at compile time.

@elliottbiondo elliottbiondo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.24%. Comparing base (3f36f28) to head (af278e7).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop    #2423    +/-   ##
=========================================
  Coverage    87.24%   87.24%            
=========================================
  Files         1399     1399            
  Lines        44149    44150     +1     
  Branches     13342    13814   +472     
=========================================
+ Hits         38516    38517     +1     
- Misses        4417     4568   +151     
+ Partials      1216     1065   -151     
Files with missing lines Coverage Δ
src/corecel/math/Algorithms.hh 97.45% <100.00%> (+0.03%) ⬆️
src/orange/BoundingBoxUtils.hh 100.00% <100.00%> (+2.70%) ⬆️
src/orange/surf/Plane.hh 100.00% <100.00%> (ø)
src/orange/surf/PlaneAligned.hh 96.77% <100.00%> (+0.47%) ⬆️

... and 115 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sethrj sethrj merged commit 1e51202 into celeritas-project:develop Jun 15, 2026
39 of 41 checks passed
@sethrj sethrj deleted the bbox-isect branch June 16, 2026 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

orange Work on ORANGE geometry engine performance Changes for performance optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants