Enable autovectorization of bbox intersect and fix logical all by sethrj · Pull Request #2423 · celeritas-project/celeritas

sethrj · 2026-06-15T00:00:40Z

I think my incorrectly entered suggestion for logical_all in #2405 got committed by mistake. I've added a test that actually checks short circuiting, and added a logical_any operator as well. This is used in an updated implementation of intersect_segment (following on from #2422) that actually allows clang to autovectorize with -O3 (see https://github.com/sethrj/testsnippets/blob/master/_celeritas_code/bbox-intersect-segment/newer.s).

I also updated the Plane intersect implementation to use the logical_all, and added some methods to make its calculation look identical to AlignedPlane. (In a separate branch I tried unifying the intersect test for the plane to no effect: it's limited during the memory load.)

Anyway, it looks like this gives another 1% boost on GPU with no effect seen on CPU; the code is also cleaner, and it fixes a bug.
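For readers unfamiliar with the pattern, here is a minimal sketch of a segment/box test written without per-axis early exits, which is the general shape of loop a compiler has a chance to vectorize. It is not the Celeritas intersect_segment: the Box type, the function name segment_hits_box, and the t_max parameter are all hypothetical, and the sketch assumes every component of the direction is nonzero.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Real3 = std::array<double, 3>;

// Hypothetical axis-aligned box for this sketch (not the Celeritas BoundingBox)
struct Box
{
    Real3 lower;
    Real3 upper;
};

// Test whether the segment pos + t * dir, for t in [0, t_max], touches the box.
// Assumption: every component of dir is nonzero (axis-parallel segments are
// not special-cased in this sketch).
inline bool segment_hits_box(Box const& box,
                             Real3 const& pos,
                             Real3 const& dir,
                             double t_max)
{
    double t_enter = 0;
    double t_exit = t_max;
    for (int ax = 0; ax < 3; ++ax)
    {
        // Parametric distances to the two planes bounding this axis
        double const inv = 1 / dir[ax];
        double const t0 = (box.lower[ax] - pos[ax]) * inv;
        double const t1 = (box.upper[ax] - pos[ax]) * inv;
        // Every axis does the same arithmetic, with no early exit
        t_enter = std::max(t_enter, std::min(t0, t1));
        t_exit = std::min(t_exit, std::max(t0, t1));
    }
    // One comparison at the end instead of one test per axis
    return t_enter <= t_exit;
}

// Usage sketch with made-up numbers
inline void example_segment_usage()
{
    Box const box{{0.0, 0.0, 0.0}, {1.0, 1.0, 1.0}};
    Real3 const pos{-1.0, 0.5, 0.5};
    Real3 const dir{1.0, 0.1, 0.1};
    assert(segment_hits_box(box, pos, dir, 3.0));   // enters the box at t = 1
    assert(!segment_hits_box(box, pos, dir, 0.5));  // stops short of the box
}
```

Whether a given compiler actually vectorizes a loop like this depends on flags and target, so the generated assembly is the thing to check.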

github-actions · 2026-06-15T00:17:09Z

Test summary

5 701 files 9 289 suites 18m 26s ⏱️
2 321 tests 2 278 ✅ 43 💤 0 ❌
32 806 runs 32 635 ✅ 171 💤 0 ❌

Results for commit af278e7.

♻️ This comment has been updated with latest results.

elliottbiondo · 2026-06-15T11:49:28Z

@sethrj I didn't use & within logical_all intentionally for the following reason (quoted from #2405):

> I tried the function body both ways: & vs &&. Interestingly, && is very slightly faster. Of course, all args have already been evaluated to true/false when this line is executed.

Basically, I don't think values in the parameter pack are lazily evaluated, so & and && are nearly the same. This is supported by the fact that (as noted in #2405) using logical_all actually gave a 0.5% slowdown over using & directly throughout the code. I figured a 0.5% speedup was not worth having both & and && peppered throughout the code, but maybe it is.

Here is what I am seeing for this MR:

Here, & wins over && very slightly, unlike what I saw previously. I am guessing the 1% you were seeing was on a Milan?

sethrj · 2026-06-15T12:08:43Z

> Basically, I don't think values in the parameter pack are lazily evaluated, so & and && are nearly the same. This is supported by the fact that (as noted in #2405) using logical_all actually gave a 0.5% slowdown over using & directly throughout the code. I figured a 0.5% speedup was not worth having both & and && peppered throughout the code, but maybe it is.

Sort of: the argument, in whatever form, is passed to the function, and then operator bool is evaluated inside with or without short circuiting. That's why my test dfcba9e showed failures for logical_all. But it looks like that may not be the whole story, since changing | to || in logical_all does increase the number of instructions in the resulting assembly: maybe it has to do this for something obscure like preventing signaling NaNs inside a short circuit evaluation. Not sure.
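To make the distinction above concrete, here is a minimal sketch (C++17 fold expressions) in which the arguments are always passed to the function, but the conversion to bool happens inside it and can be short-circuited. The names CountedBool, all_short_circuit, and all_eager are hypothetical and are not the Celeritas logical_all; they only illustrate the behavior a short-circuiting test can observe.

```cpp
#include <cassert>

// Hypothetical helper that counts how often it is converted to bool
struct CountedBool
{
    bool value;
    int* conversions;

    explicit operator bool() const
    {
        ++*conversions;
        return value;
    }
};

// Fold over &&: conversion stops at the first argument that is false
template<class... Ts>
bool all_short_circuit(Ts const&... args)
{
    return (static_cast<bool>(args) && ...);
}

// Fold over &: every argument is converted, then the results are combined
template<class... Ts>
bool all_eager(Ts const&... args)
{
    return (static_cast<bool>(args) & ...);
}

// Usage sketch: count conversions for each variant
inline void example_short_circuit_usage()
{
    int n = 0;
    CountedBool const f{false, &n};
    CountedBool const t{true, &n};

    assert(!all_short_circuit(f, t, t));
    assert(n == 1);  // only the first argument was converted

    n = 0;
    assert(!all_eager(f, t, t));
    assert(n == 3);  // all three arguments were converted
}
```

With plain bool arguments the conversion has no side effects, so the two variants return the same result and differ only in the instructions the compiler emits.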

Anyway, the point is that this seems to be a win for instruction count but due to the memory-bound nature of the geometry, we don't see much benefit.

> Here, & wins over && very slightly unlike what I saw previously. I am guessing the 1% you were seeing was on a Milan?

I think a next step would be to diff the CUDA assembly output from a kernel that just calls this method from existing arrays. For these kinds of microoptimizations, runtime may not be the best indicator. But the fact that it's not changing much indicates we probably shouldn't worry about it.

elliottbiondo · 2026-06-15T12:22:47Z

Thinking about it more, and now that I see the logical_* tests you added here: yes, it makes sense that lazy evaluation should work; the parameter packs are expanded at compile time.

elliottbiondo

Looks good!

codecov · 2026-06-15T13:12:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.24%. Comparing base (3f36f28) to head (af278e7).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

@@            Coverage Diff            @@
##           develop    #2423    +/-   ##
=========================================
  Coverage    87.24%   87.24%            
=========================================
  Files         1399     1399            
  Lines        44149    44150     +1     
  Branches     13342    13814   +472     
=========================================
+ Hits         38516    38517     +1     
- Misses        4417     4568   +151     
+ Partials      1216     1065   -151

Files with missing lines	Coverage Δ
src/corecel/math/Algorithms.hh	`97.45% <100.00%> (+0.03%)`	⬆️
src/orange/BoundingBoxUtils.hh	`100.00% <100.00%> (+2.70%)`	⬆️
src/orange/surf/Plane.hh	`100.00% <100.00%> (ø)`
src/orange/surf/PlaneAligned.hh	`96.77% <100.00%> (+0.47%)`	⬆️

... and 115 files with indirect coverage changes


sethrj added 6 commits June 14, 2026 16:01

logical_any and test

61c9e9c

Rewrite isect without short circuiting

76abdd5

Genericize plane and use biondo branch-free implementation

0ad6097

Rewrite plane aligned intersect more generically

0d0ecbc

Add logical all test

dfcba9e

Fix logical all short circuiting

f3fc9be

sethrj requested a review from elliottbiondo as a code owner June 15, 2026 00:00

sethrj added orange Work on ORANGE geometry engine performance Changes for performance optimization labels Jun 15, 2026

Silence clang warning

af278e7

elliottbiondo approved these changes Jun 15, 2026


sethrj merged commit 1e51202 into celeritas-project:develop Jun 15, 2026
39 of 41 checks passed

sethrj deleted the bbox-isect branch June 16, 2026 12:50

sethrj merged 7 commits into celeritas-project:develop from sethrj:bbox-isect
