Enable autovectorization of bbox intersect and fix logical all #2423
Test summary 5 701 files 9 289 suites 18m 26s ⏱️ Results for commit af278e7. ♻️ This comment has been updated with latest results.
@sethrj I didn't use
Basically, I don't think values in the parameter pack are lazily evaluated so Here is what I am seeing for this MR:
Here,
Sort of: the argument in whatever form is passed to the function, and then the evaluation of Anyway, the point is that this seems to be a win for instruction count but due to the memory-bound nature of the geometry, we don't see much benefit.
I think a next step would be to diff the CUDA assembly output from a kernel that just calls this method on existing arrays. For these kinds of micro-optimizations, runtime may not be the best indicator. But the fact that it's not changing much indicates we probably shouldn't worry about it.
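The point about parameter packs can be checked with a small experiment. The sketch below assumes C++17 fold expressions; `all_of_args`, `traced`, and `CountingFlag` are hypothetical names for illustration and are not the Celeritas `logical_all`. It counts how often each operand is evaluated: every argument expression is evaluated before the call, but a conversion to `bool` that happens inside the fold can still be skipped.

```cpp
#include <cassert>

namespace sketch
{
// Hypothetical helper (not the Celeritas implementation): fold the
// already-evaluated arguments with the short-circuiting && operator.
template<class... Args>
bool all_of_args(Args... args)
{
    return (... && static_cast<bool>(args));
}

// Global counter of how many operand expressions were evaluated
inline int& eval_count()
{
    static int count = 0;
    return count;
}

// Wrap a value so that evaluating the expression is observable
inline bool traced(bool value)
{
    ++eval_count();
    return value;
}

// Flag whose conversion to bool is observable through a counter
struct CountingFlag
{
    bool value;
    int* conversions;
    explicit operator bool() const
    {
        ++(*conversions);
        return value;
    }
};

// All three arguments are evaluated before the function body runs
inline int evals_through_function()
{
    eval_count() = 0;
    bool result = all_of_args(traced(false), traced(true), traced(true));
    (void)result;
    return eval_count();
}

// The built-in operator stops after the first false operand
inline int evals_through_builtin_and()
{
    eval_count() = 0;
    bool result = traced(false) && traced(true) && traced(true);
    (void)result;
    return eval_count();
}

// Conversions happen inside the fold, so && can skip the later ones
inline int conversions_through_function()
{
    int conversions = 0;
    bool result = all_of_args(CountingFlag{false, &conversions},
                              CountingFlag{true, &conversions},
                              CountingFlag{true, &conversions});
    (void)result;
    return conversions;
}
}  // namespace sketch
```

In this sketch the function call evaluates all three operands, the built-in `&&` chain evaluates one, and only the first `CountingFlag` is converted.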
Thinking about it more, and now that I see the
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #2423 +/- ##
=========================================
Coverage 87.24% 87.24%
=========================================
Files 1399 1399
Lines 44149 44150 +1
Branches 13342 13814 +472
=========================================
+ Hits 38516 38517 +1
- Misses 4417 4568 +151
+ Partials 1216 1065 -151
I think my incorrectly entered suggestion for `logical_all` in #2405 got committed by mistake. I've added a test that actually checks short circuiting, and added a `logical_any` operator as well. This is used in an updated implementation of `intersect_segment` (following on to #2422) that actually allows clang to autovectorize with `-O3` (see https://github.com/sethrj/testsnippets/blob/master/_celeritas_code/bbox-intersect-segment/newer.s).

I also updated the Plane intersect implementation to use the `logical_all`, and added some methods to make its calculation look identical to AlignedPlane. (In a separate branch I tried unifying the intersect test for the plane to no effect: it's limited during the memory load.)

Anyway it looks like this gives another 1% boost on GPU, no effect seen on CPU, and the code is cleaner and it fixes a bug.
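As a rough illustration of why combining conditions without per-operand branches can help a compiler vectorize, here is a minimal sketch. It is not the code in this PR: `all_branchless`, `any_branchless`, and `segment_hits_box` are hypothetical names, and the segment/box check uses the well-known separating-axis test as a stand-in rather than whatever `intersect_segment` actually does. All six comparisons are computed unconditionally and combined with a bitwise fold; whether a given compiler vectorizes this depends on the compiler and flags.

```cpp
#include <array>
#include <cassert>
#include <cmath>

namespace sketch
{
using Real3 = std::array<double, 3>;

// Hypothetical stand-in: AND all flags with the bitwise operator, which
// evaluates every operand and needs no branch per operand
template<class... Args>
constexpr bool all_branchless(Args... args)
{
    return (true & ... & static_cast<bool>(args));
}

// Hypothetical stand-in: OR all flags with the bitwise operator
template<class... Args>
constexpr bool any_branchless(Args... args)
{
    return (false | ... | static_cast<bool>(args));
}

// Separating-axis test (an assumption for illustration): does the segment
// from p0 to p1 touch the axis-aligned box [lo, hi]?
inline bool segment_hits_box(Real3 const& p0,
                             Real3 const& p1,
                             Real3 const& lo,
                             Real3 const& hi)
{
    Real3 e{}, m{}, d{}, ad{};
    for (int i = 0; i < 3; ++i)
    {
        double center = 0.5 * (lo[i] + hi[i]);
        double mid = 0.5 * (p0[i] + p1[i]);
        e[i] = hi[i] - center;  // box half-width
        d[i] = p1[i] - mid;     // segment half-length vector
        m[i] = mid - center;    // segment midpoint relative to box center
        ad[i] = std::fabs(d[i]);
    }
    // Three box-axis tests, then three cross-product axis tests
    return all_branchless(
        std::fabs(m[0]) <= e[0] + ad[0],
        std::fabs(m[1]) <= e[1] + ad[1],
        std::fabs(m[2]) <= e[2] + ad[2],
        std::fabs(m[1] * d[2] - m[2] * d[1]) <= e[1] * ad[2] + e[2] * ad[1],
        std::fabs(m[2] * d[0] - m[0] * d[2]) <= e[0] * ad[2] + e[2] * ad[0],
        std::fabs(m[0] * d[1] - m[1] * d[0]) <= e[0] * ad[1] + e[1] * ad[0]);
}
}  // namespace sketch
```

A production version would likely add a small tolerance to the cross-product tests for segments that are nearly parallel to a box axis; that is omitted here to keep the sketch short.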