@davebayer commented Jan 24, 2026

This PR optimizes the cuda::sub_overflow implementation for unsigned types.

See the SASS comparison:

Note: I still need to check whether the old implementation was actually better for signed 32-bit and 64-bit types.
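
For readers without the diff at hand, here is a minimal sketch of the usual way to detect overflow in unsigned subtraction. It is not the code from this PR and not the actual cuda::sub_overflow interface; the names `sub_result_sketch` and `sub_overflow_unsigned_sketch` are hypothetical and exist only for illustration. It relies on the fact that unsigned subtraction wraps modulo 2^N, so the result is out of range exactly when the left operand is smaller than the right one.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical result type for illustration only (not a libcu++ type).
template <class T>
struct sub_result_sketch
{
  T value;       // difference wrapped modulo 2^N
  bool overflow; // true if the exact difference does not fit in T
};

// Hypothetical helper; an assumption-based sketch, not the PR's implementation.
template <class T>
constexpr sub_result_sketch<T> sub_overflow_unsigned_sketch(T lhs, T rhs) noexcept
{
  static_assert(std::is_unsigned<T>::value, "this sketch covers unsigned types only");
  // Unsigned arithmetic wraps, so the subtraction itself is well defined.
  // The exact result is negative (not representable) exactly when lhs < rhs.
  return {static_cast<T>(lhs - rhs), lhs < rhs};
}
```

For example, `sub_overflow_unsigned_sketch<std::uint32_t>(3u, 5u)` yields the wrapped value `0xFFFFFFFE` with `overflow` set, while `(5u, 3u)` yields `2` with `overflow` clear. Signed types need a different check, which is the open question in the note above.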

@davebayer requested a review from a team as a code owner January 24, 2026 11:01
@davebayer requested a review from wmaxey January 24, 2026 11:01
@github-project-automation bot moved this to Todo in CCCL Jan 24, 2026
@cccl-authenticator-app bot moved this from Todo to In Review in CCCL Jan 24, 2026
@davebayer force-pushed the optimize_sub_overflow_unsigned branch from 2db7342 to c1ba39d January 24, 2026 11:37
@davebayer changed the title from "Optimize cuda::sub_overflow for unsigned types" to "Optimize cuda::sub_overflow" Jan 24, 2026
@davebayer requested review from fbusato and removed request for wmaxey January 24, 2026 11:38
@github-actions

🥳 CI Workflow Results

🟩 Finished in 1h 08m: Pass: 100%/84 | Total: 15h 47m | Max: 43m 08s | Hits: 99%/199232

See results here.
