Skip to content

Conversation

@andreas-schwab
Copy link

The AArch64 and RISC-V atomic implementations are defective, remove them
and replace them with the generic GCC atomic builtins.

Fixes #247

@sbahra
Copy link
Member

sbahra commented Nov 27, 2025

@cognet What are your thoughts on this? I have nothing against it.

@andreas-schwab I see s390x removed as well. This was contributed a while ago. What's the reasoning there?

@cognet
Copy link
Contributor

cognet commented Nov 27, 2025

@sbahra I'm not opposed to the idea, I'm not sure there's much value left in having our own atomic stuff, however, the our current gcc implementation still uses the old _sync* intrinsics, which will mean full barriers everywhere, and a potentially noticeable performance impact.
I'll try to find some time this week-end to investigate on what's wrong with our implementation in the first place

@michael-grunder
Copy link
Contributor

Not sure if it's helpful but __atomic_fetch_add(d, v, __ATOMIC_RELAXED) from GCC does not emit a dmb

00000000000005d0 <gcc_atomic_faa>:
 5d0:	mov	x2, x0
 5d4:	ldxr	x0, [x2]
 5d8:	add	x3, x0, x1
 5dc:	stxr	w4, x3, [x2]
 5e0:	cbnz	w4, 5d4 <gcc_atomic_faa+0x4>
 5e4:	ret

It's very similar to ck_pr_faa:

00000000000005f0 <ck_faa>:
 5f0:	ldxr	x2, [x0]
 5f4:	add	x3, x1, x2
 5f8:	stxr	w4, x3, [x0]
 5fc:	cbnz	w4, 5f0 <ck_faa>
 600:	mov	x0, x2
 604:	ret

Also interestingly on newer aarch64 CPUs with -march=native FAA is a single instruction:

00000000000005b0 <gcc_sync_faa>:
 5b0:	ldaddal	x1, x0, [x0]
 5b4:	ret

The AArch64, S390x and RISC-V atomic implementations are defective, remove
them and replace them with the generic GCC atomic builtins.

Fixes concurrencykit#247
@sbahra
Copy link
Member

sbahra commented Jan 2, 2026

@cognet Any updates on this?

@cognet
Copy link
Contributor

cognet commented Jan 5, 2026

@sbahra I don't and at this point I'm far away from my arm64 hardware, and won't be able to work on this for a while.
I guess we should just merge that in the meanwhile, though I really which we'd have an implementation not using the legacy __sync API.

@michael-grunder
Copy link
Contributor

I guess we should just merge that in the meanwhile, though I really which we'd have an implementation not using the legacy __sync API.

Correctness is paramount but that is probably going to be brutal for performance 😄

@sbahra
Copy link
Member

sbahra commented Jan 7, 2026

@cognet Would it help if I get you an AWS instance with ARM? Otherwise, I'll need to find time myself to look at this and that's been tough lately. I'd like to better understand this and then harden our tests.

@andreas-schwab The issue you listed also refers to RiscV. I'm concerned about a deeper underlying issue. Is the ck_pr implementation for Risc V defective? If you use GCC intrinsics, does the issue still reproduce?

@cognet
Copy link
Contributor

cognet commented Jan 7, 2026

@sbahra I'm honestly not sure I'll be able to have a real look before late February.
I would really like to understand it too, to me it really screams we're hitting something identified as undefined behavior by the compiler, that then decides it can just do whatever it wants, and we're thus seeing nonsensical things. I'm sure once understood it will be trivial to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ck_epoch_section_2 hangs in ck_epoch_synchronize when compiled with LTO

4 participants