Csharp iox2 12 ffi testing#20
Conversation
Introduces an internal helper that pins managed state against GC moves, returns an IntPtr token for the FFI `context` parameter, and recovers the state in native callbacks. Foundation for removing the static callback state that races in Node.List() and WaitSet callbacks.
The C# constant was out of sync with the cbindgen-generated header (iceoryx2.h line 69: `#define IOX2_SERVICE_ID_LENGTH 64`). Node.List()'s manual marshaling was reading 32 bytes of service ID and then misreading the name field at offset 32 instead of 64, returning truncated / cross-contaminated strings. Discovered while fixing the static-field race in Node.List(). The existing ServiceDiscoveryTests avoided the name field (the "service names might be stored in a different format" note) which is why the pre-existing bug wasn't flagged sooner.
Removes the static `_tempServiceList` field that two concurrent callers could clobber. Each List() invocation now pins its own result list via CallbackContext and passes the IntPtr token through the native `context` parameter; the static callback recovers it via Peek. The concurrent tests added in the previous commit now pass reliably.
…tiveCallback Converts the third WaitSet callback site to CallbackContext using a small record (Callback, DisposeAttachments) as the pinned state. Drops the instance `_nativeCallback` field: all three sites now use static trampolines held as `s_*Trampoline` static readonly fields.
|
@dkroenke do you have time to review this one? |
| @@ -0,0 +1,80 @@ | |||
| // Copyright (c) 2025 Contributors to the Eclipse Foundation | |||
There was a problem hiding this comment.
Should be 2026, same for the other newly created source files
There was a problem hiding this comment.
Done — bumped to 2026 across the six newly added files (CallbackContext.cs, FfiAssert.cs, NativeStructSizesTests.cs, WaitSetConcurrentTests.cs, CallbackContextTests.cs, ServiceDiscoveryConcurrentTests.cs) in 7bdd9fd.
There was a problem hiding this comment.
Does this really fits to the issue description? For a better readability, it would be better to split the text into separate bullet points.
There was a problem hiding this comment.
Split into three per-fix bullets in 3f2d914.
On the issue-fit question: #12 is about FFI testing in general and explicitly calls out two of the bugs ("Callback testing is limited and not thread-safe — Service discovery uses thread-local state (_tempServiceList) which is error-prone" → covered by the Node.List + WaitSet fixes; and "Union marshaling requires manual workarounds" → the SERVICE_ID_LENGTH offset fix is the same code path), so the bugfixes are in scope for the issue. Happy to move them under a separate follow-up issue if you'd prefer cleaner accounting.
| // Read the basic fields directly from memory without creating the full struct | ||
| // to avoid issues with the union in iox2_static_config_t | ||
|
|
There was a problem hiding this comment.
Why are these comments removed?
There was a problem hiding this comment.
Good catch — the offset comments did real work explaining the manual marshal. Restored in 708cc02.
| internal const int IOX2_NODE_NAME_LENGTH = 128; | ||
| internal const int IOX2_SERVICE_NAME_LENGTH = 255; | ||
| internal const int IOX2_SERVICE_ID_LENGTH = 32; | ||
| internal const int IOX2_SERVICE_ID_LENGTH = 64; |
There was a problem hiding this comment.
Just a general question, is there a possibility to read or import the C header to take the values directly from the source? That could possibly reduce the maintenance burden instead of duplicating the values.
There was a problem hiding this comment.
Fair point — and that is exactly the same maintenance pain that motivated NativeStructSizesTests (the storage-size tripwire). A few directions, ranked by realism:
- ClangSharp / CppSharp source-gen. Run cbindgen → C header → ClangSharp at build time and emit
Iox2NativeMethods.Generated.cs. This is the proper fix but adds an LLVM/libclang build dependency and a generator project. Probably needs its own issue. - A
cbindgen --emit-cmake-config-style sidecar. Have the FFI build also emit a small.jsonof constants + struct sizes that a tiny C# T4 / source generator consumes. Lower-weight than ClangSharp but still needs a tool we don't yet have. - Status quo + tripwires. What's in this PR: the manual
constmirror plusNativeStructSizesTeststo catch drift. Cheap, but only catches what we explicitly bookkeep.
I'd lean toward (1) as a follow-up but don't like it at the moment — happy to open an issue for it. For this PR I think the tripwire approach is the right scope, but let me know if you'd rather hold this one until the generator lands.
| catch | ||
| { | ||
| Console.WriteLine($"Error in service list callback: {ex.Message}"); | ||
| Console.WriteLine($"Stack trace: {ex.StackTrace}"); |
There was a problem hiding this comment.
Wouldn't it help for debugging to have a stack trace?
There was a problem hiding this comment.
Good point on the diff between callback layers — the trampolines in WaitSet intentionally swallow everything because they wrap arbitrary user code at the FFI boundary (a user-thrown NullReferenceException must not unwind into native iceoryx2). But ServiceListCallback only runs our own marshalling code; if it throws, that is a binding bug and silencing the stack trace makes it un-debuggable.
Restored the diagnostic logging in 493c2de, this time on Console.Error so it doesn't pollute the stdout-driven console logger contract. Will return STOP on failure as before.
| /// </summary> | ||
| public class NativeStructSizesTests | ||
| { | ||
| private const int ServiceBuilderStorageSize = 9104; // iceoryx2.h iox2_service_builder_storage_t |
There was a problem hiding this comment.
Is it possible to share the magic numbers from src/Iceoryx2/Native/Iox2NativeMethods.cs with this file somehow? That would make it less error-prone in case the values are changing.
There was a problem hiding this comment.
I'd push back on this one — sharing erases the tripwire that the test is supposed to be. Marshal.SizeOf<T> reads the StructLayout(Size = ...) attribute, so if both the attribute and the test reference the same shared const, the assert collapses to const == const and trivially passes.
The double-entry duplication is load-bearing: the failure mode it catches is exactly "someone updated the C# binding's Size to track new cbindgen output but forgot the test constant (or vice versa)". Killing the duplication kills the only thing forcing the maintainer to touch both files.
In the long run the right fix is generating both sides from the C header (see the thread on IOX2_SERVICE_ID_LENGTH). Until then I'd keep the duplication, but I added an inline note at the literals (eebaffe) so the next reader doesn't have the same "why aren't these shared?" reaction. Let me know if you'd rather I go the other way.
Files added in this PR were authored in 2026; reviewer asked the headers be dated accordingly. Existing repo files keep 2025 since that reflects their creation year.
Reviewer asked for better readability. The merged paragraph hid the three distinct fixes (Node.List race, WaitSet cross-call overwrite, SERVICE_ID_LENGTH header mismatch); separate bullets each linking to the same issue make the scope of eclipse-iceoryx#12 easier to scan.
The cleanup commit (bf33601) dropped the inline comments that explained why ServiceListCallback walks the iox2_static_config_t payload by hand (union avoidance) and what each Marshal.Copy offset corresponds to. Reviewer flagged the loss; restoring them so future readers do not have to reverse-engineer the offset arithmetic.
Reviewer asked why the silent catch dropped the previous diagnostic output. The WaitSet trampolines stay silent because they wrap arbitrary user callbacks at the FFI boundary, but ServiceListCallback runs only our own marshalling code — if it throws, that is a binding bug and the stack trace is exactly what a maintainer needs. Routing through stderr keeps it out of the normal stdout-driven console logger contract.
…uplicated Reviewer suggested sharing the storage-size literals between Iox2NativeMethods.cs and this test file. Doing so would erase the tripwire: Marshal.SizeOf<T> reads the StructLayout(Size = ...) attribute, so if both sides derive from the same const, the assert collapses to const-equals-const and trivially passes. The double-entry bookkeeping is the point — a maintainer who updates one side but forgets the other gets a failing test. Adding an in-line note so the design intent is visible at the literals, not buried in the class docstring.
Notes for Reviewer
Iceoryx2.Native.CallbackContext(GCHandle-pinned token) and routes three FFI callback sites (Node.List,WaitSet.WaitAndProcessOnce,WaitSet.WaitAndProcessOnce(timeout),WaitSet.WaitAndProcessInternal) through it.Node.List()used a static_tempServiceListfield — two concurrent callers on the same process could share a list. Regression test:ServiceDiscoveryConcurrentTests.WaitSet._nativeCallbackwas a single instance field rewritten on everyWaitAndProcessOnce*/WaitAndProcessInternalcall — a second overlapping call would overwrite the first's closure. Now replaced by twostatic readonlytrampolines + per-call pinned context.IOX2_SERVICE_ID_LENGTHwas 32 but the cbindgen header defines it as 64 (iceoryx2.hline 69).Node.List()'s manual marshal was reading thenamefield at the wrong offset. No existing test caught this becauseServiceDiscoveryTestsdeliberately avoided thenamefield.CallbackContext.Pin/Unpinpairing in every caller (Pinoutsidetry,Unpininfinally).catch { return STOP; }— intentionally silent; swallowing user-callback exceptions is required at the FFI boundary. Pinned in tests (WaitSet_TrampolineReturnsStop_WhenCallbackThrows).NativeStructSizesTestshardcodes 9104 / 18696 with header references — treat as a tripwire, update in lockstep with cbindgen output.Pre-Review Checklist for the PR Author
Convert to draft)csharp-iox2-123-short-description)[#123] Add feature description)PR Reviewer Reminders
dotnet formathas been exectued before submittingReferences
Closes #12