Skip to content

Csharp iox2 12 ffi testing#20

Open
patdhlk wants to merge 21 commits into
eclipse-iceoryx:mainfrom
patdhlk:csharp-iox2-12-ffi-testing
Open

Csharp iox2 12 ffi testing#20
patdhlk wants to merge 21 commits into
eclipse-iceoryx:mainfrom
patdhlk:csharp-iox2-12-ffi-testing

Conversation

@patdhlk
Copy link
Copy Markdown
Contributor

@patdhlk patdhlk commented Apr 24, 2026

Notes for Reviewer

  • Core change: introduces Iceoryx2.Native.CallbackContext (GCHandle-pinned token) and routes three FFI callback sites (Node.List, WaitSet.WaitAndProcessOnce, WaitSet.WaitAndProcessOnce(timeout), WaitSet.WaitAndProcessInternal) through it.
  • Bugs fixed:
    • Node.List() used a static _tempServiceList field — two concurrent callers on the same process could share a list. Regression test: ServiceDiscoveryConcurrentTests.
    • WaitSet._nativeCallback was a single instance field rewritten on every WaitAndProcessOnce*/WaitAndProcessInternal call — a second overlapping call would overwrite the first's closure. Now replaced by two static readonly trampolines + per-call pinned context.
    • IOX2_SERVICE_ID_LENGTH was 32 but the cbindgen header defines it as 64 (iceoryx2.h line 69). Node.List()'s manual marshal was reading the name field at the wrong offset. No existing test caught this because ServiceDiscoveryTests deliberately avoided the name field.
  • Focus points when reviewing:
    • CallbackContext.Pin/Unpin pairing in every caller (Pin outside try, Unpin in finally).
    • Trampoline catch { return STOP; } — intentionally silent; swallowing user-callback exceptions is required at the FFI boundary. Pinned in tests (WaitSet_TrampolineReturnsStop_WhenCallbackThrows).
    • NativeStructSizesTests hardcodes 9104 / 18696 with header references — treat as a tripwire, update in lockstep with cbindgen output.

Pre-Review Checklist for the PR Author

  • Add sensible notes for the reviewer
  • PR title is short, expressive and meaningful
  • Consider switching the PR to a draft (Convert to draft)
    • as draft PR, the CI will be skipped for pushes
  • Relevant issues are linked in the References section
  • Branch follows the naming format (csharp-iox2-123-short-description)
  • Commits messages are according to this guideline
    • Commit messages have the issue ID ([#123] Add feature description)
  • Tests have been added/updated for new functionality
  • XML documentation comments added to public APIs
  • Changelog updated in the unreleased section including API breaking changes
  • Assign PR to reviewer
  • All checks have passed

PR Reviewer Reminders

  • Commits are properly organized and messages are according to the guideline
  • Unit tests have been written for new behavior
  • Public API is documented with XML comments
  • PR title describes the changes
  • Code follows C# conventions (PascalCase for public APIs)
    • dotnet format has been exectued before submitting

References

Closes #12

patdhlk added 16 commits April 23, 2026 08:27
Introduces an internal helper that pins managed state against GC moves,
returns an IntPtr token for the FFI `context` parameter, and recovers
the state in native callbacks. Foundation for removing the static
callback state that races in Node.List() and WaitSet callbacks.
The C# constant was out of sync with the cbindgen-generated header
(iceoryx2.h line 69: `#define IOX2_SERVICE_ID_LENGTH 64`). Node.List()'s
manual marshaling was reading 32 bytes of service ID and then misreading
the name field at offset 32 instead of 64, returning truncated /
cross-contaminated strings.

Discovered while fixing the static-field race in Node.List(). The
existing ServiceDiscoveryTests avoided the name field (the "service
names might be stored in a different format" note) which is why the
pre-existing bug wasn't flagged sooner.
Removes the static `_tempServiceList` field that two concurrent callers
could clobber. Each List() invocation now pins its own result list via
CallbackContext and passes the IntPtr token through the native
`context` parameter; the static callback recovers it via Peek.

The concurrent tests added in the previous commit now pass reliably.
…tiveCallback

Converts the third WaitSet callback site to CallbackContext using a
small record (Callback, DisposeAttachments) as the pinned state. Drops
the instance `_nativeCallback` field: all three sites now use static
trampolines held as `s_*Trampoline` static readonly fields.
@dkroenke dkroenke self-requested a review April 24, 2026 11:11
@patdhlk
Copy link
Copy Markdown
Contributor Author

patdhlk commented May 5, 2026

@dkroenke do you have time to review this one?

@dkroenke
Copy link
Copy Markdown
Contributor

@dkroenke do you have time to review this one?

@patdhlk Yes sorry for the delay, I was out of office the last days.

Comment thread src/Iceoryx2/Native/CallbackContext.cs Outdated
@@ -0,0 +1,80 @@
// Copyright (c) 2025 Contributors to the Eclipse Foundation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be 2026, same for the other newly created source files

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — bumped to 2026 across the six newly added files (CallbackContext.cs, FfiAssert.cs, NativeStructSizesTests.cs, WaitSetConcurrentTests.cs, CallbackContextTests.cs, ServiceDiscoveryConcurrentTests.cs) in 7bdd9fd.

Comment thread CHANGELOG.md
Comment on lines 19 to 24
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really fits to the issue description? For a better readability, it would be better to split the text into separate bullet points.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split into three per-fix bullets in 3f2d914.

On the issue-fit question: #12 is about FFI testing in general and explicitly calls out two of the bugs ("Callback testing is limited and not thread-safe — Service discovery uses thread-local state (_tempServiceList) which is error-prone" → covered by the Node.List + WaitSet fixes; and "Union marshaling requires manual workarounds" → the SERVICE_ID_LENGTH offset fix is the same code path), so the bugfixes are in scope for the issue. Happy to move them under a separate follow-up issue if you'd prefer cleaner accounting.

Comment thread src/Iceoryx2/Node.cs
Comment on lines -71 to -73
// Read the basic fields directly from memory without creating the full struct
// to avoid issues with the union in iox2_static_config_t

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these comments removed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — the offset comments did real work explaining the manual marshal. Restored in 708cc02.

internal const int IOX2_NODE_NAME_LENGTH = 128;
internal const int IOX2_SERVICE_NAME_LENGTH = 255;
internal const int IOX2_SERVICE_ID_LENGTH = 32;
internal const int IOX2_SERVICE_ID_LENGTH = 64;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general question, is there a possibility to read or import the C header to take the values directly from the source? That could possibly reduce the maintenance burden instead of duplicating the values.

Copy link
Copy Markdown
Contributor Author

@patdhlk patdhlk May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — and that is exactly the same maintenance pain that motivated NativeStructSizesTests (the storage-size tripwire). A few directions, ranked by realism:

  1. ClangSharp / CppSharp source-gen. Run cbindgen → C header → ClangSharp at build time and emit Iox2NativeMethods.Generated.cs. This is the proper fix but adds an LLVM/libclang build dependency and a generator project. Probably needs its own issue.
  2. A cbindgen --emit-cmake-config-style sidecar. Have the FFI build also emit a small .json of constants + struct sizes that a tiny C# T4 / source generator consumes. Lower-weight than ClangSharp but still needs a tool we don't yet have.
  3. Status quo + tripwires. What's in this PR: the manual const mirror plus NativeStructSizesTests to catch drift. Cheap, but only catches what we explicitly bookkeep.

I'd lean toward (1) as a follow-up but don't like it at the moment — happy to open an issue for it. For this PR I think the tripwire approach is the right scope, but let me know if you'd rather hold this one until the generator lands.

Comment thread src/Iceoryx2/Node.cs
catch
{
Console.WriteLine($"Error in service list callback: {ex.Message}");
Console.WriteLine($"Stack trace: {ex.StackTrace}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it help for debugging to have a stack trace?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on the diff between callback layers — the trampolines in WaitSet intentionally swallow everything because they wrap arbitrary user code at the FFI boundary (a user-thrown NullReferenceException must not unwind into native iceoryx2). But ServiceListCallback only runs our own marshalling code; if it throws, that is a binding bug and silencing the stack trace makes it un-debuggable.

Restored the diagnostic logging in 493c2de, this time on Console.Error so it doesn't pollute the stdout-driven console logger contract. Will return STOP on failure as before.

/// </summary>
public class NativeStructSizesTests
{
private const int ServiceBuilderStorageSize = 9104; // iceoryx2.h iox2_service_builder_storage_t
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to share the magic numbers from src/Iceoryx2/Native/Iox2NativeMethods.cs with this file somehow? That would make it less error-prone in case the values are changing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd push back on this one — sharing erases the tripwire that the test is supposed to be. Marshal.SizeOf<T> reads the StructLayout(Size = ...) attribute, so if both the attribute and the test reference the same shared const, the assert collapses to const == const and trivially passes.

The double-entry duplication is load-bearing: the failure mode it catches is exactly "someone updated the C# binding's Size to track new cbindgen output but forgot the test constant (or vice versa)". Killing the duplication kills the only thing forcing the maintainer to touch both files.

In the long run the right fix is generating both sides from the C header (see the thread on IOX2_SERVICE_ID_LENGTH). Until then I'd keep the duplication, but I added an inline note at the literals (eebaffe) so the next reader doesn't have the same "why aren't these shared?" reaction. Let me know if you'd rather I go the other way.

patdhlk added 5 commits May 19, 2026 22:55
Files added in this PR were authored in 2026; reviewer asked the
headers be dated accordingly. Existing repo files keep 2025 since
that reflects their creation year.
Reviewer asked for better readability. The merged paragraph hid the
three distinct fixes (Node.List race, WaitSet cross-call overwrite,
SERVICE_ID_LENGTH header mismatch); separate bullets each linking to
the same issue make the scope of eclipse-iceoryx#12 easier to scan.
The cleanup commit (bf33601) dropped the inline comments that explained
why ServiceListCallback walks the iox2_static_config_t payload by hand
(union avoidance) and what each Marshal.Copy offset corresponds to.
Reviewer flagged the loss; restoring them so future readers do not have
to reverse-engineer the offset arithmetic.
Reviewer asked why the silent catch dropped the previous diagnostic
output. The WaitSet trampolines stay silent because they wrap arbitrary
user callbacks at the FFI boundary, but ServiceListCallback runs only
our own marshalling code — if it throws, that is a binding bug and the
stack trace is exactly what a maintainer needs. Routing through stderr
keeps it out of the normal stdout-driven console logger contract.
…uplicated

Reviewer suggested sharing the storage-size literals between
Iox2NativeMethods.cs and this test file. Doing so would erase the
tripwire: Marshal.SizeOf<T> reads the StructLayout(Size = ...) attribute,
so if both sides derive from the same const, the assert collapses to
const-equals-const and trivially passes. The double-entry bookkeeping is
the point — a maintainer who updates one side but forgets the other
gets a failing test. Adding an in-line note so the design intent is
visible at the literals, not buried in the class docstring.
@patdhlk patdhlk requested a review from dkroenke May 19, 2026 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Testing of C# <-> Rust FFI

2 participants