Add configurable timeouts to XLinkConnect and XLinkOpenStream #104

Open
alicespetma-stack wants to merge 2 commits into luxonis:master from alicespetma-stack:fix/connect-openstream-timeout

alicespetma-stack commented Mar 12, 2026

Summary

XLinkConnect() and XLinkOpenStream() pass XLINK_NO_RW_TIMEOUT to DispatcherWaitEventComplete(), which results in a bare sem_wait() that blocks the calling thread indefinitely if the remote device fails to respond after firmware boot. This is the same pattern that PR #5 addressed for data read/write operations, but the connect and stream-open paths were not covered.

Additionally, the timeout path in DispatcherWaitEventComplete used iteration counting (while (timeoutMs--) with usleep(1000)) to approximate timeout duration. This was inaccurate because usleep(1000) can sleep 1-10ms depending on system load, making a "5000ms" timeout anywhere from 5s to 50s in practice.

Changes

Commit 1: Add configurable timeouts to XLinkConnect and XLinkOpenStream

  • XLinkConnectWithTimeout(handler, timeoutMs) — connect with configurable ping timeout
  • XLinkOpenStreamWithTimeout(id, name, size, timeoutMs) — open stream with configurable timeout
  • XLINK_CONNECT_TIMEOUT (5s) and XLINK_OPEN_STREAM_TIMEOUT (5s) default constants
  • XLinkConnect() and XLinkOpenStream() now delegate to the WithTimeout variants using these defaults
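The delegation described in Commit 1 can be sketched roughly as follows. This is a toy model, not the PR's code: every `demo_*` / `DEMO_*` name is a hypothetical stand-in for the corresponding XLink symbol, the sentinel value is chosen only for the demo, and the "device" is simulated by a single reply-delay variable.

```c
/* Hypothetical stand-ins for illustration; these are NOT the real XLink types. */
typedef enum { DEMO_OK = 0, DEMO_TIMEOUT = 1 } demo_status_t;

#define DEMO_NO_TIMEOUT         0xFFFFFFFFu /* plays the role of XLINK_NO_RW_TIMEOUT */
#define DEMO_CONNECT_TIMEOUT_MS 5000u       /* plays the role of XLINK_CONNECT_TIMEOUT (5s) */

/* Simulated device: how long its reply to the PING takes, in milliseconds. */
static unsigned int demo_reply_delay_ms = 20u;

/* Timeout-aware variant: fails if the simulated reply arrives after the limit.
 * With DEMO_NO_TIMEOUT the caller opts back into waiting for as long as it takes. */
demo_status_t demo_connect_with_timeout(unsigned int timeout_ms)
{
    if (timeout_ms != DEMO_NO_TIMEOUT && demo_reply_delay_ms > timeout_ms) {
        return DEMO_TIMEOUT;
    }
    return DEMO_OK;
}

/* Legacy entry point: same signature as before, now delegating with the default. */
demo_status_t demo_connect(void)
{
    return demo_connect_with_timeout(DEMO_CONNECT_TIMEOUT_MS);
}
```

In this model existing callers keep calling `demo_connect()` unchanged and pick up the 5s limit, while a caller that needs the old behavior passes the no-timeout sentinel explicitly.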

Commit 2: Replace iteration-counting timeout with monotonic clock measurement

Replaces the while (timeoutMs--) polling loop with getMonotonicTimestamp() (backed by std::chrono::steady_clock, already in the codebase via XLinkTime.h). This:

  • Gives accurate timeouts regardless of system load or scheduler granularity
  • Remains immune to NTP/system clock jumps (the original reason sem_timedwait was replaced)
  • Resolves the TODO: "This is a temporary solution. TODO: replace this with something more efficient."
  • Related: #86 (CLOCK_REALTIME should be CLOCK_MONOTONIC)
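As an illustration of the difference, here is a minimal sketch of a deadline-based wait. It assumes a POSIX system and uses `clock_gettime(CLOCK_MONOTONIC)` as a stand-in for `getMonotonicTimestamp()`, whose exact signature is not shown in this PR. The names `monotonic_ms`, `wait_until`, and `flag_is_set` are hypothetical. The loop still naps about 1ms per iteration, but the exit condition is the measured elapsed time, so oversleeping no longer stretches the timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds from a monotonic clock; unaffected by NTP or wall-clock jumps. */
static int64_t monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Polls done(ctx) until it returns true or timeout_ms of real time has elapsed.
 * Returns true on completion, false on timeout. */
static bool wait_until(bool (*done)(void *), void *ctx, int64_t timeout_ms)
{
    const int64_t deadline = monotonic_ms() + timeout_ms;
    const struct timespec nap = { 0, 1000000 }; /* ~1 ms; may oversleep under load */

    for (;;) {
        if (done(ctx)) {
            return true;
        }
        if (monotonic_ms() >= deadline) {
            return false; /* decided by the clock, not by an iteration count */
        }
        nanosleep(&nap, NULL);
    }
}

/* Example predicate: the context points at a bool set by some other party. */
static bool flag_is_set(void *ctx)
{
    return *(const bool *)ctx;
}
```

With an iteration counter, a nap that actually takes 10ms would turn 5000 iterations into roughly 50s. Here the same oversleep only reduces how often the predicate is polled before the deadline.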

Motivation

When device firmware fails to initialize its XLink dispatcher after boot (e.g. due to a firmware hang or hardware instability), the host-side XLinkConnect() PING handshake blocks forever in DispatcherWaitEventComplete(). Since watchdog threads in depthai-core are created only after XLinkConnect and XLinkOpenStream complete, there is no recovery mechanism: the calling thread is permanently stuck.

With this change, these operations time out after 5 seconds by default. For a responsive device, PING and stream creation complete in well under 100ms, so the 5s default has no impact on normal operation. Callers can pass XLINK_NO_RW_TIMEOUT to the WithTimeout variants to restore the previous infinite-wait behavior if needed.

Files Modified

| File | Change |
| --- | --- |
| include/XLink/XLinkPublicDefines.h | Add XLINK_CONNECT_TIMEOUT and XLINK_OPEN_STREAM_TIMEOUT constants |
| include/XLink/XLink.h | Declare XLinkConnectWithTimeout() and XLinkOpenStreamWithTimeout() |
| src/shared/XLinkDevice.c | Implement XLinkConnectWithTimeout(), XLinkConnect() delegates to it |
| src/shared/XLinkData.c | Implement XLinkOpenStreamWithTimeout(), XLinkOpenStream() delegates to it |
| src/shared/XLinkDispatcher.c | Replace iteration-counting with monotonic clock measurement |

Backward Compatibility

  • Source-compatible: existing callers of XLinkConnect() and XLinkOpenStream() require no code changes
  • The only behavioral change is that these operations now return after a 5s timeout by default instead of waiting forever
  • The WithTimeout API follows the same pattern as the existing XLinkWriteDataWithTimeout / XLinkReadDataWithTimeout / XLinkReadMoveDataWithTimeout

Test Plan

  • Build on Linux and Windows
  • Normal device connection works (PING completes in <100ms, well within 5s timeout)
  • When device firmware is unresponsive, XLinkConnect returns X_LINK_TIMEOUT after 5s instead of hanging
  • depthai-core connect retry loop in XLinkConnection::initDevice() can now retry on timeout
  • Timeout accuracy: verify 5s timeout completes in ~5s regardless of system load
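The retry scenario in the test plan can be sketched as below. This is not depthai-core's actual loop in XLinkConnection::initDevice(). The names `retry_status_t`, `connect_fn`, `retry_on_timeout`, and `flaky_connect` are hypothetical, and the device is faked by a counter of attempts that will time out. The point is only that a connect call which returns a timeout status, rather than blocking, gives the caller something to retry on.

```c
/* Hypothetical status codes for the sketch; not the real XLinkError_t values. */
typedef enum { RETRY_OK = 0, RETRY_TIMEOUT = 1, RETRY_ERROR = 2 } retry_status_t;

/* Any connect-like call that takes a timeout and reports how it ended. */
typedef retry_status_t (*connect_fn)(void *ctx, unsigned int timeout_ms);

/* Calls connect up to max_attempts times, retrying only when it timed out. */
retry_status_t retry_on_timeout(connect_fn connect, void *ctx,
                                unsigned int timeout_ms, int max_attempts)
{
    retry_status_t status = RETRY_ERROR;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        status = connect(ctx, timeout_ms);
        if (status != RETRY_TIMEOUT) {
            break; /* success, or an error that retrying will not fix */
        }
    }
    return status;
}

/* Fake device: ctx points at the number of attempts that will still time out. */
retry_status_t flaky_connect(void *ctx, unsigned int timeout_ms)
{
    int *timeouts_left = (int *)ctx;
    (void)timeout_ms; /* the fake does not actually wait */
    if (*timeouts_left > 0) {
        --*timeouts_left;
        return RETRY_TIMEOUT;
    }
    return RETRY_OK;
}
```

A device that misses two handshakes and then responds would succeed on the third attempt here. With the previous infinite wait, the first attempt would never have returned.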

XLinkConnect() and XLinkOpenStream() previously passed XLINK_NO_RW_TIMEOUT
to DispatcherWaitEventComplete(), causing a bare sem_wait() that blocks
forever if the remote device fails to respond after firmware boot.

This adds:
- XLinkConnectWithTimeout(handler, timeoutMs)
- XLinkOpenStreamWithTimeout(id, name, size, timeoutMs)
- XLINK_CONNECT_TIMEOUT (5s) and XLINK_OPEN_STREAM_TIMEOUT (5s) defaults

The existing XLinkConnect() and XLinkOpenStream() now delegate to the
WithTimeout variants using these defaults, matching the pattern established
by XLinkWriteDataWithTimeout/XLinkReadDataWithTimeout.

If the device is responsive, PING and stream creation complete in <100ms.
The 5s default is generous but prevents indefinite hangs when the device
firmware fails to initialize its XLink dispatcher after boot.

Callers can pass XLINK_NO_RW_TIMEOUT to the WithTimeout variants to
restore the previous infinite-wait behavior if needed.

The previous timeout implementation in DispatcherWaitEventComplete counted
loop iterations (each with a 1ms usleep) to approximate the timeout duration.
This was inaccurate because usleep(1000) can sleep 1-10ms depending on
system load and scheduler granularity, making a "5000ms" timeout anywhere
from 5s to 50s in practice.

Replace with getMonotonicTimestamp() (backed by std::chrono::steady_clock)
to measure actual elapsed time. This:
- Gives accurate timeouts regardless of system load
- Remains immune to NTP/system clock jumps (the original reason
  sem_timedwait was replaced with the polling loop)
- Works cross-platform (steady_clock on Windows, CLOCK_MONOTONIC on Linux)

Resolves the TODO comment: "This is a temporary solution. TODO: replace
this with something more efficient."

Related: luxonis#86