Skip to content

Fix settlement hang#230

Merged
luckysori merged 2 commits into
masterfrom
fix/settlement-hang
May 25, 2026
Merged

Fix settlement hang#230
luckysori merged 2 commits into
masterfrom
fix/settlement-hang

Conversation

@luckysori

@luckysori luckysori commented May 25, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • Bug Fixes
    • Batch operations now include a timeout mechanism to prevent indefinite waiting on network events. If no event is received within the configured timeout duration, an error is returned.

Review Change Stack

@luckysori luckysori requested a review from bonomat May 25, 2026 05:23
@luckysori luckysori self-assigned this May 25, 2026
@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c819b52-2ccb-4a26-b219-fc5aa52781c6

📥 Commits

Reviewing files that changed from the base of the PR and between 241e229 and 6a28c20.

📒 Files selected for processing (2)
  • ark-client/src/batch.rs
  • e2e-tests/tests/e2e_assets.rs

Walkthrough

This PR adds timeout enforcement to batch event-processing loops in the Ark client and adjusts an e2e test to advance the blockchain. Both settle_delegate and join_next_batch event loops now wrap stream operations with timeout_op to prevent indefinite blocking, and the test adds an explicit mining step to ensure proper state advancement.

Changes

Batch timeout enforcement

Layer / File(s) Summary
Batch event stream timeouts
ark-client/src/batch.rs
settle_delegate and join_next_batch event-processing loops wrap stream.next().await with timeout_op(self.inner.timeout, ...), adding context "timed out waiting for batch event" when the configured timeout expires, replacing indefinite blocking behavior.
E2E test blockchain advancement
e2e-tests/tests/e2e_assets.rs
The e2e_assets test calls nigiri.mine(1).await after funding Alice to advance the blockchain by one block before continuing asset operations.

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix settlement hang' directly addresses the main objective of the PR, which is to fix a settlement timeout/hang issue by adding timeout logic to batch event loops.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/settlement-hang

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@arkanaai arkanaai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Fix settlement hang

Verdict: APPROVE — clean, correct, minimal fix. Requesting human sign-off below because this is settlement-path code.

What the PR does

Two changes:

  1. ark-client/src/batch.rs (lines ~646, ~1392): Wraps the bare stream.next().await calls in both settle_delegate and the main settle/batch event loops with timeout_op(self.inner.timeout, stream.next()). Previously, if the server stopped sending events (crash, network partition, bug), the client would hang forever on stream.next().await. Now it fails with a clear timeout error.

  2. e2e-tests/tests/e2e_assets.rs (line 32): Adds nigiri.mine(1).await after alice.settle() so the commitment transaction actually confirms before the test checks the offchain balance. Without this, the test could hang waiting for confirmation that never comes in regtest.

Analysis

Type safety ✅timeout_op returns Result<Option<Result<StreamEvent, ...>>>. The .context(...) + ? unwraps the outer Result, leaving Option<Result<...>> for the existing match arms (Some(Ok(...)), Some(Err(...)), None). No match arm changes needed. Correct.

Both paths covered ✅ — The fix is applied symmetrically to both settle_delegate (delegated settlement, ~line 646) and the main settlement path (~line 1392). These are the only two stream.next().await calls in batch.rs.

Per-iteration timeout is correct ✅ — Each loop iteration gets a fresh timeout. The server sends StreamEvent::Heartbeat events, so legitimate long-running rounds (many participants, slow nonce aggregation) won't spuriously timeout as long as heartbeats arrive within the configured timeout window (typically 30s per the doc examples).

No API/type changes ✅ — No public API surface changes. No cross-repo impact. timeout_op is already used extensively throughout batch.rs for other operations (intent registration, nonce submission, etc.).

e2e test fix ✅nigiri.mine(1) is standard practice in this test suite for confirming on-chain transactions in regtest. The existing sleep(2s) alone was insufficient without a mined block.

Other stream.next() sites not in this PR

  • boltz.rs has 4 stream.next().await calls for swap status streams — different concern, out of scope.
  • ark-rest/src/client.rs has 2 calls for HTTP byte stream parsing — different layer.
  • e2e-tests/ and ark-client-sample/ — test/sample code, not production.

None of these need this fix.

No issues found

No protocol correctness concerns, no race conditions, no missing error handling. The fix is straightforward and well-scoped.


⚠️ PROTOCOL-CRITICAL FLAG: This PR modifies the settlement event loop — part of the round lifecycle / batch signing flow. Per policy, requesting human review even though the code looks correct. A bug in this path could cause clients to exit the settlement flow prematurely (e.g., if timeout is too aggressive), potentially leaving VTXOs in a limbo state. The current implementation looks safe (heartbeats prevent spurious timeouts), but a human should verify the timeout value is appropriate for production workloads.

🤖 Reviewed by Arkana

@luckysori luckysori merged commit 1d77842 into master May 25, 2026
25 checks passed
@luckysori luckysori deleted the fix/settlement-hang branch May 25, 2026 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant