Skip to content

Conversation

@matheus23
Copy link
Member

@matheus23 matheus23 commented Dec 16, 2025

Description

The main change of this PR is removing PathData::remote in favor of PathData::addresses which is of type FourTuple instead of SocketAddr.

FourTuple is pretty much this struct:

struct FourTuple {
    remote: SocketAddr,
    local_ip: Option<IpAddr>,
}

(A follow-up should change local_ip to be an Option<SocketAddr> instead of IpAddr.)

We then track path validation by full 4-tuple instead of by remote address. This is more correct, as seen in the regression test in #258.

Breaking Changes

  • quinn_proto::DatagramEventInner now stores a addresses: FourTuple instead of a remote: SocketAddr.
  • quinn_proto::Endpoint::handle now expects a FourTuple instead of SocketAddr.
  • quinn_proto::Connection::open_path and open_path_ensure now take a FourTuple instead of SocketAddr.
  • quinn_proto::Connection::local_ip was removed. Look at the paths instead.

(A follow-up should also update the quinn API to use full FourTuples to make it possible to short-circuit path validation.)

Notes & open questions

This PR makes a bunch of situations "detectable" that were previously going unnoticed.
E.g. it's now possible to detect the situation where suddenly datagrams are coming in on a different interface on the Endpoint than they were before.
Before tracking 4-tuples, we only know the remote address for datagrams coming in, not the local IP. This means that only the server side would initiate migration when the client changed its "remote address" from the perspective of the server.
Now, the server side can also detect when the client suddenly sends to a different interface. In that case, I've opted to ignore the datagrams coming in.
However, when the client side migrates, it's also possible it chooses a different interface to send from (because e.g. the client migrated from WiFi to cellular). So on the client side, I allow the local IP to just change, and I don't re-validate the path. This is required to make client migration work, and it matches what was happening on the client side before this PR.
This distinction is controlled via the local_ip_may_migrate variable in the code.

Another thing that's now distinguishable is two paths that have the same remote address, but use a different interface to send from.
Previously, e.g. open_path_ensure would just ignore the second attempt of opening a path to the same remote. Now, we could technically distinguish these two paths.
However, I've opted to make the behavior of having a local_ip set to None when you call open_path_ensure to mean that you don't want to open a path to the remote, if another path already exists. This kind of assumes that your OS is likely to choose the same interface anyways (and that the previous path's local IP is what the OS would choose). That's obviously not always the case, but it keeps the behavior the same compared to before we tracked 4-tuples, and makes it behave well for our intents and purposes.

@matheus23 matheus23 self-assigned this Dec 16, 2025
@github-actions
Copy link

github-actions bot commented Dec 16, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/quinn/pr/264/docs/iroh_quinn/

Last updated: 2025-12-23T11:22:18Z

@codecov-commenter
Copy link

codecov-commenter commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 75.71885% with 76 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.69%. Comparing base (226ff0d) to head (fff8ad6).

Files with missing lines Patch % Lines
quinn-proto/src/connection/mod.rs 80.45% 34 Missing ⚠️
quinn/src/connection.rs 28.00% 18 Missing ⚠️
quinn-proto/src/connection/qlog.rs 7.69% 12 Missing ⚠️
quinn-proto/src/lib.rs 80.00% 5 Missing ⚠️
quinn-proto/src/connection/paths.rs 83.33% 4 Missing ⚠️
quinn-proto/src/endpoint.rs 95.65% 2 Missing ⚠️
quinn/src/path.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #264    +/-   ##
========================================
  Coverage   76.68%   76.69%            
========================================
  Files          83       83            
  Lines       23376    23494   +118     
========================================
+ Hits        17927    18019    +92     
- Misses       5449     5475    +26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@n0bot n0bot bot added this to iroh Dec 16, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Dec 16, 2025
Base automatically changed from matheus23/fix-path-validation to main December 16, 2025 14:53
@matheus23 matheus23 marked this pull request as ready for review December 18, 2025 15:47
Copy link
Collaborator

@divagant-martian divagant-martian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very partial review so that this can advance a bit. I'm a fan of the direction, but I can't live with the naming of addresses. Left suggestions

path.path_responses.push(number, challenge.0, remote);
if remote == path.remote {
path.path_responses.push(number, challenge.0, addresses);
if addresses == path.addresses {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think this is correct, but a bit tricky. Random comment here that simply means I need to check a bit more this for me to feel confident about this, given what we do track

Copy link
Collaborator

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting partial review because I don't know how distracting iroh-live is going to be :)

let addrs = FourTuple {
remote: addr,
local_ip: None,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be nice to repeat the TODO

/// supported platforms when using [`quinn_udp`](udp) for I/O, which is the default.
///
/// Will panic if called after `poll` has returned `Ready`.
pub fn local_ip(&self) -> Option<IpAddr> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is removed because we haven't necessarily sent a datagram yet in this state and so we don't yet know the local IP? Or is there some other reason? What did this used to return?

.iter()
.filter_map(|id| state.inner.network_path(*id).ok())
.next()
.unwrap()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this be left as an Option and then use .flatten()?

Comment on lines +522 to +525
/// 3. It immediately closes path 0 afterwards.
///
/// At this point, the server is already fully checked out and not responding anymore.
/// The client however thinks the connection is still ongoing and continues sending (that's fine).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this shows a bug in behaviour already:

  • The server initiated an immediate close. It is now in the closing state.
    • This means the server should respond to any incoming packet that it can attribute to the connection.
  • The client opens a new path on the same 4-tuple.
    • To do so it must send a packet to the same server addr.
    • This will use a CID issued by the server for the same connection, but different path ID.
    • Thus the server can attribute this to the same connection.
    • And the server should respond with CONNECTION_CLOSE.
    • Now this CONNECTION_CLOSE is probably still sent on the path that the client already abandoned. That's fine!
      • (You could make the server still open the path, but it shouldn't be needed really.)
  • The client receives the CONNECTION_CLOSE frame on the abandoned path.
    • Because 3*PTO for this path has not yet expired, it SHOULD still accept this packet.
  • Now the client MUST respond with a CONNECTION_CLOSE to the server.
    • It only has the new path available, so sends it on this path.
    • (I'm still tempted to send this on all PacketNumberSpaces but that's for another day.)
    • Now the client is in the draining state.
  • The server will receive this CONNECTION_CLOSE.
    • It can even process it! But that doesn't even change anything.
    • Now the server is in the draining state.
  • Both client and server will be drained at some point now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the server should respond with CONNECTION_CLOSE.

Yeah this didn't happen. This might be another bug.

@matheus23
Copy link
Member Author

I've adjusted the PR description finally.

Copy link
Collaborator

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great! Thanks for figuring all this out and making it look entirely reasonable. I think all my comments are just nitpicking to LGTM!

debug!(
"sending stateless reset for {} to {}",
dst_cid, addresses.remote
dst_cid, network_path.remote
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally nitpicking so feel free to ignore. But generally I try to migrate using fields for logging, e.g.:
debug!(%dst_cid, %network_path.remote, "sending stateless reset");

Possibly Quinn used to use log rather then tracing a long time ago, and that's why this style still exists.

fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
f.debug_struct("Incoming")
.field("addresses", &self.addresses)
.field("network_path", &self.network_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we should probably start migrating to derive_more now. but feel fee to not do that either.

}

/// Check if the 4-tuple path (as in RFC9000 Path, not multipath path) had already been validated.
fn find_open_path_on_network_path(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_validated_network_path?

Remember the bikeshedding? We don't use "open". Technically we're looking for InUse paths here, but validated is the property you're really after here according to the doc comment.

})
// TODO(@divma): we might want to ensure the path has been recently active to consider the
// address validated
// matheus23: Perhaps looking at !self.abandoned_paths.contains(path_id) is enough, given keep-alives?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think divma's comment here is more accurate. Even if you recently abandoned a path you can still consider it validated.

fn find_open_path_on_network_path(
&self,
network_path: FourTuple,
) -> Option<(&PathId, &PathState)> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the return value slightly misleading, since you could have multiple PathIds that match the 4-tuple. I'd probably be fine if the doc comment talks a little bit about that.

) -> &mut PathData {
let valid_path = self.find_open_path_on_network_path(network_path);
let validated = valid_path.is_some();
let initial_rtt = valid_path.map(|(_, state)| state.data.rtt.conservative());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let initial_rtt = valid_path.map(|(_, state)| state.data.rtt.conservative());
let initial_rtt = valid_path.map(|(_, path)| path.data.rtt.conservative());

This is super-nitty. But I've been keeping to this convention and thus expect to see that. It reads as "path data".

Comment on lines +1912 to +1913
/// Updates the network path for `path_id` and returns false, or returns true if a packet
/// coming in for this `path_id` over given `network_path` should be discarded.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, try and keep this to single line header. I'll enable the lint for that sometime soon, I promise.

// https://www.ietf.org/archive/id/draft-ietf-quic-multipath-18.html#name-using-multiple-paths-on-the
// > Client receives the packet, recognizes a path migration, updates the source address of path 2 to 192.0.2.1.
if let Some(local_ip) = network_path.local_ip {
// If we already had a local_ip, but it changed, then we need to re-trigger path validation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing, because the if statement following it only logs something. I think it's more part of the comments after the if statement?

trace!(?remote, prev = ?self.path_data(path_id).remote, "server migrated to new remote");
self.path_data_mut(path_id).remote = remote;
self.qlog.emit_tuple_assigned(path_id, remote, now);
trace!(%network_path, prev = ?self.path_data(path_id).network_path, "server migrated to new remote");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both fields logged are the same type but use a different formatter, % and ? respectively. Should they not be both %?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🏗 In progress

Development

Successfully merging this pull request may close these issues.

5 participants