-
Notifications
You must be signed in to change notification settings - Fork 4
refactor: Track 4-tuples instead of only the remote per path. #264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/quinn/pr/264/docs/iroh_quinn/ Last updated: 2025-12-23T11:22:18Z |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #264 +/- ##
========================================
Coverage 76.68% 76.69%
========================================
Files 83 83
Lines 23376 23494 +118
========================================
+ Hits 17927 18019 +92
- Misses 5449 5475 +26 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
aea85bb to
84fe7f1
Compare
ddef817 to
4f4b8b7
Compare
And add a regression test from proptests
divagant-martian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very partial review so that this can advance a bit. I'm a fan of the direction, but I can't live with the naming of addresses. Left suggestions
quinn-proto/src/connection/mod.rs
Outdated
| path.path_responses.push(number, challenge.0, remote); | ||
| if remote == path.remote { | ||
| path.path_responses.push(number, challenge.0, addresses); | ||
| if addresses == path.addresses { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I think this is correct, but a bit tricky. Random comment here that simply means I need to check a bit more this for me to feel confident about this, given what we do track
flub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posting partial review because I don't know how distracting iroh-live is going to be :)
| let addrs = FourTuple { | ||
| remote: addr, | ||
| local_ip: None, | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be nice to repeat the TODO
| /// supported platforms when using [`quinn_udp`](udp) for I/O, which is the default. | ||
| /// | ||
| /// Will panic if called after `poll` has returned `Ready`. | ||
| pub fn local_ip(&self) -> Option<IpAddr> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is removed because we haven't necessarily sent a datagram yet in this state and so we don't yet know the local IP? Or is there some other reason? What did this used to return?
| .iter() | ||
| .filter_map(|id| state.inner.network_path(*id).ok()) | ||
| .next() | ||
| .unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be left as an Option and then use .flatten()?
| /// 3. It immediately closes path 0 afterwards. | ||
| /// | ||
| /// At this point, the server is already fully checked out and not responding anymore. | ||
| /// The client however thinks the connection is still ongoing and continues sending (that's fine). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this shows a bug in behaviour already:
- The server initiated an immediate close. It is now in the closing state.
- This means the server should respond to any incoming packet that it can attribute to the connection.
- The client opens a new path on the same 4-tuple.
- To do so it must send a packet to the same server addr.
- This will use a CID issued by the server for the same connection, but different path ID.
- Thus the server can attribute this to the same connection.
- And the server should respond with CONNECTION_CLOSE.
- Now this CONNECTION_CLOSE is probably still sent on the path that the client already abandoned. That's fine!
- (You could make the server still open the path, but it shouldn't be needed really.)
- The client receives the CONNECTION_CLOSE frame on the abandoned path.
- Because 3*PTO for this path has not yet expired, it SHOULD still accept this packet.
- Now the client MUST respond with a CONNECTION_CLOSE to the server.
- It only has the new path available, so sends it on this path.
- (I'm still tempted to send this on all PacketNumberSpaces but that's for another day.)
- Now the client is in the draining state.
- The server will receive this CONNECTION_CLOSE.
- It can even process it! But that doesn't even change anything.
- Now the server is in the draining state.
- Both client and server will be drained at some point now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the server should respond with CONNECTION_CLOSE.
Yeah this didn't happen. This might be another bug.
|
I've adjusted the PR description finally. |
flub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really great! Thanks for figuring all this out and making it look entirely reasonable. I think all my comments are just nitpicking to LGTM!
| debug!( | ||
| "sending stateless reset for {} to {}", | ||
| dst_cid, addresses.remote | ||
| dst_cid, network_path.remote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally nitpicking so feel free to ignore. But generally I try to migrate using fields for logging, e.g.:
debug!(%dst_cid, %network_path.remote, "sending stateless reset");
Possibly Quinn used to use log rather then tracing a long time ago, and that's why this style still exists.
| fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
| f.debug_struct("Incoming") | ||
| .field("addresses", &self.addresses) | ||
| .field("network_path", &self.network_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we should probably start migrating to derive_more now. but feel fee to not do that either.
| } | ||
|
|
||
| /// Check if the 4-tuple path (as in RFC9000 Path, not multipath path) had already been validated. | ||
| fn find_open_path_on_network_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_validated_network_path?
Remember the bikeshedding? We don't use "open". Technically we're looking for InUse paths here, but validated is the property you're really after here according to the doc comment.
| }) | ||
| // TODO(@divma): we might want to ensure the path has been recently active to consider the | ||
| // address validated | ||
| // matheus23: Perhaps looking at !self.abandoned_paths.contains(path_id) is enough, given keep-alives? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think divma's comment here is more accurate. Even if you recently abandoned a path you can still consider it validated.
| fn find_open_path_on_network_path( | ||
| &self, | ||
| network_path: FourTuple, | ||
| ) -> Option<(&PathId, &PathState)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the return value slightly misleading, since you could have multiple PathIds that match the 4-tuple. I'd probably be fine if the doc comment talks a little bit about that.
| ) -> &mut PathData { | ||
| let valid_path = self.find_open_path_on_network_path(network_path); | ||
| let validated = valid_path.is_some(); | ||
| let initial_rtt = valid_path.map(|(_, state)| state.data.rtt.conservative()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let initial_rtt = valid_path.map(|(_, state)| state.data.rtt.conservative()); | |
| let initial_rtt = valid_path.map(|(_, path)| path.data.rtt.conservative()); |
This is super-nitty. But I've been keeping to this convention and thus expect to see that. It reads as "path data".
| /// Updates the network path for `path_id` and returns false, or returns true if a packet | ||
| /// coming in for this `path_id` over given `network_path` should be discarded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, try and keep this to single line header. I'll enable the lint for that sometime soon, I promise.
| // https://www.ietf.org/archive/id/draft-ietf-quic-multipath-18.html#name-using-multiple-paths-on-the | ||
| // > Client receives the packet, recognizes a path migration, updates the source address of path 2 to 192.0.2.1. | ||
| if let Some(local_ip) = network_path.local_ip { | ||
| // If we already had a local_ip, but it changed, then we need to re-trigger path validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is confusing, because the if statement following it only logs something. I think it's more part of the comments after the if statement?
| trace!(?remote, prev = ?self.path_data(path_id).remote, "server migrated to new remote"); | ||
| self.path_data_mut(path_id).remote = remote; | ||
| self.qlog.emit_tuple_assigned(path_id, remote, now); | ||
| trace!(%network_path, prev = ?self.path_data(path_id).network_path, "server migrated to new remote"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both fields logged are the same type but use a different formatter, % and ? respectively. Should they not be both %?
Description
The main change of this PR is removing
PathData::remotein favor ofPathData::addresseswhich is of typeFourTupleinstead ofSocketAddr.FourTupleis pretty much this struct:(A follow-up should change
local_ipto be anOption<SocketAddr>instead ofIpAddr.)We then track path validation by full 4-tuple instead of by remote address. This is more correct, as seen in the regression test in #258.
Breaking Changes
quinn_proto::DatagramEventInnernow stores aaddresses: FourTupleinstead of aremote: SocketAddr.quinn_proto::Endpoint::handlenow expects aFourTupleinstead ofSocketAddr.quinn_proto::Connection::open_pathandopen_path_ensurenow take aFourTupleinstead ofSocketAddr.quinn_proto::Connection::local_ipwas removed. Look at the paths instead.(A follow-up should also update the quinn API to use full
FourTuples to make it possible to short-circuit path validation.)Notes & open questions
This PR makes a bunch of situations "detectable" that were previously going unnoticed.
E.g. it's now possible to detect the situation where suddenly datagrams are coming in on a different interface on the Endpoint than they were before.
Before tracking 4-tuples, we only know the remote address for datagrams coming in, not the local IP. This means that only the server side would initiate migration when the client changed its "remote address" from the perspective of the server.
Now, the server side can also detect when the client suddenly sends to a different interface. In that case, I've opted to ignore the datagrams coming in.
However, when the client side migrates, it's also possible it chooses a different interface to send from (because e.g. the client migrated from WiFi to cellular). So on the client side, I allow the local IP to just change, and I don't re-validate the path. This is required to make client migration work, and it matches what was happening on the client side before this PR.
This distinction is controlled via the
local_ip_may_migratevariable in the code.Another thing that's now distinguishable is two paths that have the same remote address, but use a different interface to send from.
Previously, e.g.
open_path_ensurewould just ignore the second attempt of opening a path to the same remote. Now, we could technically distinguish these two paths.However, I've opted to make the behavior of having a
local_ipset toNonewhen you callopen_path_ensureto mean that you don't want to open a path to the remote, if another path already exists. This kind of assumes that your OS is likely to choose the same interface anyways (and that the previous path's local IP is what the OS would choose). That's obviously not always the case, but it keeps the behavior the same compared to before we tracked 4-tuples, and makes it behave well for our intents and purposes.