Full node sync improvements #2176

@Mirko-von-Leipzig

Some thoughts I had while refactoring and shuffling code around. These may, or may not, be applicable.

I'm listing them here; maybe we can come up with a cohesive plan to address these.

Proven tip can exceed committed tip

I didn't look too deeply, but I think the proof and block syncs are completely decoupled. This means it's possible for the local state to include (and publish) a proof for a block it does not have yet.

We should probably process a block before its proof, especially while syncing. And we should verify the proofs before applying them.

This also means our own RPC methods can report inconsistent tips. For the two to make sense together, the proven tip should always be <= the committed tip.
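A minimal sketch of the ordering rule, assuming proofs are keyed by block number and that a hypothetical `verify` callback checks a proof against its already-committed block. Proofs that arrive before their block are buffered and only applied in order once the block exists, so `proven_tip <= committed_tip` holds by construction. `SyncState`, `Proof` and `verify` are illustrative names, not the node's actual types, and block 0 is assumed to be genesis.

```rust
use std::collections::BTreeMap;

/// Hypothetical proof type; a real one would carry the proof data.
#[derive(Debug)]
pub struct Proof {
    pub block_num: u32,
}

#[derive(Debug, Default)]
pub struct SyncState {
    pub committed_tip: u32,
    pub proven_tip: u32,
    /// Proofs received ahead of their block, keyed by block number.
    pending: BTreeMap<u32, Proof>,
}

impl SyncState {
    /// Records a committed block, then applies any buffered proofs it unlocks.
    pub fn apply_block(&mut self, block_num: u32, verify: impl Fn(&Proof) -> bool) {
        self.committed_tip = self.committed_tip.max(block_num);
        self.drain_pending(verify);
    }

    /// Buffers a proof; it is only applied once its block has been committed.
    pub fn apply_proof(&mut self, proof: Proof, verify: impl Fn(&Proof) -> bool) {
        self.pending.insert(proof.block_num, proof);
        self.drain_pending(verify);
    }

    /// Applies buffered proofs strictly in block order.
    fn drain_pending(&mut self, verify: impl Fn(&Proof) -> bool) {
        let mut next = self.proven_tip + 1;
        while let Some(proof) = self.pending.remove(&next) {
            if next > self.committed_tip {
                // Block not committed yet: put the proof back and wait for it.
                self.pending.insert(next, proof);
                break;
            }
            if !verify(&proof) {
                // Invalid proof is dropped; a valid one has to be fetched again.
                break;
            }
            self.proven_tip = next;
            next += 1;
        }
    }
}
```

Keeping the buffer ordered means a gap in proofs simply stalls the proven tip instead of letting it jump ahead.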

Restreaming tips

A full node can also re-stream its own blocks to downstream nodes. Currently, the node sets the outgoing stream's tip to its own local tip. This is fine if the node is at or near the chain tip, but quite wrong if it is still catching up. An alternative is to set it to the incoming stream's tip instead.

Whether this is safe to do is a different question, since we trust the provided tips explicitly.
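A minimal sketch of the incoming-tip alternative, assuming the node keeps the most recent tip reported by its incoming subscription as an `Option<u32>` (`None` before the first message). `advertised_tip` is a hypothetical helper, and taking the maximum of the two tips (rather than the upstream value outright) is an assumption so the advertised tip never drops below what the node already has. It still trusts the upstream tip as-is, which is exactly the open safety question above.

```rust
/// Tip to advertise on the outgoing (re-streamed) subscription.
///
/// While catching up, the local tip understates the real chain tip, so the
/// upstream tip is preferred whenever it is known and larger.
pub fn advertised_tip(local_tip: u32, upstream_tip: Option<u32>) -> u32 {
    match upstream_tip {
        // Trusts the upstream value; no validation is done here.
        Some(upstream) => upstream.max(local_tip),
        // No upstream information yet: fall back to the local tip.
        None => local_tip,
    }
}
```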

An alternative is discussed next.

Node health

Full nodes can be load-balanced to provide RPC scaling. However, a node that is still syncing should not be made available. I think this is generally done by having the node report an unhealthy status? We should configure this so that the node reports unhealthy unless it is within some small margin of the subscription tip.

This would be my highest priority.
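A minimal sketch of the health rule, assuming the node knows both its local tip and the tip reported by its subscription. `health`, `Health` and the `max_lag` margin are hypothetical; how the result is surfaced (for example via the gRPC health-checking service or an HTTP status on a probe endpoint) is left open.

```rust
/// Health as it might be reported to a load balancer.
#[derive(Debug, PartialEq, Eq)]
pub enum Health {
    Healthy,
    /// Still syncing, `behind` blocks behind the subscription tip.
    Syncing { behind: u32 },
}

/// Healthy only if the local tip is within `max_lag` blocks of the
/// subscription tip. Being ahead of the subscription counts as zero lag.
pub fn health(local_tip: u32, subscription_tip: u32, max_lag: u32) -> Health {
    let behind = subscription_tip.saturating_sub(local_tip);
    if behind <= max_lag {
        Health::Healthy
    } else {
        Health::Syncing { behind }
    }
}
```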

Status

Nodes should also advertise their local chain tip alongside the actual chain tip, so users can inspect their node's sync state.
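A minimal sketch of what such a status could contain, assuming "actual chain tip" means the tip reported by the node's upstream subscription. `SyncStatus` and its fields are hypothetical names; the human-readable format is only one option.

```rust
use std::fmt;

/// Sync state a node could expose via its status endpoint.
#[derive(Debug, Clone, Copy)]
pub struct SyncStatus {
    /// Latest block applied locally.
    pub local_tip: u32,
    /// Latest block known to exist upstream.
    pub chain_tip: u32,
}

impl SyncStatus {
    /// Number of blocks still to sync (zero if the local tip is ahead).
    pub fn blocks_behind(&self) -> u32 {
        self.chain_tip.saturating_sub(self.local_tip)
    }
}

impl fmt::Display for SyncStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "local tip {} / chain tip {} ({} blocks behind)",
            self.local_tip,
            self.chain_tip,
            self.blocks_behind()
        )
    }
}
```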

Disconnect slow clients

We should disconnect slow subscribers. The open question is how do we determine slow.

An initial suggestion is to classify a client as slow if it consumes blocks more slowly than they are produced, since it will never catch up.

A fairly basic algorithm is to track the minimum gap between the client's tip and the chain tip. If the client's gap ever exceeds this minimum by more than N blocks, it is classified as slow and dropped. Note that this holds even when the client is at the chain tip: the minimum gap is then zero, so the client is dropped as soon as it falls more than N blocks behind. Roughly:

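let mut minimum_gap = u32::MAX;
...
// Saturate in case the client is momentarily ahead of our local tip.
let current_gap = self.chain_tip.saturating_sub(client.chain_tip);
minimum_gap = minimum_gap.min(current_gap);

// `N` is the allowed slack, in blocks.
if current_gap > minimum_gap.saturating_add(N) {
    // Drop the subscriber; RESOURCE_EXHAUSTED is one possible status code.
    return Err(tonic::Status::resource_exhausted("subscriber is too slow"));
}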
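The same rule as a small, self-contained per-subscriber tracker, assuming it is called after every block sent to that subscriber. `LagTracker` and `observe` are hypothetical names, and `slack` plays the role of `N`. The caller would map a `false` result to an error such as `tonic::Status::resource_exhausted` and close the stream.

```rust
/// Tracks how far a single subscriber lags behind the chain tip.
#[derive(Debug)]
pub struct LagTracker {
    /// Smallest gap (in blocks) ever observed for this subscriber.
    minimum_gap: u32,
    /// Blocks the subscriber may fall behind its best-ever gap (`N`).
    slack: u32,
}

impl LagTracker {
    pub fn new(slack: u32) -> Self {
        Self { minimum_gap: u32::MAX, slack }
    }

    /// Returns `false` once the subscriber should be dropped as too slow.
    pub fn observe(&mut self, chain_tip: u32, client_tip: u32) -> bool {
        let current_gap = chain_tip.saturating_sub(client_tip);
        self.minimum_gap = self.minimum_gap.min(current_gap);
        current_gap <= self.minimum_gap.saturating_add(self.slack)
    }
}
```

For example, a client that starts 5 blocks behind with `slack = 2` is tolerated up to a gap of 7 and dropped at 8, while a client that was once at the tip (minimum gap 0) with `slack = 1` is dropped as soon as it is 2 blocks behind.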

Labels: block-producer (related to the block producer component), rpc (related to the RPC component)