Update protocol to match automerge-repo JS 1.0 #67

Open · alexjg wants to merge 4 commits into main from update-protocol-implement-request

Conversation

@alexjg (Collaborator) commented Dec 14, 2023

This is a fairly chunky change. The primary reason for it is that the automerge-repo JS protocol introduces a request workflow for requesting documents which you don't have, in contrast to the symmetric sync protocol we have been using so far. This requires the addition of two new message types: a "request" and an "unavailable" type. The requesting peer sends a request and the responding peer then responds with either a "sync" or an "unavailable" message. Some further complexity is introduced by the fact that each peer is expected to return unavailable only once it has queried both its storage and all the other peers it knows about.
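
For illustration, a minimal sketch of what the message variants described above could look like on the wire (the type and field names here are assumptions for the sketch, not necessarily what this patch uses):

```rust
// Placeholder standing in for the repo's document identifier type.
type DocumentId = String;

// Hypothetical sketch of the sync-protocol message variants described above;
// the real message type in the patch may be shaped differently.
enum RepoMessage {
    // The existing symmetric sync message, carrying an automerge sync payload.
    Sync {
        document_id: DocumentId,
        payload: Vec<u8>,
    },
    // Sent by a peer that wants a document it does not have. Structurally the
    // same as `Sync`, but tagged differently so the responder knows it may
    // need to answer with `Unavailable`.
    Request {
        document_id: DocumentId,
        payload: Vec<u8>,
    },
    // Sent once the responder has checked its own storage and all of its
    // other peers and none of them have the document.
    Unavailable {
        document_id: DocumentId,
    },
}
```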

To implement this, there are the following major changes:

  • Modify request_document to return an Option. This is the primary advantage of the new workflow: you no longer need to use a timeout in the common case when no one has the document; instead everyone returns Unavailable and we can return Ok(None) from request_document (see the sketch after this list)
  • Split RepoEvent::NewDoc into RepoEvent::NewDoc and RepoEvent::RequestDoc, so that we can distinguish between the two cases and decide whether to send request messages to our peers or not
  • Make Repo::new_document return a future, which allows us to move the DocumentInfo creation into the repo loop
  • Introduce a Request state machine which is tracked by the Repo to figure out when to send requests to other peers on behalf of incoming requests and when a request is complete.
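
As a rough sketch of the caller-facing change in the first bullet (the exact signature, handle type, and error type are assumptions, not copied from the patch):

```rust
// (imports of RepoHandle / DocumentId / RepoError from the repo crate elided)

// Hypothetical usage of request_document resolving to Ok(None) once every
// peer has answered Unavailable, instead of the caller racing a timeout.
async fn fetch_if_anyone_has_it(
    repo_handle: &RepoHandle,
    doc_id: DocumentId,
) -> Result<(), RepoError> {
    match repo_handle.request_document(doc_id).await? {
        Some(doc_handle) => {
            // Our storage or some peer had the document; keep using the
            // handle as before.
            let _ = doc_handle;
        }
        None => {
            // Every peer we asked (and our own storage) reported Unavailable.
            println!("document is unavailable on this network");
        }
    }
    Ok(())
}
```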

This brings us up to interop with the current JS implementation, with the exception of ephemeral messages, for which I have another patch that I will submit once this is merged.

This will be a breaking change to the network protocol and will thus require a network-wide upgrade. It is possible that we could write a compatibility layer which we could maintain for a while. @issackelly is that something you would need, or is a big bang upgrade okay for you?

Context: the automerge-repo JS implementation supports a request
workflow for syncing with a document we don't have. In this workflow the
requesting peer sends a "request" message which is identical to the
current sync message except tagged with a different message type.
Responding peers can then either respond with a sync message or with an
"unavailable" message, which allows the receiver to tell the difference
between a document the other end doesn't have and a document the other
end has but which is empty.

Problem: in the repo loop we have no way of telling the difference
between a request for a new document and announcing a document we have.
This is because both situations are expressed using the
RepoEvent::NewDoc event.

Solution: split `NewDoc` into a `RequestDoc` and `NewDoc` event. This
allows us to send request messages for `RequestDoc` and sync messages
for `NewDoc`.
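
A minimal sketch of the split described in this commit message (variant payloads are assumptions; only the `NewDoc` / `RequestDoc` split itself comes from the patch):

```rust
// Placeholders standing in for the repo's existing types, for sketch purposes.
type DocumentId = String;
struct DocumentInfo;

// Hypothetical sketch of the RepoEvent split.
enum RepoEvent {
    // A document we already have (e.g. newly created locally) and want to
    // announce to peers with ordinary sync messages.
    NewDoc(DocumentId, DocumentInfo),
    // A document we do not have and are asking our peers for; this triggers
    // outgoing "request" messages rather than sync messages.
    RequestDoc(DocumentId),
    // ...other existing events elided...
}
```
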
@alexjg force-pushed the update-protocol-implement-request branch from 77d267a to 782b2c3 on December 14, 2023 12:29
@alexjg requested a review from gterzian on December 14, 2023 12:31
@alexjg force-pushed the update-protocol-implement-request branch from 782b2c3 to c012dfb on December 14, 2023 12:41
Context: when a peer requests a document from us we often don't want to
just check our own storage; we also want to ask all other peers we are
connected to whether they have the document. This is the case for e.g. a sync
server which is relaying documents between two peers.

Problem: the current implementation just checks local storage

Solution: when we receive a request, first send a request out to all the
peers we are connected to, and only once they have all responded do we
send a response to the requestor.
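
A rough sketch of the bookkeeping this implies (all names below are hypothetical, not the types this patch actually introduces):

```rust
use std::collections::HashSet;

// Placeholder standing in for the repo's peer identifier type.
type RepoId = String;

// Hypothetical state for a request we are servicing on behalf of other peers:
// we fan the request out to every other connected peer and only answer the
// original requestors once all of them have replied (or one of them sends us
// the document).
struct PendingRemoteRequest {
    // Peers that asked us for this document and are waiting on our answer.
    requestors: HashSet<RepoId>,
    // Peers we forwarded the request to and have not heard back from yet.
    awaiting_response_from: HashSet<RepoId>,
}

impl PendingRemoteRequest {
    // Record a reply (sync or unavailable) from one of the peers we asked.
    // Returns true once every forwarded request has been answered, i.e. we
    // can now send our own response (sync or unavailable) to the requestors.
    fn mark_response(&mut self, from: &RepoId) -> bool {
        self.awaiting_response_from.remove(from);
        self.awaiting_response_from.is_empty()
    }
}
```
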
@alexjg force-pushed the update-protocol-implement-request branch from c012dfb to 6aa2fd6 on December 14, 2023 13:07
@gterzian (Collaborator) left a comment

LGTM, although I didn't have enough time yet to review it in all its details...

Good to see many existing TODOs taken care of...

@gterzian (Collaborator) left a comment

I think I have finally wrapped my head around the logic.

At a very high level I'm wondering: why separate the new request concept from the existing peer connection and document info? I find the back and forth between these hard to read, and I'm hoping the logic can be consolidated by updating the existing concepts, as opposed to introducing a new and separate one (which is logically not separate, because of the back and forth).

Couple of things:

  1. The equivalent of Self::fail_request could just be called in the main loop on all requests that are complete. The benefit is that the logic is in one place, as opposed to being called at different points.
  2. I'm wondering if Request could be replaced by a combination of an awaiting_response_from: HashSet<RepoId> on DocumentInfo, with awaiting_our_response inferred from the existence of PeerConnection::PendingAuth (perhaps with additional data, like a boolean, to mark a peer who requested the document) when the doc state transitions out of the bootstrap state? (See the sketch after this list.)
  3. Request::initiate_local seems to me the same as having a document in the bootstrap state without any peers awaiting our response?
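
For illustration, point 2 might look roughly like this (field names and variant payloads are guesses, only meant to make the suggestion concrete):

```rust
use std::collections::HashSet;

// Placeholder standing in for the repo's peer identifier type.
type RepoId = String;

// Sketch of tracking outstanding requests directly on DocumentInfo instead of
// in a separate Request state machine.
struct DocumentInfo {
    state: DocState,
    // Peers we have forwarded a request to and are still waiting on.
    awaiting_response_from: HashSet<RepoId>,
    // ...existing fields elided...
}

enum DocState {
    // Document requested but not yet obtained.
    Bootstrap,
    // Document present and syncing normally.
    Sync,
    // ...other states elided...
}

enum PeerConnection {
    // The bool would mark a peer that requested the document from us, so we
    // know to answer it once the document leaves the bootstrap state.
    PendingAuth { requested_doc: bool },
    // ...other connection states elided...
}
```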

In all cases, the goal is to not have to read all the code to understand the logic; instead, just looking at the enums and structs should be enough. Currently this is not possible because the request is separate from the document info, and the "state" of the algorithm appears to be shared between the state of the request and the state of the document info.

It would also be beneficial to layer the logic, as opposed to having what seem like special conditions, such as in the handling of NetworkEvent::Request, which skips the share auth phase if the document is not syncing.

Layering would look like:

  1. Always first get the share decisions
  2. If sharing is accepted, start syncing if DocState::Sync, otherwise request from other peers.
  3. As sync messages or Unavailable events are handled, update the document info and/or peer connection and send outgoing messages.

In other words, always follow the same steps, but with different data.
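
As a purely illustrative sketch of that layering (placeholder types, invented names; the point is only the fixed order of steps):

```rust
// Placeholder types for the sketch; none of these names are from the code.
type RepoId = String;
type DocumentId = String;

enum DocState {
    Bootstrap,
    Sync,
}

enum ShareDecision {
    Share,
    DontShare,
}

// The same three steps for every incoming request, regardless of state.
fn on_request(_from: RepoId, _doc_id: DocumentId, state: &DocState, decision: ShareDecision) {
    // 1. Always resolve the share decision first.
    if let ShareDecision::DontShare = decision {
        return;
    }
    // 2. If sharing is accepted, either start syncing or ask the other peers.
    match state {
        DocState::Sync => {
            // generate and enqueue a sync message for the requestor
        }
        DocState::Bootstrap => {
            // enqueue request messages to our other peers on behalf of the requestor
        }
    }
    // 3. Later sync / Unavailable responses update the document info and peer
    //    connections and enqueue outgoing messages.
}
```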

request::Request::new(document_id.clone())
});

if info.state.is_bootstrapping() {

Perhaps move this to the DocState::Bootstrap match arm above?

"responding to request with sync as we have the doc"
);
// if we have this document then just start syncing
Self::enqueue_share_decisions(

Why don't we do this first in all cases?
