I'm experimenting with this, using a file about 590K (17k short play lines) containing a few things from Shakespeare. I started progressively deleting chunks of this file. After a few deletions (probably spanning 5k lines or so each?), I get a server error like:
That's a frame size of 32MB. That seems a bit large considering my original file size was around 0.5MB. Is that expected?
Good question! I'm not really sure what the expected overhead should be for the CRDT. But each character gets a unique ID, which appears to be a string of length 8. So that's almost an order of magnitude of overhead, though your example shows closer to two orders of magnitude for a UTF-8 encoded file. I do wonder if there is a bug in the websocket layer here -- sending patches should require less storage than the overall file. We will definitely need to think about some of these scalability issues.
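For a rough sense of scale, here is my back-of-the-envelope arithmetic (a sketch assuming the 8-character ids observed above and 2-byte UTF-16 code units; the constants are my guesses, not taken from the datastore code):

```ts
// Rough overhead estimate. Assumes one id per document character, with
// each id an 8-code-unit JS string (UTF-16, so 2 bytes per code unit).
const ID_LENGTH = 8;           // observed id string length (an assumption)
const BYTES_PER_CODE_UNIT = 2; // JS strings are UTF-16

function estimateCrdtBytes(fileBytes: number): number {
  // One id per character, plus the character itself.
  return fileBytes * (1 + ID_LENGTH * BYTES_PER_CODE_UNIT);
}

// A ~590K file works out to roughly 10MB of ids -- heavy, but still well
// short of the 32MB frames reported above, which is why I suspect a bug.
console.log(estimateCrdtBytes(590 * 1024).toLocaleString()); // 10,270,720
```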
I played with it a bit more. I applied the following patch:

```diff
diff --git a/examples/example-datastore/src/server.ts b/examples/example-datastore/src/server.ts
index 557a83c6..96d92ec1 100644
--- a/examples/example-datastore/src/server.ts
+++ b/examples/example-datastore/src/server.ts
@@ -162,7 +162,7 @@ wsServer.on('request', request => {
       return;
     }
     let data = JSON.parse(message.utf8Data!) as WSAdapterMessages.IMessage;
-    console.debug(`Received message of type: ${data.msgType}`);
+    console.debug(`Received message of type: ${data.msgType}; ${Buffer.byteLength(message.utf8Data!).toLocaleString()} bytes`);
     let reply: WSAdapterMessages.IReplyMessage;
     switch (data.msgType) {
       case 'storeid-request':
@@ -202,8 +202,10 @@ wsServer.on('request', request => {
       default:
         return;
     }
-    console.debug(`Sending reply: ${reply.msgType}`);
-    connection.sendUTF(JSON.stringify(reply));
+
+    let replyString = JSON.stringify(reply);
+    console.debug(`Sending reply: ${reply.msgType}; ${Buffer.byteLength(replyString).toLocaleString()} bytes`);
+    connection.sendUTF(replyString);
   });

   // Handle a close event from a collaborator.
```

Then I put a 30k file in my paste buffer (good ol' Shakespeare :). I pasted it into the document a number of times, and you can see that the sizes of the patches kept increasing quite a bit each time I pasted the same 30k string. In the middle, I added one character (that's the several-hundred-byte message), then kept pasting. Then I deleted about half the file, then deleted the entire rest of the file. I've annotated the log below with // comments.

These message sizes are really concerning.
You're right, that does seem excessive (and inconsistent!). Looking into it...
If you're constantly pasting large text at the end of the file, I would expect the id overhead to continue to increase as you continue to create ids with larger dimensionality. Each dimension in an id has 48 bits (which is a lot), but it's not densely populated. It's not sparsely populated either, so these patch sizes still look large to me.
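To make the dimensionality point concrete, here is a toy model of the growth (my own simplification with small integer dimensions, not Phosphor's actual 48-bit allocator): when there is no room between two neighboring ids at the current depth, the new id has to add a deeper dimension, so repeated inserts at the same spot keep getting longer.

```ts
// Toy id allocator: an id is a path of integer "dimensions", ordered
// lexicographically (missing dimensions compare as 0). Assumes lower < upper.
type Id = number[];

const BASE = 256; // small base so the deepening shows up quickly

function createBetween(lower: Id, upper: Id): Id {
  const result: Id = [];
  let boundedByUpper = true; // does our prefix still equal upper's prefix?
  for (let i = 0; ; i++) {
    const lo = lower[i] ?? 0;
    const hi = boundedByUpper ? (upper[i] ?? BASE) : BASE;
    if (hi - lo > 1) {
      result.push(lo + 1); // room at this depth: allocate just above `lower`
      return result;
    }
    // No integer strictly between lo and hi here: keep lower's value and
    // descend one dimension deeper.
    result.push(lo);
    if (lo < hi) {
      boundedByUpper = false;
    }
  }
}

// Repeatedly insert at the same spot (just after `left`): each round the
// new id needs one more dimension than the last.
const left: Id = [1];
let right: Id = [2];
for (let n = 0; n < 5; n++) {
  right = createBetween(left, right);
  console.log(right.length, JSON.stringify(right));
}
```

Running this prints ids of length 2, 3, 4, 5, 6 -- one extra dimension per insert at the same position, which is the mechanism behind the growing patches.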
@jasongrout When I perform a similar operation to yours (pasting a ~40k file repeatedly), I don't see nearly the same increase in message size (though the overhead is still large, about 40x!)
Outside of what may be causing this, there's certainly room for improvement with respect to id spans to handle large pastes, but we can address that later.
Also, I added a readme in this example directory:

# Phosphor Datastore example

## Build

Compile with `yarn run build:examples` in the Phosphor repo root directory.

## Run

Start the server with `node ./build/server.js`.

Go to the address `http://localhost:8000` (or whatever port the server prints out that it is listening on).
I was pasting text in random places inside the file. Ian, can you try picking random places in the file to paste?
@ian-r-rose That's more like what I would expect. You'll have at minimum 16 bytes of overhead per character (until we implement id spans).
Ooh, @jasongrout I can reproduce what you see by pasting in the middle of the file, as you suggested:
A quick sanity check would be to log the average length of the ids generated for the file. Each character in the id string consumes 16 bits (on Chrome, at least).
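Something like this, perhaps (a sketch; `ids` is a stand-in for however you would pull the id strings out of the text field's internal state, which I haven't looked up):

```ts
// Hypothetical sanity check: `ids` stands in for the per-character id
// strings extracted from the datastore's text field internals.
function logAverageIdLength(ids: string[]): void {
  const totalCodeUnits = ids.reduce((sum, id) => sum + id.length, 0);
  // Each UTF-16 code unit costs 2 bytes, so an average length of 50 code
  // units means ~100 bytes of id storage per character.
  console.debug(`average id length: ${(totalCodeUnits / ids.length).toFixed(1)} code units`);
}
```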
Yes, by repeatedly pasting long blocks of text internally, it's not hard to generate some very long average id lengths (~50-100 characters).
It sounds like we can do a lot to compress patch messages when you have ranges of text, which would help memory use in the browser as well as network bandwidth (I think at one point the debugger stopped and said I was about to hit an out-of-memory error while applying a patch). Newbie question: once the ids reach 50-100 bits, we have to deal with those large id sizes at least in that part of the file forever, right? No re-indexing?
It's not about compressing the patch messages; it's about compressing the ids into ranges. It's not exactly straightforward to implement, which is why I haven't done it yet. There's a deterministic algorithm to apply to ensure that the ranges can be split simultaneously by multiple users and still be merged out of order.
And you mean 50-100 characters, not bits, right? A single id is at minimum 16 bytes (128 bits): https://github.com/phosphorjs/phosphor/blob/master/packages/datastore/src/utilities.ts#L54
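For concreteness, here is one shape the "ids into ranges" idea could take -- a toy sketch of my own, not the planned design. It does show why splitting can be made deterministic: offsets are relative to the span itself, and the underlying per-character ids never change.

```ts
// Toy "id span": character k of a run inserted in one operation gets the
// implicit id [...prefix, start + k], so a single (prefix, start, count)
// triple stands in for `count` separate ids.
interface IdSpan {
  prefix: ReadonlyArray<number>; // shared leading dimensions
  start: number;                 // last dimension of the run's first character
  count: number;                 // number of consecutive characters in the run
}

// Splitting at a character boundary is deterministic: both halves are
// derived purely from the span itself, so replicas that split the same
// span at the same offset -- in any order -- end up with identical spans.
function splitSpan(span: IdSpan, offset: number): [IdSpan, IdSpan] {
  return [
    { ...span, count: offset },
    { ...span, start: span.start + offset, count: span.count - offset },
  ];
}
```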
@jasongrout and I'm curious, have you run the same test on SMC?
I think the issue exposed here turned out not to be directly related to the code in this PR. If so, let's keep this thread for discussing the code in the PR, and continue the load-testing discussion here: #411
Similar, but not exactly. It wasn't an issue, IIRC. I'll run a similar test and report back. On Google Docs, again similar but not exactly: IIRC the patches were around 700k for pastes, no matter what.
Superseded by #425
Needs a bit of cleanup, but this is a functional app with collaborative text editing, based on an earlier version from @vidartf.