Datastore example app #410

ian-r-rose · 2019-08-05T18:56:04Z

Needs a bit of cleanup, but this is a functional app with collaborative text editing, based on an earlier version from @vidartf

jasongrout · 2019-08-05T22:54:59Z

I'm experimenting with this, using a file about 590K (17k short play lines) containing a few things from Shakespeare. I started progressively deleting chunks of this file. After a few deletions (probably spanning 5k lines or so each?), I get a server error like:

Mon Aug 05 2019 15:52:53 GMT-0700 (Pacific Daylight Time) Store ID 5 disconnected. Reason: 1009: Frame size of 32075084 bytes exceeds maximum accepted frame size

That's a frame size of 32MB. That seems a bit large considering my original file size was around 0.5MB. Is that expected?

ian-r-rose · 2019-08-05T23:35:28Z

Good question! I'm not really sure what the expected overhead should be for the CRDT. But each character gets a unique ID, which appears to be a string of length 8, so that's almost an order of magnitude of overhead. Your example is closer to two orders of magnitude for a UTF-8 encoded file, though. I do wonder if there is a bug in the websocket layer here -- sending patches should require less storage than the overall file.

We will definitely need to think about some of these scalability issues:

How do we checkpoint things?
Should we not allow collaborative editing of large files?
What are the limits of the transport layer?

jasongrout · 2019-08-05T23:53:11Z

I played with it a bit more. I applied the following patch:

diff --git a/examples/example-datastore/src/server.ts b/examples/example-datastore/src/server.ts
index 557a83c6..96d92ec1 100644
--- a/examples/example-datastore/src/server.ts
+++ b/examples/example-datastore/src/server.ts
@@ -162,7 +162,7 @@ wsServer.on('request', request => {
       return;
     }
     let data = JSON.parse(message.utf8Data!) as WSAdapterMessages.IMessage;
-    console.debug(`Received message of type: ${data.msgType}`);
+    console.debug(`Received message of type: ${data.msgType}; ${Buffer.byteLength(message.utf8Data!).toLocaleString()} bytes`);
     let reply: WSAdapterMessages.IReplyMessage;
     switch (data.msgType) {
       case 'storeid-request':
@@ -202,8 +202,10 @@ wsServer.on('request', request => {
       default:
         return;
     }
-    console.debug(`Sending reply: ${reply.msgType}`);
-    connection.sendUTF(JSON.stringify(reply));
+
+    let replyString = JSON.stringify(reply);
+    console.debug(`Sending reply: ${reply.msgType}; ${Buffer.byteLength(replyString).toLocaleString()} bytes`);
+    connection.sendUTF(replyString);
   });
 
   // Handle a close event from a collaborator.

Then I put a 30k file in my paste buffer (good ol' shakespeare :). I pasted it into the document a number of times, and you can see the sizes of the patches kept increasing quite a bit each time I pasted the same 30k string. In the middle, I added one character (that's the several hundred byte message), then kept pasting. Then I deleted about half the file, then deleted the entire rest of the file. I've annotated the log below with // comments.

// paste 30k
Received message of type: transaction-broadcast; 1,268,005 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 2,109,170 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 3,267,595 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 4,376,264 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 5,125,654 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 6,509,601 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 7,615,614 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,284,228 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// insert one character
Received message of type: transaction-broadcast; 355 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// paste 30k
Received message of type: transaction-broadcast; 4,041,473 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 5,407,557 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// paste 30k
Received message of type: transaction-broadcast; 6,243,956 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// paste 30k
Received message of type: transaction-broadcast; 7,560,826 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,001,110 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,843,058 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,893,291 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,960,872 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 10,262,576 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// add one character
Received message of type: transaction-broadcast; 644 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 11,588,272 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// Delete about half the file
Received message of type: transaction-broadcast; 71,278,174 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// Delete the rest of the file
Received message of type: transaction-broadcast; 47,076,175 bytes

These message sizes are really concerning.
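
To make the growth easier to compare, the broadcast sizes in a log like the one above can be pulled out and divided by the paste size. This is only a minimal sketch: it assumes the log format produced by the patched `server.ts` (with comma thousands separators) and a paste of roughly 30,000 bytes, and `broadcastSizes` and `overheadFactors` are hypothetical helper names, not part of the example app.

```typescript
// Hypothetical helper: extract the byte counts of transaction-broadcast
// messages from the debug log printed by the patched server.ts.
function broadcastSizes(log: string): number[] {
  const pattern = /transaction-broadcast; ([\d,]+) bytes/g;
  const result: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(log)) !== null) {
    // Strip the comma separators seen in the log before converting.
    result.push(Number(match[1].replace(/,/g, '')));
  }
  return result;
}

// Ratio of each message size to the size of the pasted text.
function overheadFactors(messageBytes: number[], pasteBytes: number): number[] {
  return messageBytes.map(size => size / pasteBytes);
}

// The first two pastes from the log above.
const sampleLog = [
  'Received message of type: transaction-broadcast; 1,268,005 bytes',
  'Sending reply: transaction-ack; 190 bytes',
  'Received message of type: transaction-broadcast; 2,109,170 bytes'
].join('\n');

const sampleSizes = broadcastSizes(sampleLog);
const sampleFactors = overheadFactors(sampleSizes, 30000); // assumed paste size
console.log(sampleSizes, sampleFactors.map(f => f.toFixed(1)));
```

Under the 30,000-byte assumption, the first two pastes come out at roughly 42x and 70x the size of the pasted text.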

ian-r-rose · 2019-08-06T00:08:45Z

You're right, that does seem excessive (and inconsistent!). Looking into it...

sccolbert · 2019-08-06T00:17:20Z

If you're constantly pasting large text at the end of the file, I would expect the id overhead to continue to increase as you continue to create ids with larger dimensionality. Each dimension in an id has 48 bits (which is a lot) but it's not densely populated. It's not sparsely populated either, so these patch sizes still look large to me.

ian-r-rose · 2019-08-06T00:22:11Z

@jasongrout When I perform an operation similar to yours (pasting a ~40k file repeatedly), I don't see nearly the same increase in message size (though the overhead is still large, about 40x!)

Mon Aug 05 2019 17:19:22 GMT-0700 (Pacific Daylight Time) Connection accepted.
Received message of type: storeid-request; 89 bytes
Sending reply: storeid-reply; 148 bytes
Received message of type: history-request; 89 bytes
Sending reply: history-reply; 166 bytes
Received message of type: transaction-broadcast; 220 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,755,350 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,699,964 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,723,954 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,242 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,180 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,232 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,560,763 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,560,678 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,560,437 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,724,305 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,560,685 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,560,564 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,724,359 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,287 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,104 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,113 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,325 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,166 bytes
Sending reply: transaction-ack; 190 bytes

sccolbert · 2019-08-06T00:22:42Z

Outside of what may be causing this, there's certainly room for improvement with respect to id spans to handle large pastes, but we can address that later.

jasongrout · 2019-08-06T00:31:35Z

Also, I added a readme in this example directory:

# Phosphor Datastore example

## Build

Compile with `yarn run build:examples` in the Phosphor repo root directory.

## Run

Start the server with `node ./build/server.js`

Go to the address `http://localhost:8000` (or whatever port the server prints out that it is listening on).

jasongrout · 2019-08-06T00:32:36Z

> If you're constantly pasting large text at the end of the file, I would expect the id overhead to continue to increase as you continue to create ids with larger dimensionality.

I was pasting text in random places inside the file.

Ian, can you try picking random places in the file to paste?

sccolbert · 2019-08-06T00:33:32Z

@ian-r-rose that's more like what I would expect. You'll have at minimum 16 bytes of overhead per character (until we implement id spans).
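
As a rough illustration of that lower bound, here is a minimal sketch that multiplies the 16-byte minimum by the number of inserted characters. The function and constant names are illustrative only, and the estimate ignores JSON framing and any ids longer than the minimum.

```typescript
// Minimum id size per character, as stated in this thread.
const MIN_ID_BYTES = 16;

// Lower bound on the id bytes needed for `charCount` inserted characters.
function minIdOverheadBytes(charCount: number): number {
  return charCount * MIN_ID_BYTES;
}

// A paste of about 40,000 characters, as in the test above.
console.log(minIdOverheadBytes(40000)); // 640000
```

For a ~40k paste this gives a floor of about 640,000 bytes for ids alone, against the roughly 1.7 MB messages in the log above.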

ian-r-rose · 2019-08-06T00:41:42Z

Ooh, @jasongrout I can reproduce what you see by pasting in the middle of the file, as you suggested:

Received message of type: transaction-broadcast; 1,279 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,279 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,755,620 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 3,094,110 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 4,519,486 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 5,741,282 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 5,571,838 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 7,674,794 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 7,686,337 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 9,219,505 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 9,268,659 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 9,234,554 bytes
Sending reply: transaction-ack; 190 bytes

sccolbert · 2019-08-06T00:53:49Z

A quick sanity check would be to log the average length of the ids generated for the file. Each character in the id string consumes 16 bits (on Chrome at least).
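
A minimal sketch of such a check, assuming the ids are already available as an array of strings. How to collect them from the datastore is not shown here, and `averageIdLength` and `approxIdBytes` are hypothetical helper names rather than datastore APIs.

```typescript
// Average length, in characters, of a list of id strings.
function averageIdLength(ids: string[]): number {
  if (ids.length === 0) {
    return 0;
  }
  let total = 0;
  for (const id of ids) {
    total += id.length;
  }
  return total / ids.length;
}

// Approximate in-memory size, assuming 16 bits (2 bytes) per character.
function approxIdBytes(ids: string[]): number {
  return ids.reduce((sum, id) => sum + id.length * 2, 0);
}

// Placeholder ids for illustration only; real ids come from the datastore.
const exampleIds = ['aaaaaaaa', 'aaaaaaaabbbbbbbb'];
console.log(averageIdLength(exampleIds), approxIdBytes(exampleIds));
```

Logging these two numbers after each paste would show whether the growth in message size tracks the growth in id length.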

ian-r-rose · 2019-08-06T01:28:09Z

Yes, by repeatedly pasting long blocks of text internally, it's not hard to generate some very long average id lengths (~50-100 characters).

jasongrout · 2019-08-06T01:30:07Z

It sounds like we can do a lot to compress patch messages when you have ranges of text, which helps memory use in the browser as well as network bandwidth (I think at one point, the debugger stopped and said I was about to hit an out of memory error in applying a patch).

Newbie question: once the ids reach 50-100 bits, we have to deal with those large id sizes at least in that part of the file forever, right? No re-indexing?

sccolbert · 2019-08-06T02:08:53Z

It's not about compressing the patch messages, it's about compressing the ids into ranges. It's not exactly straightforward to implement, which is why I haven't done it yet. There's a deterministic algorithm to apply to ensure that the ranges can be split simultaneously by multiple users and still be merged out of order.

sccolbert · 2019-08-06T02:10:14Z

And you mean 50-100 characters, not bits, right? A single id is at minimum 16 bytes (128 bits): https://github.com/phosphorjs/phosphor/blob/master/packages/datastore/src/utilities.ts#L54

sccolbert · 2019-08-06T02:12:16Z

@jasongrout and I'm curious, have you run the same test on SMC?

vidartf · 2019-08-06T10:47:42Z

I think the issue exposed here turned out not to be related to the code of this PR. If so, let's keep this thread for discussing the code in the PR, and continue the load testing discussion here: #411

jasongrout · 2019-08-06T11:54:32Z

> @jasongrout and I'm curious, have you run the same test on SMC?

Similar, but not exactly. It wasn't an issue, IIRC. I'll run a similar test and report back.

On Google Docs, again, similar but not exactly, but IIRC, the patches were around 700k no matter what for pastes.

ian-r-rose · 2019-09-06T18:31:42Z

Superseded by #425

vidartf and others added 24 commits August 5, 2019 11:46

datastore example wip

c6a4708

Make functional.

5cbd4db

Messy monaco experiments

2cc3512

Fix initial value.

4978efb

Work on more granular change ops.

0ca51ed

Something may be up with merging updates.

5e469ca

WIP dock

1a27b45

Decouple widgets.

a6afa1a

Cleanup and minor styling.

2149d6d

restructuring.

d5dcebc

Add readonly toggle.

c1b97e0

Move editor widget into its own module.

2a5992c

Adjust layout.

af759b2

Consolidate server.

90bdc55

Cleanup and documentation.

6342e82

const -> let

2ae584b

Remove console.

87dfd49

Work on check change event.

344c187

Handle window resizing.

25e5764

Clean up datastore change signal handling a bit.

871fef6

Use CodeMirror instead of monaco. It's lighter weight and has easier change signals to work with.

a628ad7

Move change handlers to be class methods.

b2d23da

Use the same server for websocket and http.

f1e762b

Cleanup.

bb886fd

ian-r-rose mentioned this pull request Aug 5, 2019

Real Time Collaboration jupyterlab/jupyterlab#5382

Closed

vidartf mentioned this pull request Aug 6, 2019

Datastore load testing #411

Open

ian-r-rose mentioned this pull request Sep 6, 2019

Datastore example app undo redo #425

Merged

ian-r-rose closed this Sep 6, 2019
