Use spawn_blocking directly to read from files #25

Draft · wants to merge 1 commit into main
Conversation

stephank (Owner)

Code is rough, and I'm not sure this is a good idea. Just playing around with ideas from #24.

This bypasses tokio::fs and uses spawn_blocking directly, inspired by Tokio internals. That lets us skip Tokio's internal buffering and read straight into our own BytesMut.

It looks like, at least on macOS, the OS will fill a buffer of whatever size we give it.

Some relative results on my machine, using the example server with ab -n 1000 -c 10 http://127.0.0.1:3000/search-index.js (a 975 KB file) and tweaking our own BUF_SIZE:

before,   8 KB buffer |  244.64 req/s |  238639.96 KB/s
before,  16 KB buffer |  419.66 req/s |  409371.92 KB/s
after,    8 KB buffer |  274.51 req/s |  267781.17 KB/s
after,   16 KB buffer |  477.92 req/s |  466201.27 KB/s
after,   32 KB buffer |  774.12 req/s |  755135.07 KB/s
after,   64 KB buffer | 1143.46 req/s | 1115419.94 KB/s
after,  128 KB buffer | 1395.01 req/s | 1360802.62 KB/s
after,  256 KB buffer | 1437.07 req/s | 1401834.38 KB/s
after,  512 KB buffer | 1604.41 req/s | 1565070.82 KB/s
after, 1024 KB buffer | 1726.24 req/s | 1683912.61 KB/s

These numbers are pretty noisy, though. What I do find interesting is the apparent ~13% increase between the two runs with 8 KB buffers.

Besides skipping a copy, the other thing I thought might make a difference is the change in open_with_metadata. Going through Tokio internals, we previously made separate spawn_blocking calls for the open and stat steps there, but this PR combines them into one. To rule that out, I split the new code back into two manual spawn_blocking calls and got these numbers:

8 KB buffer |  269.98 req/s |  263361.36 KB/s

So that only accounts for maybe ~1.5% of that 13% gain.

cc @lnicola

lnicola (Contributor) commented Dec 18, 2019

That code seems (IIRC) pretty similar to what tokio does, so I didn't really expect the performance improvement. One workaround suggested in that issue is to use task::block_in_place instead of task::spawn_blocking (or tokio::spawn together with task::block_in_place). Perhaps it's worth a try?


Our testing method is a bit different. I used a large file and a single client at a time.

stephank (Owner, Author)

I guess we should have both benchmarks in code somewhere, but I'll stick with ab for now, because it's what I have. :)

Using block_in_place requires rt-threaded, which totally changes Tokio performance. Performing the same test, starting with current unmodified code in this branch:

  • New baseline for this test:
    224.71 req/s - 219455.55 KB/s
  • With rt-threaded:
    479.57 req/s - 468364.66 KB/s
  • With rt-threaded, and block_in_place in open_with_metadata:
    501.74 req/s - 490019.86 KB/s
  • With rt-threaded, and block_in_place in open_with_metadata and FileBytesStream:
    2546.89 req/s - 2487372.11 KB/s

That seems kind of ridiculous.
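For context, the runtime switch above is just a Cargo feature change. Assuming tokio 0.2-era feature names (matching this thread's date; block_in_place also needs the blocking feature there, if I recall correctly):

```toml
# Cargo.toml (hypothetical fragment, tokio 0.2-era feature names)
[dependencies]
tokio = { version = "0.2", features = ["rt-threaded", "blocking", "fs"] }
```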

I'm not even sure if using block_in_place inside a poll function is safe. This is what I was trying, essentially removing the State enum:

fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Option<Self::Item>> {
    // Run the blocking read inline on this worker thread; block_in_place
    // tells the threaded runtime to move other tasks off this worker first.
    match block_in_place(|| {
        // Make sure there are BUF_SIZE bytes of spare capacity to read into.
        self.buf.reserve(BUF_SIZE);
        // bytes_mut() hands back uninitialized capacity; treat it as a
        // plain &mut [u8] for the read call.
        let slice = self.buf.bytes_mut();
        let slice = unsafe {
            std::slice::from_raw_parts_mut(slice.as_mut_ptr() as *mut u8, slice.len())
        };
        (&*self.file).read(slice)
    }) {
        // A read of 0 bytes means EOF: end the stream.
        Ok(0) => Poll::Ready(None),
        Ok(size) => {
            // Mark the bytes the read initialized, then hand them off as an
            // immutable Bytes chunk, leaving self.buf empty but still allocated.
            unsafe { self.buf.advance_mut(size) };
            let retval = self.buf.split().freeze();
            Poll::Ready(Some(Ok(retval)))
        }
        Err(e) => Poll::Ready(Some(Err(e))),
    }
}

lnicola (Contributor) commented Dec 22, 2019

Well, that's a nice improvement, right? :-) Yeah, pity it won't work with the single-threaded runtime. Maybe we should do nothing and wait for tokio to fix the fs performance.

Otherwise it's safe to use AFAICT.
