Fix gzip + chunk pool problem #550
Conversation
This commit adds two intermediary wrappers to the gzip/brotli decoding cases, DidReadZero and PreciseRead. The inner, DidReadZero, keeps track of whether the wrapped stream has been read to n=0. The wrapped stream will typically be a `PoolReturnRead(ChunkedDecoder)` or a `PoolReturnRead(LimitRead)`, both of which are fine to read to 0 multiple times. The outer, PreciseRead, will on n=0 unpack the inner DidReadZero and ensure it reads exactly nothing one more time.
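A minimal sketch of how these two wrappers might compose, using the names from the description above; the struct bodies, and the `unpack` hook for getting the DidReadZero back out of a concrete decoder, are illustrative guesses rather than the PR's actual code:

```rust
use std::io::{Error, ErrorKind, Read, Result};

/// Inner wrapper: remembers whether the wrapped stream (for example a
/// `PoolReturnRead(ChunkedDecoder)`) has been read to n=0.
struct DidReadZero<R: Read> {
    inner: R,
    saw_zero: bool,
}

impl<R: Read> Read for DidReadZero<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let n = self.inner.read(buf)?;
        if n == 0 {
            self.saw_zero = true;
        }
        Ok(n)
    }
}

/// Outer wrapper: once the decoder reports n=0, unpacks it to reach the
/// DidReadZero and, if that never hit n=0, reads exactly one more time,
/// requiring the result to be 0.
struct PreciseRead<D: Read, R: Read> {
    decoder: Option<D>,
    // Hypothetical hook for recovering the DidReadZero from a concrete
    // decoder; for brotli this would be something like `into_inner()`.
    unpack: fn(D) -> DidReadZero<R>,
}

impl<D: Read, R: Read> Read for PreciseRead<D, R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let decoder = match &mut self.decoder {
            Some(d) => d,
            None => return Ok(0),
        };
        let n = decoder.read(buf)?;
        if n > 0 {
            return Ok(n);
        }
        // Decoder done: check whether the wrapped stream is done too.
        let mut inner = (self.unpack)(self.decoder.take().expect("checked above"));
        if inner.saw_zero {
            return Ok(0); // already read to 0; don't touch the stream again
        }
        match inner.read(&mut [0u8; 1])? {
            0 => Ok(0),
            _ => Err(Error::new(
                ErrorKind::InvalidData,
                "extra data after compressed stream",
            )),
        }
    }
}
```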
@jsha gentle re-ping 🤗
I will take another look at this soon! Thanks for the ping.
This isn't quite what I had in mind with #549 (comment). The DidReadZero thing is cool, but we can get away without it, only wrapping another reader on the outside, not on both the outside and the inside. I wrote up a demo to try and show what I meant here:

```rust
use std::io::{Error, ErrorKind, Read};

const BUFFER_SIZE: usize = 8000;

pub(crate) struct BrotliReader<R: Read> {
    inner: Option<brotli_decompressor::Decompressor<R>>,
}

impl<R: Read> BrotliReader<R> {
    fn new(inner: R) -> Self {
        Self {
            inner: Some(brotli_decompressor::Decompressor::new(inner, BUFFER_SIZE)),
        }
    }

    /// After the Brotli stream ends, take back the wrapped reader and
    /// confirm it has nothing left to give.
    fn verify_wrapped_reader_done(&mut self) -> std::io::Result<usize> {
        let r = match self.inner.take() {
            Some(r) => r,
            None => return Ok(0),
        };
        let mut inner = r.into_inner();
        match inner.read(&mut [0u8; 1]) {
            Ok(0) => Ok(0),
            Err(e) => Err(e),
            Ok(_) => Err(Error::new(
                ErrorKind::InvalidData,
                "extra data after valid Brotli stream",
            )),
        }
    }
}

impl<R: Read> Read for BrotliReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let inner = match &mut self.inner {
            Some(d) => d,
            None => return Ok(0),
        };
        match inner.read(buf) {
            Ok(0) => self.verify_wrapped_reader_done(),
            Ok(n) => Ok(n),
            r => r,
        }
    }
}

#[cfg(test)]
mod tests {
    use std::error::Error;
    use std::io::{Cursor, ErrorKind};

    use super::BrotliReader;

    #[test]
    fn test_no_leftovers() -> Result<(), Box<dyn Error>> {
        // Output from this command:
        // (brotli -c <<<"hello"; echo goodbye) | xxd -p | sed 's/../\\x&/g'
        let compressed_with_extra =
            b"\x8f\x02\x80\x68\x65\x6c\x6c\x6f\x0a\x03\x67\x6f\x6f\x64\x62\x79\x65\x0a";
        let mut reader = BrotliReader::new(Cursor::new(compressed_with_extra));
        let output = std::io::read_to_string(&mut reader);
        assert!(output.is_err());
        assert_eq!(output.err().unwrap().kind(), ErrorKind::InvalidData);
        Ok(())
    }
}
```

Unfortunately, the test fails! But if you change BUFFER_SIZE to 10 (the size of the "good" Brotli bytes), it passes. This is because of the internal buffering in the Brotli decompressor. With a BUFFER_SIZE of 8000, it has already read the full 18-byte input into its buffer, and then the reader handed back by `into_inner()` is already at EOF, so the verification read returns Ok(0) and the extra data goes undetected.
What guarantee do we have that the extra read isn't going to sync wait for the next byte from the server?
Or put another way: I think technically in Rust you should only read to zero once. There's no guarantee that a reader is in a good state after multiple reads to 0. We can guarantee that ourselves by knowing exactly what deeper readers are hiding behind that `into_inner()`. My solution was trying to ensure we don't rely on such deeper behavior.
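As a contrived illustration, this toy reader is legal under the `Read` contract yet produces more data after having once returned 0 (`Flaky` is purely hypothetical):

```rust
use std::io::{Read, Result};

/// A contract-legal Read impl that "resumes" after reporting EOF once.
/// Nothing in the Read trait forbids this.
struct Flaky {
    calls: usize,
}

impl Read for Flaky {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        self.calls += 1;
        match self.calls {
            1 => Ok(0), // looks like EOF...
            _ => {
                buf[0] = b'!'; // ...but more data shows up on the next call
                Ok(1)
            }
        }
    }
}

fn main() {
    let mut r = Flaky { calls: 0 };
    let mut buf = [0u8; 8];
    assert_eq!(r.read(&mut buf).unwrap(), 0); // first read-to-zero
    assert_eq!(r.read(&mut buf).unwrap(), 1); // a second read finds bytes
}
```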
The next layer down is one of:

1. `PoolReturnRead(ChunkedDecoder)` (chunked transfer encoding)
2. `PoolReturnRead(LimitRead)` (a Content-Length body)
3. a reader that runs until the server closes the connection (neither of the above)

For all of these, if the response is done, they will return Ok(0), not sync wait. If the response is not done, they should sync wait, and when the next byte eventually comes through, the response should error out (due to extra junk at the end). For (3), that sync wait might result in an eventual Ok(0) only after blocking until the server closes the connection.
This is a good point. One solution to this is the equivalent of std::iter::Fuse but for `Read`.
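A sketch of what that could look like; `FusedReader` is a hypothetical name, not an existing std or ureq type:

```rust
use std::io::{Read, Result};

/// Fuse-like wrapper for Read: after the first Ok(0), it never touches
/// the underlying reader again, so reading to 0 repeatedly is safe no
/// matter what is underneath.
struct FusedReader<R: Read> {
    inner: R,
    done: bool,
}

impl<R: Read> Read for FusedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        if self.done {
            return Ok(0); // already saw EOF once; stay at EOF
        }
        let n = self.inner.read(buf)?;
        if n == 0 {
            self.done = true;
        }
        Ok(n)
    }
}
```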
Close #549