# Compressed, chunked response inhibits connection reuse #549
@OrIdow6 also thanks for taking the time to write such a detailed report. I suspect the `MultiGzDecoder` only coincidentally fixes this, because it will try additional reads after the first gzip stream is done.

In terms of how to fix this: the main problem is that the chunked encoding may have a few more bytes that it hasn't read, specifically the terminating `0\r\n\r\n`. We also don't want to do any syscall-invoking reads in a `Drop` implementation.

Proposal: fork chunked_transfer and make it aware of `BufRead`. When a read is finished, peek at the next 5 bytes in the buffer (if available). If those bytes are `0\r\n\r\n`, consume them and return the stream to the pool; see the sketch after this comment.

Note this approach is not perfect: it's possible we could land right on a chunk boundary, with no bytes left in the buffer, but if we read more from the network, the next bytes we would read would be `0\r\n\r\n`.

Another approach, which would be more perfect: if …
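A minimal sketch of that peek-and-consume check, assuming a `BufReader` and illustrative function names (chunked_transfer's real internals will differ):

```rust
use std::io::{BufRead, BufReader, Read};

/// The terminating sequence of a chunked body with no trailers: the
/// zero-length chunk line plus the blank line that ends the body.
const TERMINATOR: &[u8] = b"0\r\n\r\n";

/// Illustrative sketch of the proposal, not chunked_transfer's real API.
/// After a chunk's data and its trailing CRLF have been consumed, look
/// only at bytes that are already buffered (never triggering a fresh
/// network read), and swallow the terminator if it is there so the
/// stream can report EOF and be returned to the pool.
fn try_consume_terminator<R: Read>(reader: &mut BufReader<R>) -> bool {
    if reader.buffer().starts_with(TERMINATOR) {
        reader.consume(TERMINATOR.len());
        true
    } else {
        // Either more chunks follow, or we landed exactly on the chunk
        // boundary with an empty buffer: the imperfect case noted above,
        // where repooling is missed even though the body is complete.
        false
    }
}
```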
The approach in #550 has a problem: it assumes that the end-user reads the decompressed body until it gets an `Ok(0)`.

Also, the #550 approach only fixes things for the compression use case. The same problem exists for any length-aware format, even if it's not compressed. Solving this at the chunked_transfer layer solves things more generally, and is also a better use of abstraction.

An interesting thing to note in both cases: our current implementation of automatic decompression will not raise an error if the server includes some extra junk after the compressed stream ends.

I don't think it's a big deal to fork chunked_transfer for this purpose; it's something we've considered before. And I bet we could get the improvements upstreamed, too!
Sure. But that's how repooling currently works: it repools if the user reads to the end. A user might always have some reason to abort reading early, like having another length-aware thing inside (that's orthogonal to chunked/gzip). The contract is: read to end and ureq repools. For this bug, the problem is that the user really does read to the end, and ureq breaks the contract.
I did some more thinking and reading about this. What should we do if there is extra garbage after a gzip stream ends? I tested, and both curl and Go's net/http generate an error. According to https://datatracker.ietf.org/doc/html/rfc9112#section-12.3, gzip is defined in RFC 1952. In section 2.2:

> A gzip file consists of a series of "members" (compressed data sets). The format of each member is specified in the following section. The members simply appear one after another in the file, with no additional information before, between, or after them.
The `GzDecoder` docs don't really make it clear, but that decoder only decodes a single member. The gzip specification (and therefore HTTP) specifies a sequence of members. That's what's implemented by `MultiGzDecoder`. In other words, `MultiGzDecoder` is what you want if you actually want to read gzip response bodies, even though its documentation makes it sound like something esoteric.

That plays nicely with our intuition that extra garbage after a gzip member should cause an error. Once you've read a full member, the next thing you read must be either EOF or another fully-valid gzip member.

So I take back what I said earlier: it's not really a coincidence that `MultiGzDecoder` repools, because it will keep reading to EOF. I propose we switch to `MultiGzDecoder`, which I think should solve this problem in a simple way.
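A small demonstration of the difference between the two decoders, using flate2 (where both types live):

```rust
use std::io::{Read, Write};

use flate2::read::{GzDecoder, MultiGzDecoder};
use flate2::write::GzEncoder;
use flate2::Compression;

// Compress `data` into a single gzip member.
fn gzip(data: &[u8]) -> Vec<u8> {
    let mut enc = GzEncoder::new(Vec::new(), Compression::default());
    enc.write_all(data).unwrap();
    enc.finish().unwrap()
}

fn main() {
    // Two gzip members back to back, which RFC 1952 says is one valid stream.
    let mut body = gzip(b"hello ");
    body.extend_from_slice(&gzip(b"world"));

    // GzDecoder stops after the first member; the rest goes unread.
    let mut first = String::new();
    GzDecoder::new(&body[..]).read_to_string(&mut first).unwrap();
    assert_eq!(first, "hello ");

    // MultiGzDecoder keeps decoding members until the underlying reader
    // hits EOF, so it sees the whole body and reads to end-of-stream.
    let mut all = String::new();
    MultiGzDecoder::new(&body[..]).read_to_string(&mut all).unwrap();
    assert_eq!(all, "hello world");
}
```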
By the way, Brotli doesn't have the same convenient property, where streams happen to be terminated by end-of-file. AFAICT the Brotli decoder behaves similarly to how the `GzDecoder` does: it knows from the stream itself when it's done, and never tries to read past that point.

I think we should error if there is extra garbage. One way to do that: wrap the decoder in a reader that, once decoding reports EOF, checks whether the underlying stream still has unread bytes; a sketch follows below.

Some reasons to error on extra garbage: it matches what curl and Go's net/http do, and RFC 1952 says members appear one after another with nothing before, between, or after them.
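A rough sketch of such a wrapper, written around `MultiGzDecoder` for concreteness (the same shape would apply to a Brotli decoder); the struct and its behavior are illustrative, not ureq's actual code:

```rust
use std::io::{self, Read};

use flate2::read::MultiGzDecoder;

/// Once the decoder reports EOF, probe the underlying reader exactly
/// once; any leftover byte means junk after the compressed stream.
struct StrictGz<R: Read> {
    inner: MultiGzDecoder<R>,
    probed: bool,
}

impl<R: Read> Read for StrictGz<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n == 0 && !buf.is_empty() && !self.probed {
            self.probed = true;
            let mut byte = [0u8; 1];
            // `get_mut` exposes the raw reader underneath the decoder.
            if self.inner.get_mut().read(&mut byte)? != 0 {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "trailing garbage after gzip stream",
                ));
            }
        }
        Ok(n)
    }
}
```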
|
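The original report reproduced the problem with code along these lines (a sketch assuming ureq 2.x's `Agent` API and a placeholder URL for a server that sends a gzipped, chunked-encoded body):

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One agent, so both requests share the same connection pool.
    let agent = ureq::agent();
    for i in 0..2 {
        let body = agent
            .get("https://example.com/gzipped-chunked") // hypothetical URL
            .call()?
            .into_string()?;
        println!("request {}: read {} bytes", i, body.len());
    }
    Ok(())
}
```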
Running it with `RUST_LOG=trace` shows that, though TLS does session resumption, the TCP stream gets closed and reopened between requests. I believe this is caused by the fact that the `GzDecoder` knows when to expect the end of the gzip stream, never tries to read beyond it, and therefore never makes the read that causes the `ChunkDecoder` to give it `Ok(0)`, which is what the `PoolReturnRead` looks for to know when to return the `Stream` to its pool.

I was able to get it to work properly by using a `MultiGzDecoder` instead of a plain `GzDecoder`, at least in my private, heavily-modified version of ureq. But I don't think any such thing is available for Brotli.
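For illustration, the repooling pattern described here looks roughly like this (a sketch, not ureq's actual `PoolReturnRead`):

```rust
use std::io::{self, Read};

// The wrapper only learns the body is finished when the reader it wraps
// returns Ok(0). A decoder that stops at the gzip stream's end without
// ever returning Ok(0) starves this check, so the connection is dropped
// instead of repooled.
struct PoolReturnRead<R: Read> {
    inner: Option<R>,
    pool: Vec<R>, // stand-in for the real connection pool
}

impl<R: Read> Read for PoolReturnRead<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let Some(reader) = self.inner.as_mut() else {
            return Ok(0);
        };
        let n = reader.read(buf)?;
        if n == 0 {
            // EOF observed: hand the stream back for reuse.
            self.pool.push(self.inner.take().expect("checked above"));
        }
        Ok(n)
    }
}
```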