
feat(HTTP): Retry failed HTTP requests with exponential backoff #1098

Draft
wants to merge 1 commit into main

Conversation

mariocynicys
Member

This PR implements exponential backoff retries for HTTP files.

I am not very familiar with the code base and how the packager works, so I could use some help clarifying the following:

  • In the case of PUT or POST requests, the upload_cache_ would be drained, right? Does this mean we have to keep a copy of it?
  • Are there any other cases where we should perform retries, other than client timeouts and these specific response codes?

Tests yet to be added.
Closes #1095
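For illustration only (this is not the code in the patch), the retry shape under discussion looks roughly like the sketch below at the libcurl level. The names `PerformWithBackoff`, `kMaxRetries`, and the delay values are made up for the example; the real change has to live inside HttpFile and deal with the caches discussed later in this thread.

```cpp
#include <chrono>
#include <thread>

#include <curl/curl.h>

// Hypothetical constants for the sketch; the actual patch may differ.
constexpr int kMaxRetries = 5;
constexpr std::chrono::milliseconds kInitialDelay{500};

// Response codes we treat as retriable (client timeout, rate limit, 5xx).
bool IsRetriableCode(long res_code) {
  return res_code == 408 || res_code == 429 || res_code >= 500;
}

CURLcode PerformWithBackoff(CURL* curl) {
  std::chrono::milliseconds delay = kInitialDelay;
  CURLcode res = CURLE_OK;
  for (int attempt = 0; attempt <= kMaxRetries; ++attempt) {
    res = curl_easy_perform(curl);

    long res_code = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &res_code);

    // Retry only on client timeouts and transient server-side failures.
    const bool retriable = (res == CURLE_OPERATION_TIMEDOUT) ||
                           (res == CURLE_OK && IsRetriableCode(res_code));
    if (!retriable || attempt == kMaxRetries)
      break;

    std::this_thread::sleep_for(delay);
    delay *= 2;  // Exponential backoff.
  }
  return res;
}
```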

long res_code = 0;
curl_easy_getinfo(curl_.get(), CURLINFO_RESPONSE_CODE, &res_code);
if (res_code == 408 || res_code == 429 || res_code >= 500) {
  // FIXME: Should we clear download_cache_ and upload_cache_ here??
Member

Yes, you'll need to ensure that the inside of the loop has no side-effects. I haven't analyzed to see what the caches do here, but it's worth looking into more closely.

You may want to write test cases that verify the correct behavior for POST/PUT with data. In the cmake branch, you'll find HTTP file tests are all enabled and working. (In main, they are all disabled.) You should be able to verify retry behavior with httpbin, and see that the final failed retry still has a response that reflects the data from the POST/PUT.
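As a hedged illustration of the httpbin idea (not the packager's actual test code), the sketch below shows the shape of such a check at the raw libcurl level: httpbin.org/post echoes the posted body back in its JSON response, so a test can assert that the data survived the request. The payload string and function names here are only for the example.

```cpp
#include <string>

#include <curl/curl.h>

// Collects the response body so a test can assert it echoes the posted data.
static size_t CollectBody(char* ptr, size_t size, size_t nmemb, void* userdata) {
  static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
  return size * nmemb;
}

int main() {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* curl = curl_easy_init();

  const std::string payload = "hello from the retry test";
  std::string response;

  // httpbin's /post endpoint echoes the posted data back in its JSON body.
  curl_easy_setopt(curl, CURLOPT_URL, "https://httpbin.org/post");
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, payload.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, CollectBody);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

  CURLcode res = curl_easy_perform(curl);
  // A test would assert res == CURLE_OK and response.find(payload) != npos.

  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return res == CURLE_OK ? 0 : 1;
}
```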

I recommend you try the cmake branch in a separate clone, without using gclient. We have yet to update the docs for that, but this is what you'll want to try:

# Clone
git clone ...
# Use cmake branch
git checkout -b cmake upstream/cmake
# Get submodules
git submodule init && git submodule update
# Gen build files
cmake -S . -B build/ -DCMAKE_BUILD_TYPE=Debug
# Build
cmake --build build/ --config Debug --parallel
# Test
(cd build && ctest -C Debug -V)

If you have time, you could make a PR against the cmake branch, which is where most development is happening right now as we rewrite the build system and dependencies.

@joeyparrish
Member

Please merge with main, which has changed a lot in the recent cmake port. Thanks!

@mariocynicys mariocynicys force-pushed the http-exponential-backoff branch from e66076a to b2ccb15 Compare February 18, 2024 15:27
@cosmin
Contributor

cosmin commented Feb 21, 2024

This has been rebased onto main and the tests do pass, but after some digging into the implications for the download and upload caches, I'm not sure this can be done safely. For example, consider using this to upload packaging output to some HTTP server.

As soon as CURL calls CurlReadCallback, the underlying circular buffer notifies potential writers that were blocked on writing to the circular cache that there is room to write, and those writers may start writing right away, overwriting the data in the cache. If the request then fails and needs to be retried, the underlying circular buffer has already been altered, so the request cannot be retried.

I don't think this can be done safely with the current IoCache interface as the backing buffer. I believe the data from the upload_cache would have to be written to some other intermediary buffer (without triggering read events) so that curl can retry safely, and only upon success should the IoCache circular buffer signal the read event and advance its pointers.

Likewise for the download cache: only upon a successful download should the data be written to download_cache, otherwise a consumer may have already consumed the bad response body by the time we retry.

Alternatively, to avoid the memory copies, I think we'd have to take a lock on the IoCache around the curl_easy_perform loop, use some lower-level interfaces than Read() and Write() (perhaps something like MaybeRead() and MaybeWrite()), and then explicitly release the lock and trigger the read event once the request has succeeded.
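A minimal sketch of the intermediary-buffer idea at the libcurl level (UploadState and the function names are hypothetical, and the backoff delay and the actual IoCache hand-off are omitted): the body is copied out of the cache once before the loop, the read callback serves from that copy, and the offset is rewound before each attempt, so every retry resends the full body without touching the cache again.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

#include <curl/curl.h>

// Hypothetical state: the request body is copied out of the cache once,
// before the retry loop, so retries never read from the IoCache again.
struct UploadState {
  std::string body;   // Full request body, buffered up front.
  size_t offset = 0;  // How much curl has consumed so far.
};

static size_t BufferedReadCallback(char* ptr, size_t size, size_t nmemb,
                                   void* userdata) {
  auto* state = static_cast<UploadState*>(userdata);
  const size_t remaining = state->body.size() - state->offset;
  const size_t to_copy = std::min(size * nmemb, remaining);
  std::memcpy(ptr, state->body.data() + state->offset, to_copy);
  state->offset += to_copy;
  return to_copy;
}

CURLcode UploadWithRetries(CURL* curl, UploadState* state, int max_retries) {
  curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
  curl_easy_setopt(curl, CURLOPT_READFUNCTION, BufferedReadCallback);
  curl_easy_setopt(curl, CURLOPT_READDATA, state);

  CURLcode res = CURLE_OK;
  for (int attempt = 0; attempt <= max_retries; ++attempt) {
    state->offset = 0;  // Rewind the buffered body before every attempt.
    res = curl_easy_perform(curl);

    long res_code = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &res_code);
    if (res == CURLE_OK && res_code < 400)
      break;  // Success: only now would the IoCache pointers be advanced.
    // (Backoff sleep omitted for brevity.)
  }
  return res;
}
```

As noted above, this trades memory for safety and does not help when the request body is too large to hold in memory, which is exactly the concern raised in the next comment.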

@cosmin cosmin marked this pull request as draft February 23, 2024 02:19
@mariocynicys
Member Author

@cosmin I am fiddling with this, but as far as I can tell this is not fixable via IoCache (please correct me if I am wrong). CurlReadCallback might be called multiple times during a single request, but if the request fails we start all over again, not from where we left off. This means the circular buffer must keep all the data written to it since the start of the request; if the circular buffer size is less than the request body size, there are cases where we must overwrite part of our upload_cache_, and if the request fails after that point, we can't retry it.

For download_cache_, we don't want readers to read before the request is successful. If part of the response was read but curl_easy_perform fails, should we then try again, seek to the very same byte we stopped reading at, and continue from there? This might be problematic if different data is returned.

Copying/backing up the data elsewhere until the request is successful would work fine, but it wouldn't be memory efficient if we are dealing with big files/requests.

Merging this pull request may close: Implement exponential backoff for HTTP PUT