Correct offloading of large batch entries to S3. Closes #154 (#155)
base: master
Conversation
Fix the case where individual entries are under the threshold but the total batch size exceeds it. Add a test case covering this scenario, and adjust an old test that is now failing (it asserted the wrong behavior).
I'm looking forward to seeing this PR merged. Good work, @shotmk!
List<Pair<Integer, Long>> entrySizes = IntStream.range(0, originalEntries.size())
        .boxed()
        .map(i -> Pair.of(i, sizeOf(originalEntries.get(i))))
        // sort by entry size, largest first
        .sorted((p1, p2) -> Long.compare(p2.right(), p1.right()))
What's the purpose of sorting here?
The overall idea here is that we want to offload as few batch entries as possible: fewer AWS calls means less time spent on round trips.
So we sort by size, then move entries from largest to smallest, stopping the moment totalSize fits under the limit (see the sketch below).
We only offload in that sort order; the original order of entries in the resulting batch request is preserved, which is important for the FIFO use case.
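A minimal standalone sketch of that selection strategy (hypothetical code, not the PR's actual implementation; indicesToOffload and its parameters are illustrative stand-ins for the real client internals, and it ignores the small S3 pointer that replaces an offloaded payload):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffloadSelection {
    // Return the indices of the entries to offload, choosing the largest
    // entries first, until the remaining total fits under the threshold.
    // The caller keeps the batch in its original order; only the chosen
    // entries have their payloads replaced by S3 pointers.
    static List<Integer> indicesToOffload(List<Long> entrySizes, long threshold) {
        long totalSize = entrySizes.stream().mapToLong(Long::longValue).sum();

        // Indices sorted by entry size, descending.
        List<Integer> bySizeDesc = new ArrayList<>();
        for (int i = 0; i < entrySizes.size(); i++) {
            bySizeDesc.add(i);
        }
        bySizeDesc.sort(Comparator.<Integer>comparingLong(entrySizes::get).reversed());

        List<Integer> offload = new ArrayList<>();
        for (int i : bySizeDesc) {
            if (totalSize <= threshold) {
                break; // the batch fits now; any further offload would be wasted
            }
            offload.add(i);
            totalSize -= entrySizes.get(i); // sketch: treats the replacement pointer as size 0
        }
        return offload;
    }
}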
hasS3Entries = true;
for (Pair<Integer, Long> pair : entrySizes) {
    // Stop offloading once the total size of the batch request is within limits
    if (totalSize <= clientConfiguration.getPayloadSizeThreshold() && !clientConfiguration.isAlwaysThroughS3()) {
Since we are checking the total size, does it make sense to move the if condition outside of the for loop? If it is true, we can throw an exception; otherwise, we keep adding entries to S3.
The overall idea is described in the comment above.
The code follows that idea: we move entries one by one, starting from the largest, and keep track of the total size after each offload.
We want to stop offloading entries the moment the total is under the threshold, to prevent unnecessary offloading; this lowers both overall execution time and cost. A worked example follows below.
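A hypothetical walk-through of that loop, reusing the indicesToOffload sketch from above (sizes in KiB, 256 KiB threshold):

import java.util.List;

public class OffloadTrace {
    public static void main(String[] args) {
        long threshold = 256 * 1024; // 256 KiB payload threshold

        // Three entries, each individually under the threshold,
        // but 300 KiB in total, so the batch as a whole is too large.
        List<Long> sizes = List.of(150L * 1024, 120L * 1024, 30L * 1024);

        // Offloading just the largest (150 KiB) entry brings the total down
        // to 150 KiB <= 256 KiB, so a single S3 round trip is enough.
        System.out.println(OffloadSelection.indicesToOffload(sizes, threshold)); // [0]
    }
}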
@ziyanli-amazon the idea is to offload as few entries as possible, since each offload is a round trip to S3. So offloading goes from the biggest entries to the smallest until the batch fits within the size limit.
Can this get some attention, please?
Fix Issue #154
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.