-
-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Acknowledgement failed when the application graceful shutdown #925
Comments
Hey @XUmeng96, sorry for the delay. You're right in that there's a racing condition there where the ack processor may stop before all acks are added to the queue. Maybe it's worth it adding logic to also wait for the queue to be empty. Nevertheless, considering we have a dedicated thread that pretty much just waits on this queue and adds messages to the buffer, I'd think more often than not we wouldn't have this issue. What you are seeing seems much more like an effect of this logic where we ignore new messages for acknowledgement after we receive the order to stop: Lines 157 to 164 in 2c39154
Also, keep in mind that the message source only signals the acknowledgement processor to stop after all messages have completed processor - or Lines 295 to 311 in 2c39154
You can set your log level to Overall, batching acks offers weak guarantees on that all messages will necessarily be acknowledged - acknowledgement might fail for example. If you need strong guarantees, you should probably use the Does this make sense to you? Thanks. |
I'm also experiencing this issue now where a message is added via In my opinion it would make sense to try to acknowledge all events hat have been received as I don't see why we can only have a weak guarantee here. Also, if the defaults allow a situation where a message is processed successfully on the application side, but is not necessarily acknowledged in SQS then either the documentation should be updated to reflect that important information (imho) or the default should ensure a strong guarantee |
Hey @MarcelLorke6666, thanks for your report. I should have time to look into this this week.
Standard SQS queues provide an
FIFO queues have strong guarantees - it guarantees that a message will be served exactly once and in order within a timeframe if both Producer and Consumer are correctly configured. The framework also respects that by defaulting to Defaulting Standard SQS queues to IMMEDIATE acknowledgement though would incur in a significant performance and cost hit, while doing really nothing for the user in terms of guarantees - though users can configure that if they so wish. |
Thank you for taking the time! You are right and generally we also have measures in place. In our specific scenario the deduplication logic already evicted this event and we were surprised to see it again a few hours later as from the perspective of the application everything went just fine. |
Thanks for the info in previous comments. We are also experiencing this issue. Messages are being received multiple times (max receive == 2) and completing successfully but never being acknowledged. This results in the maxReceiveCount being hit and messages go to the dlq for our service which triggers false alerts for service issues. This is actually quite likely to happen in k8s environments where the scheduler frequently stops pods and your message is unlucky enough to be received each time by a pod that is about to be shutdown.
You could argue that since this issue breaks message acknowledgements then the at-least-once guarantee has been broken We are going to try the immediate acknowledge as a workaround for now. |
Thanks @mgrundie-r7, let us know how that works for you. I'm looking into another ack issue and should be able to look into this this one this weekend or so. It's a good thing that with increased usage issues are surfacing so we can fix them. We'll get there :) |
Hey everyone, I haven't had the time to work on this yet. The fix itself doesn't look too complex, but it should be well tested so we don't have a regression later. If anyone would like to contribute a PR for this I'd be happy to review and merge it. Thanks. |
Hey everyone, finally had the time for this fix. There are no other acknowledgement problems on the radar, so hopefully all is well now 🙌🏼 Here's the PR with the fix Reviews are appreciated, and feel free to checkout the branch and test it if you'd like. Thanks! |
Fixed in #1082. |
Type: Bug
Component:
SQS
Describe the bug
Version:
Error:
I use a @SqsListener and a MessageListenerContainerFactory bean to create the MessageListenerContainer. batching acknoeledge messages.
when I shut down the application, I got some error message, or maybe no error but the messages still inflight until the visibility timeout instead of acknoledged.
I hope the acknoledgement can be finished even if the shutdownhook triggered.
I saw doStop() method in BatchingAcknowledgementProcessor . It only waiting for runningAcks until the acknowledgementShutdownTimeout. If there are some messages in BlockingQueue acks and not in Map<String, BlockingQueue<Message>> acksBuffer, these messages will not be acknowlwdged, right?
I'm not sure why we only waiting for runningAcks to finish and Is there a way to ensure that all messages being processed are acknowledged?
Sample
MessageListenerContainerFactory config as following:
and I simulate the business logic by sleeping 15 seconds.
The text was updated successfully, but these errors were encountered: