-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle specific exception codes on RCI call. #4457
base: dev
Are you sure you want to change the base?
Conversation
49c48be
to
7a5da69
Compare
aa3e226
to
48fc82a
Compare
// which will allow agent to exit terminally. | ||
func isTerminal(err error) bool { | ||
// Check if the error is already wrapped as a terminalError | ||
_, terminal := err.(terminalError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be a situation in which this gets wrapped again? Why not use a deeper equality check like errors.Is which inspects the error tree rather than its first layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should not be a case where it gets wrapped again - the terminal wrap should be the last layer as it's handled in this file. So in this case, both errors.Is and the current behaves the same.
Since existing isTransient is implemented similarly - hopefully this optimization shouldn't be a blocker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a blocker, but in the case where we re-use these errors it may be a better long term change to wrap/unwrap these with the errors library.
if isTransient(err) { | ||
if isTerminal(err) { | ||
// On unrecoverable error codes, agent should terminally exit. | ||
seelog.Warnf("Agent will terminally exit, unable to register container instance: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should use logger
instead of seelog here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seelog.Warnf is chosen here to conform to existing style of using it for warning messages in this specific file i.e. there were 9 seelog.warn vs 2 logger.warn - so I opted for the more commonly used
There are 9 cases I used as reference found :
amazon-ecs-agent/agent/app/agent.go
Line 945 in 41d593c
seelog.Warnf("Error initializing metrics engine: %v", err) amazon-ecs-agent/agent/app/agent.go
Line 953 in 41d593c
seelog.Warnf("Error creating telemetry session: %v", err)
Where as logger is also used but mostly for logger.critical and logger.info. There are only 2 cases of logger.Warn:
amazon-ecs-agent/agent/app/agent.go
Line 1036 in 41d593c
logger.Warn("Failed to get docker version from task engine", logger.Fields{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the justification. I am no expert but it seems more idiomatic to conform to the in-house logger used in this library
logger.Critical("Instance registration attempt with client performing invalid action", logger.Fields{ | ||
field.Error: err, | ||
}) | ||
return terminalError{err} | ||
} | ||
if _, ok := err.(apierrors.AttributeError); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible this is also terminal error?
Instance registration attempt with invalid attribute(s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling that specific case is beyond scope of this change.
However, invalidParameterException is handled above, I would not expect attributeError to fire because instance attributes are an instance parameter of the RCI call. Ecs agent should correct me here if I'm wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. We can focus on the cases we care about now.
Summary
Implementation details
Changes are implemented on agent calling RCi.
Testing
Added 2 new tests, one to verify throttlingError returns exitcode 1 + one to verify ClientException will result in exit code 5 i.e. the terminal state for agent.
New tests cover the changes:
Description for the changelog
Terminally exit on unrecoverable RCI exception errors.
Additional Information
Does this PR include breaking model changes? If so, Have you added transformation functions?
No
Does this PR include the addition of new environment variables in the README?
No
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.