-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle specific exception codes on RCI call. #4457
base: dev
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -447,10 +447,15 @@ func (agent *ecsAgent) doStart(containerChangeEventStream *eventstream.EventStre | |
// Register the container instance | ||
err = agent.registerContainerInstance(client, vpcSubnetAttributes) | ||
if err != nil { | ||
if isTransient(err) { | ||
if isTerminal(err) { | ||
// On unrecoverable error codes, agent should terminally exit. | ||
seelog.Warnf("Agent will terminally exit, unable to register container instance: %v", err) | ||
return exitcodes.ExitTerminal | ||
} else if isTransient(err) { | ||
return exitcodes.ExitError | ||
} | ||
return exitcodes.ExitTerminal | ||
// Other errors are considered recoverable and will be retried. | ||
return exitcodes.ExitError | ||
} | ||
|
||
// Load Managed Daemon images asynchronously | ||
|
@@ -840,7 +845,13 @@ func (agent *ecsAgent) registerContainerInstance( | |
logger.Critical("Instance registration attempt with an invalid parameter", logger.Fields{ | ||
field.Error: err, | ||
}) | ||
return err | ||
return terminalError{err} | ||
} | ||
if utils.IsAWSErrorCodeEqual(err, ecsmodel.ErrCodeClientException) { | ||
logger.Critical("Instance registration attempt with client performing invalid action", logger.Fields{ | ||
field.Error: err, | ||
}) | ||
return terminalError{err} | ||
} | ||
if _, ok := err.(apierrors.AttributeError); ok { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible this is also terminal error?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Handling that specific case is beyond scope of this change. However, invalidParameterException is handled above, I would not expect attributeError to fire because instance attributes are an instance parameter of the RCI call. Ecs agent should correct me here if I'm wrong There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good to me. We can focus on the cases we care about now. |
||
attributeErrorMsg := "" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,6 +24,18 @@ func isTransient(err error) bool { | |
return ok | ||
} | ||
|
||
type terminalError struct { | ||
error | ||
} | ||
|
||
// isTerminal returns true if the error is already wrapped as an unrecoverable condition | ||
// which will allow agent to exit terminally. | ||
func isTerminal(err error) bool { | ||
// Check if the error is already wrapped as a terminalError | ||
_, terminal := err.(terminalError) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could there be a situation in which this gets wrapped again? Why not use a deeper equality check like errors.Is which inspects the error tree rather than its first layer? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There should not be a case where it gets wrapped again - the terminal wrap should be the last layer as it's handled in this file. So in this case, both errors.Is and the current behaves the same. Since existing isTransient is implemented similarly - hopefully this optimization shouldn't be a blocker? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not a blocker, but in the case where we re-use these errors it may be a better long term change to wrap/unwrap these with the errors library. |
||
return terminal | ||
} | ||
|
||
// clusterMismatchError represents a mismatch in cluster name between the | ||
// state file and the config object | ||
type clusterMismatchError struct { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should use
logger
instead of seelog here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seelog.Warnf is chosen here to conform to existing style of using it for warning messages in this specific file i.e. there were 9 seelog.warn vs 2 logger.warn - so I opted for the more commonly used
There are 9 cases I used as reference found :
amazon-ecs-agent/agent/app/agent.go
Line 945 in 41d593c
amazon-ecs-agent/agent/app/agent.go
Line 953 in 41d593c
Where as logger is also used but mostly for logger.critical and logger.info. There are only 2 cases of logger.Warn:
amazon-ecs-agent/agent/app/agent.go
Line 1036 in 41d593c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the justification. I am no expert but it seems more idiomatic to conform to the in-house logger used in this library