-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve client tools host resolution #50175
Conversation
8cdb983
to
84d6013
Compare
b85e58f
to
f964532
Compare
// ErrNodeIsAmbiguous serves as an identifying error string indicating that | ||
// the proxy subsystem found multiple nodes matching the specified hostname. | ||
var ErrNodeIsAmbiguous = &trace.NotFoundError{Message: "ambiguous host could match multiple nodes"} | ||
|
||
const ( | ||
// NodeIsAmbiguous serves as an identifying error string indicating that | ||
// the proxy subsystem found multiple nodes matching the specified hostname. | ||
// TODO(tross) DELETE IN v20.0.0 | ||
// Deprecated: Prefer using ErrNodeIsAmbiguous | ||
NodeIsAmbiguous = "err-node-is-ambiguous" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I might be missing something super obvious, but I'm not seeing where/how we're preserving the meaningfulness of ErrNodeIsAmbiguous
across network boundaries/ssh. Is there some magic error handling I'm missing that is making sure that errors.Is(...)
is working correctly when ErrNodeIsAmbiguous
is generated during routing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ErrNodeIsAmbiguous should be unpacked after traversing network boundaries automagically by our interceptors which make use of trail.FromGRPC. The only thing that maybe won't work is the wrapping I added to include NodeIsAmbiguous
for backwards compatibility. I might have to actually include that in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, I follow now. gravitational/trace
has an implementation of the Is
method that takes message string equality into account. Depending on entire message string equality feels a lot more brittle than depending on the presence of a single unique identifier within the message. It basically locks us into this exact error message forever. Also, string equality doesn't seem to usually be something that errors.Is
cares about, and I think most people touching an error that is intended to be used with errors.Is
wouldn't think of string equality as a potential compatibility issue. My preference would be to leave this the way it was.
}) | ||
switch { | ||
//TODO(tross): DELETE IN v20.0.0 | ||
case trace.IsNotImplemented(err): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this case to a separate method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to avoid this "legacy" behavior being in it's own method because I don't think it should ever be called directly. Even with warnings and disclaimers added to the separate method it could still be abused. I'd rather force all callers to use a single unified way of retrieving hosts than each implement something slightly different that doesn't respect proxy templates or the routing strategy.
} | ||
case err == nil: | ||
if resp.GetServer() == nil { | ||
return nil, trace.NotFound("no matching SSH hosts found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a version mismatch or a bug, not a NotFound
IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't quite sure what would be the best error type to return here. I was trying to avoid anything that would trigger our automatic reauthentication as that is likely not going to help in this scenario.
switch { | ||
//TODO(tross): DELETE IN v20.0.0 | ||
case trace.IsNotImplemented(err): | ||
resources, err := client.GetAllUnifiedResources(ctx, clt, &proto.ListUnifiedResourcesRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you move this section back into its own method it'll just be a renaming of the old method.
f964532
to
05b337b
Compare
Host resolution performed because labels, fuzzy search, or predicate expressions were supplied to commands that establish connections to a single host has historically been performed client side in tsh. While that works in most cases, it can prevent correctly resolving hosts in some situations, i.e. when there are ambiguous hosts and tsh is unaware that the cluster routing strategy is set to ROUTE_TO_MOST_RECENT. To improve the experience, a new ResolveSSHTarget was added to Auth to allow host resolution to be performed server side. The resolution works in a similar manner to, and was inspired by GetSSHTargets. In the event that the new RPC is not implemented, because the client is newer than Auth, tsh has also been updated to pull the cluster networking config and address any host ambiguity if allowed. As a result tsh scp and tsh proxy ssh should be much more tolerant to, and still permit access in situations where ambiguous hosts are present for some amount of time. Prior to this the only way to connect in these situations was to find the UUID of the correct target instance and try again after seeing an ambiguous host error.
6cf0529
to
67136e4
Compare
@rosstimothy See the table below for backport results.
|
Host resolution performed because labels, fuzzy search, or predicate expressions were supplied to commands that establish connections to a single host has historically been performed client side in tsh. While that works in most cases, it can prevent correctly resolving hosts in some situations, i.e. when there are ambiguous hosts and tsh is unaware that the cluster routing strategy is set to ROUTE_TO_MOST_RECENT.
To improve the experience, a new ResolveSSHTarget was added to Auth to allow host resolution to be performed server side. The resolution works in a similar manner to, and was inspired by GetSSHTargets. In the event that the new RPC is not implemented, because the client is newer than Auth, tsh has also been updated to pull the cluster networking config and address any host ambiguity if allowed.
As a result
tsh scp
andtsh proxy ssh
should be much more tolerant to, and still permit access in situations where ambiguous hosts are present for some amount of time. Prior to this the only way to connect in these situations was to find the UUID of the correct target instance and try again after seeing an ambiguous host error.Changelog: Honor the cluster routing strategy when client initiated host resolution via proxy templates or label matching is ambiguous.