Default SSH for omni_util #40
Comments
I made it the default as it is the only viable option on large systems like Frontier. We can deprecate the older serialized ssh variant and just make this pdsh approach the only launch path in user-mode. At that point we can remove the command-line argument altogether.
Alright, that probably makes sense and will make things simpler. I was a little hesitant to remove the serialized SSH because PDSH isn't working properly on all the systems I've been using for testing, whereas the SSH version works just fine. (This will force me to look into it and figure out what's happening with PDSH.)
Now that you say that, I believe I did encounter an issue with the pdsh variant on hpcfund. Assuming we can resolve that, I think we only need a single approach.
I haven't investigated this issue, but I found a quick workaround that fixes it on the cluster where I'm testing. Instead of using the standard import: https://github.com/AMDResearch/omniwatch/blob/38c7ed42ad3669385fe2857de767428645501f7d/omniwatch/omni_util.py#L40 Use the following import to force libssh (as opposed to libssh2):
Sharing this workaround for reference. It may be useful for debugging the issue later.
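A minimal sketch of that import swap, assuming omni_util.py uses parallel-ssh's `ParallelSSHClient`. In parallel-ssh the default `pssh.clients` import resolves to the libssh2-based native client, while `pssh.clients.ssh` provides the libssh-based one (via ssh-python), so the one-line form would be `from pssh.clients.ssh import ParallelSSHClient`. The `CLIENT_MODULES` table and `load_parallel_client` helper below are hypothetical names for illustration, not part of omniwatch:

```python
import importlib

# Hypothetical mapping from backend name to the parallel-ssh module
# that provides ParallelSSHClient for that backend.
CLIENT_MODULES = {
    "libssh2": "pssh.clients.native",  # what the default import resolves to
    "libssh": "pssh.clients.ssh",      # libssh-based client (ssh-python)
}


def load_parallel_client(backend="libssh"):
    """Return the ParallelSSHClient class for the requested backend.

    Raises KeyError for an unknown backend name and ImportError if
    parallel-ssh is not installed.
    """
    module_name = CLIENT_MODULES[backend]
    module = importlib.import_module(module_name)
    return module.ParallelSSHClient
```

Selecting the backend in one place like this would make it easy to switch back to libssh2 if the key issue is resolved upstream.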
Additional information: ParallelSSH/parallel-ssh#363
Seems to be an issue with certain keys and newer SSH versions. Frontier is running an older version of SSH (8.4p1) compared to other systems we use (8.7p1 - 8.9p1). Generating a new RSA key doesn't help. Other types of keys work fine in my environment, but RSA is widely used and I prefer not to ask users to generate a new type of key only for Omnistat.
I haven't tested anything else beyond that. We'd need to test performance at scale, but using ssh-python with the alternative import seems to work and it may be the easiest solution.
I tested this on Frontier, and it seems to work fine with libssh.
We currently have a `--use_pdsh` flag that is enabled by default, and cannot be disabled from the command line: https://github.com/AMDResearch/omniwatch/blob/38c7ed42ad3669385fe2857de767428645501f7d/omniwatch/omni_util.py#L305
Options:
- If `ssh` calls are supposed to be the default, the `--use_pdsh` flag should default to False.
- Otherwise, `--disable-pdsh` (default: False, action: store_true).
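The two options can be sketched with `argparse` as follows. This is an illustration under assumptions, not the actual omni_util.py code: the function names and help strings are hypothetical, and only the flag names, defaults, and actions come from the issue text.

```python
import argparse


def build_parser_option_a():
    """Option A: serialized ssh is the default; --use_pdsh opts in."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_pdsh", action="store_true", default=False,
                        help="launch with pdsh instead of serialized ssh")
    return parser


def build_parser_option_b():
    """Option B: pdsh is the default; --disable-pdsh opts out."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--disable-pdsh", action="store_true", default=False,
                        help="fall back to serialized ssh")
    return parser


def use_pdsh_from(args):
    """Derive the effective pdsh setting from option B's parsed arguments."""
    # argparse stores --disable-pdsh under the attribute disable_pdsh.
    return not args.disable_pdsh
```

With either variant the behaviour can actually be toggled from the command line, which is not the case for a `store_true` flag whose default is already True.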