Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prte.c: a prefix of "/" is ok #1848

Merged
merged 1 commit into from
Nov 17, 2023

Conversation

jsquyres
Copy link
Contributor

@jsquyres jsquyres commented Nov 17, 2023

It's not an error if the prefix ends up being a plain "/". More specifically, if we strip off all trailing "/" characters from the prefix and end up with an empty string, then the prefix is just "/".


Refs open-mpi/ompi#12071

@rhc54 Does this PR look right to you? I can't remember a reason why we thought that a prefix of / is actually an error. Specifically: the original code (rightfully) strips off all trailing / characters. If we end up with an empty string, we declare that to be an error (this is not new -- this scheme goes all the way back to ORTE).

But I can't think of a reason why a prefix of / is an error. Can you? If we can't think of any reasons why a prefix of / is an error, this PR adjusts the string handling so that if we end up with an empty string, that means that the prefix is really just /, so put that back.

@rhc54
Copy link
Contributor

rhc54 commented Nov 17, 2023

I'm not entirely sure that is correct. I'll need to think about it and play with it a bit. I'm sure we did it for a reason, but I also cannot recall why.

As for the referenced issue, that is the wrong code path - it is the "else" condition since no cmd line --prefix option was given and it wasn't configured to enable prefix-by-default. So this won't address that reported issue.

For that issue, it is this code that is the concern:

                if (0 == strcmp("bin", tmp_basename)) {
                    char *tmp = tpath;
                    tpath = pmix_dirname(tmp);
                    free(tmp);

In other words, it is the "bin" in /bin/mpiexec that may be the issue. I still need to confirm that as actually being an issue. I understand why we did that - the "prefix" we set has to point to all the directories we need (including lib, for example), so we need the level above the "bin" directory. I don't think pmix_dirname gives us the right thing in this case (it does fine for a case like /usr/bin), so we may need to special case it.

@rhc54
Copy link
Contributor

rhc54 commented Nov 17, 2023

Here is why we probably excluded a / prefix. I created a /foo symlink to the PRRTE install path's bin directory. Then I added a little output to see what is going on with the code path and command cited in the referenced issue - here is the code:

       /* Check if called with fully-qualified path to prte.
           (Note: Put this second so can override with --prefix (above). */
        tpath = NULL;
        if ('/' == argv[0][0]) {
            char *tmp_basename = NULL;
            tpath = pmix_dirname(argv[0]);

            if (NULL != tpath) {
                /* Quick sanity check to ensure we got
                   something/bin/<exec_name> and that the installation
                   tree is at least more or less what we expect it to
                   be */
                tmp_basename = pmix_basename(tpath);
                pmix_output(0, "BASENAME %s", tmp_basename);
                if (0 == strcmp("foo", tmp_basename)) {
                    char *tmp = tpath;
                    tpath = pmix_dirname(tmp);
                    pmix_output(0, "TMP %s TPATH %s", tmp, tpath);
                    free(tmp);
                } else {
                    free(tpath);
                    tpath = NULL;
                }
                free(tmp_basename);
            }
            if (NULL != tpath) {
                prte_set_attribute(&dapp->attributes, PRTE_APP_PREFIX_DIR, PRTE_ATTR_GLOBAL,
                                   tpath, PMIX_STRING);
            }

and here is what happens:

$ /foo/prterun -n 1 hostname
[rhc-node01:56680] BASENAME foo
[rhc-node01:56680] TMP /foo TPATH /
bash: //bin/prted: No such file or directory

Ignore the error - we know that PRRTE wasn't installed in /bin, but I'm not about to totally hose the system by blowing away /bin and replacing it with a symlink. Key takeaway is that we wind up with a path that starts with a double // and we wanted to avoid that. IIRC, there are some environments that treat // differently - but I might be misremembering it.

Anyway, what that means is - minus the // - the code appears to be working correctly in that section. I also concluded that the original cited issue was incomplete - the only way to get that error message with the cited cmd line is that prefix-by-default must have been given. Sigh - would really help if people gave complete info!

In terms of what you are addressing here, it will result in the same // at the beginning of the prted cmd. Problem is that if you don't set / as a prefix, then PRRTE won't know to prefix the prted cmd at all.

So I'm thinking that your change is likely okay, but maybe incomplete. We need to also go into the plm/ssh launch code and check for the // at the beginning of the cmd - if we find it, then we remove the redundant /. If someone wants to prove that the // isn't a problem in any environment, then you don't need the secondary fix.

There is a small change in the above code path - I've added it to your branch.

Make sense?

@rhc54
Copy link
Contributor

rhc54 commented Nov 17, 2023

Think I fixed the // problem - see what you think.

It's not an error if the prefix ends up being a plain "/".  More
specifically, if we strip off all trailing "/" characters from the
prefix and end up with an empty string, then the prefix is just "/".

Protect against a NULL prefix being added to the app's attributes.

Protect against a double-/ in the prted cmd path in the ssh
launch component.

Signed-off-by: Jeff Squyres <[email protected]>
Signed-off-by: Ralph Castain <[email protected]>
@rhc54 rhc54 force-pushed the pr/prefix-of-slash-aint-no-error branch from d4f8a23 to 324a9c6 Compare November 17, 2023 15:56
@rhc54
Copy link
Contributor

rhc54 commented Nov 17, 2023

Okay, I cleaned this up, squashed it, and pushed. We actually don't need to do prun - it doesn't look for prefixes as it doesn't load a PRRTE library. So these should be the only changes required.

@jsquyres Take a final look?

@jsquyres
Copy link
Contributor Author

Looks good to me!

@jsquyres jsquyres merged commit ce7f686 into openpmix:master Nov 17, 2023
11 checks passed
@jsquyres jsquyres deleted the pr/prefix-of-slash-aint-no-error branch November 17, 2023 16:14
jsquyres added a commit to jsquyres/ompi that referenced this pull request Nov 17, 2023
This is a backport of openpmix/prrte#1848 to
ORTE.  A prefix of "/" is not an error; there's no need to emit a
message and fail.  We want to ensure to not pass a double slash in the
ssh module to the remote host (i.e., the prefix of "/" followed by
"/<bindir>"), so make sure to handle that properly.

Signed-off-by: Jeff Squyres <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants