-
Notifications
You must be signed in to change notification settings - Fork 640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow mps root to be specified #506
Conversation
deployments/helm/nvidia-device-plugin/templates/daemonset-mps-control-daemon.yml
Show resolved
Hide resolved
35fc6e1
to
251bd9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM besides one minor nit / question regarding a comment.
251bd9b
to
fa5ca98
Compare
fa5ca98
to
8c43fb5
Compare
0405ec9
to
661cad2
Compare
cmd/mps-control-daemon/mps/root.go
Outdated
// LogDir returns the per-resource pipe dir for the specified root. | ||
func (r Root) LogDir(resourceName spec.ResourceName) string { | ||
return r.Path(string(resourceName), "log") | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a ShmDir()
just for completeness (and in case we are actually able to use a per-resource directory for this at some point in the future)?
func (r Root) ShmDir(resourceName spec.ResourceName) string {
return r.Path("shm")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that we can't have a per-resource shm. The shm MUST exist at /dev/shm
in the container and as such it is not rooted at the mps Root. (I suppose technically, the mount is created at /mps/shm
in the init container (/run/nvidia/mps/shm
on the host) and mounted to /dev/shm
in both the daemonset container and the device plugin container.
Maybe what I'm trying to say, is that although symmetry would be nice, it isn't possible due to the fundamental differences between shm and the other two dirs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that, but why can't plugin.mpsDaemon.ShmDir()
return /dev/shm
and plugin.mpsRoot.ShmDir(resource)
return r.Path("shm")
?
I'm not talking about symmetry on the location that gets returned, but rather symmetry on the functions that are called to get the info that we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. That's fair. I can update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated. I'm not 100% happy with requiring a resource name to Root.ShmDir()
since it may give the caller the impression that these are different accross resources. Not a blocker though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't give that impression to me, necessarily. It just says -- for this resource, what is the shm
directory. The fact that it happens to be the same (or happens to be different) for all resources shouldn't matter to the caller.
internal/plugin/server.go
Outdated
}, | ||
) | ||
response.Mounts = append(response.Mounts, | ||
&pluginapi.Mount{ | ||
ContainerPath: "/dev/shm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise have this be what is returned from a call to plugin.mpsDaemon.ShmDir()
.
This change allows the MPS root on the host to be specified and uses /run/nvidia/mps by default. Signed-off-by: Evan Lezar <[email protected]>
661cad2
to
283cc30
Compare
@@ -148,11 +161,12 @@ func (plugin *NvidiaDevicePlugin) waitForMPSDaemon() error { | |||
if plugin.config.Sharing.SharingStrategy() != spec.SharingStrategyMPS { | |||
return nil | |||
} | |||
// TODO: Check the started file here. | |||
// TODO: Check the .ready file here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I this now called .started
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. There are also per resource started files, but they're not used for readiness or health checks. Will revisit them as a follow up.
This change allows the MPS root on the host to be specified and uses /run/nvidia/mps by default.
This requires #516