Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow mps root to be specified #506

Merged
merged 1 commit into from
Feb 21, 2024
Merged

Conversation

elezar
Copy link
Member

@elezar elezar commented Feb 8, 2024

This change allows the MPS root on the host to be specified and uses /run/nvidia/mps by default.

This requires #516

@elezar elezar self-assigned this Feb 8, 2024
@elezar
Copy link
Member Author

elezar commented Feb 8, 2024

@elezar elezar force-pushed the use-var-run-nvidia-mps branch 2 times, most recently from 35fc6e1 to 251bd9b Compare February 13, 2024 15:16
Copy link
Contributor

@cdesiniotis cdesiniotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides one minor nit / question regarding a comment.

internal/plugin/server.go Outdated Show resolved Hide resolved
@elezar elezar marked this pull request as draft February 19, 2024 10:14
internal/plugin/server.go Outdated Show resolved Hide resolved
@elezar elezar force-pushed the use-var-run-nvidia-mps branch from 251bd9b to fa5ca98 Compare February 19, 2024 12:03
@elezar elezar marked this pull request as ready for review February 19, 2024 12:04
@elezar elezar force-pushed the use-var-run-nvidia-mps branch from fa5ca98 to 8c43fb5 Compare February 19, 2024 14:04
internal/plugin/server.go Outdated Show resolved Hide resolved
@elezar elezar force-pushed the use-var-run-nvidia-mps branch 2 times, most recently from 0405ec9 to 661cad2 Compare February 21, 2024 14:17
Comment on lines 39 to 49
// LogDir returns the per-resource pipe dir for the specified root.
func (r Root) LogDir(resourceName spec.ResourceName) string {
return r.Path(string(resourceName), "log")
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a ShmDir() just for completeness (and in case we are actually able to use a per-resource directory for this at some point in the future)?

func (r Root) ShmDir(resourceName spec.ResourceName) string {
	return r.Path("shm")
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that we can't have a per-resource shm. The shm MUST exist at /dev/shm in the container and as such it is not rooted at the mps Root. (I suppose technically, the mount is created at /mps/shm in the init container (/run/nvidia/mps/shm on the host) and mounted to /dev/shm in both the daemonset container and the device plugin container.

Maybe what I'm trying to say, is that although symmetry would be nice, it isn't possible due to the fundamental differences between shm and the other two dirs.

Copy link
Contributor

@klueska klueska Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that, but why can't plugin.mpsDaemon.ShmDir() return /dev/shm and plugin.mpsRoot.ShmDir(resource) return r.Path("shm")?

I'm not talking about symmetry on the location that gets returned, but rather symmetry on the functions that are called to get the info that we need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. That's fair. I can update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated. I'm not 100% happy with requiring a resource name to Root.ShmDir() since it may give the caller the impression that these are different accross resources. Not a blocker though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't give that impression to me, necessarily. It just says -- for this resource, what is the shm directory. The fact that it happens to be the same (or happens to be different) for all resources shouldn't matter to the caller.

},
)
response.Mounts = append(response.Mounts,
&pluginapi.Mount{
ContainerPath: "/dev/shm",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise have this be what is returned from a call to plugin.mpsDaemon.ShmDir().

This change allows the MPS root on the host to be specified
and uses /run/nvidia/mps by default.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the use-var-run-nvidia-mps branch from 661cad2 to 283cc30 Compare February 21, 2024 16:01
@@ -148,11 +161,12 @@ func (plugin *NvidiaDevicePlugin) waitForMPSDaemon() error {
if plugin.config.Sharing.SharingStrategy() != spec.SharingStrategyMPS {
return nil
}
// TODO: Check the started file here.
// TODO: Check the .ready file here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I this now called .started?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. There are also per resource started files, but they're not used for readiness or health checks. Will revisit them as a follow up.

@elezar elezar merged commit c675243 into NVIDIA:main Feb 21, 2024
6 checks passed
@elezar elezar deleted the use-var-run-nvidia-mps branch February 21, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants