[EBPF] gpu: auto-enable agent-side check if system-probe gpu_monitoring module is enabled #32521

Draft: wants to merge 1 commit into base: main
Dockerfiles/agent/cont-init.d/60-sysprobe-check.sh (13 additions, 0 deletions)
@@ -13,3 +13,16 @@ if grep -Eq '^ *enable_oom_kill *: *true' /etc/datadog-agent/system-probe.yaml |
            /etc/datadog-agent/conf.d/oom_kill.d/conf.yaml.default
    fi
fi

# Match gpu_monitoring.enabled: true, allowing other child keys to appear under gpu_monitoring before it.
# Regex breakdown (the /m flag lets ^ match at the start of any line):
# ^gpu_monitoring:[ \t]*\n - match the gpu_monitoring parent key at the start of a line
# (?:[ \t]+\S.*\n|[ \t]*\n)* - match any number of indented child lines or blank lines; an unindented line (another top-level key) ends the block
# [ \t]+enabled[ \t]*:[ \t]*true\b - match the indented enabled: true key-value pair
# We use perl to read the whole file at once (-0777) and exit with 0 if the regex matches, 1 otherwise.
if perl -0777 -ne 'exit 0 if /^gpu_monitoring:[ \t]*\n(?:[ \t]+\S.*\n|[ \t]*\n)*[ \t]+enabled[ \t]*:[ \t]*true\b/m; exit 1' /etc/datadog-agent/system-probe.yaml || [[ "$DD_GPU_MONITORING_ENABLED" == "true" ]]; then
    if [ -f /etc/datadog-agent/conf.d/gpu.d/conf.yaml.example ]; then
        mv /etc/datadog-agent/conf.d/gpu.d/conf.yaml.example \
            /etc/datadog-agent/conf.d/gpu.d/conf.yaml.default
    fi
fi
cmd/agent/dist/conf.d/gpu.d/conf.yaml.example (20 additions, 0 deletions)
@@ -0,0 +1,20 @@
init_config:

instances:

  -

    ## @param nvml_library_path - string - optional - default: ""
    ## Configure an alternative path to the NVIDIA NVML library. Only needed
    ## if the library is in a location where the Agent cannot find it automatically.
    #
    # nvml_library_path: ""

    ## @param tags - list of strings following the pattern: "key:value" - optional
    ## List of tags to attach to every metric, event, and service check emitted by this integration.
    ##
    ## Learn more about tagging: https://docs.datadoghq.com/tagging/
    #
    # tags:
    #   - <KEY_1>:<VALUE_1>
    #   - <KEY_2>:<VALUE_2>
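When deciding whether nvml_library_path needs to be set, a quick first check is whether the NVML shared library (libnvidia-ml.so) is present where you expect it. The sketch below is one way to look for it. find_nvml_library is a hypothetical helper, the default directories are assumptions about common Linux layouts (not a list of paths the Agent is known to search), and it assumes nvml_library_path takes the full path to the shared object.

```shell
#!/usr/bin/env bash
# Sketch only: locate the NVML shared library. find_nvml_library is a
# hypothetical helper; the default directories below are assumptions.

# Print the first libnvidia-ml.so* found in the given directories (or the
# assumed defaults when none are given) and return 0; return 1 if none found.
find_nvml_library() {
  local dirs=("$@")
  if [ "${#dirs[@]}" -eq 0 ]; then
    dirs=(/usr/lib/x86_64-linux-gnu /usr/lib64 /usr/lib /usr/local/nvidia/lib64)
  fi
  local d f
  for d in "${dirs[@]}"; do
    for f in "$d"/libnvidia-ml.so*; do
      # Without nullglob an unmatched glob stays literal, so check existence.
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
        return 0
      fi
    done
  done
  return 1
}

if lib=$(find_nvml_library); then
  echo "found NVML at $lib"
else
  echo "NVML not found in the assumed default locations"
fi
```

Run it in the same environment as the Agent (for example, inside the Agent container): a library that only exists on the host or in a non-standard directory is the case where an explicit nvml_library_path is likely to be needed.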