[BUG] datadog-cluster-agent auto-detect failure when using EKS pod identity #32493

Open
sarcasticadmin opened this issue Dec 23, 2024 · 1 comment

@sarcasticadmin

Agent Environment

datadog:

agent: Cluster Agent 7.58.0 - Commit: cf39839 - Serialization version: v5.0.130 - Go version: go1.22.7

Describe what happened:

Unable to leverage EKS Pod Identity

This results in the following warning in the datadog-cluster-agent pod logs when cluster name auto-detection runs at pod startup:

...
cluster-agent 2024-12-18 23:36:06 UTC | CLUSTER | WARN | (subcommands/start/command.go:335 in start) | Failed to auto-detect a Kubernetes cluster name. We recommend you set it manually via the cluster_name config option
...

However, from an ad hoc container in the same namespace, using the same EKS Pod Identity, I can successfully run aws ec2 describe-instances with the AWS CLI:

$ cat << EOF > pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: aws-debug
  namespace: datadog
spec:
  containers:
  - name: amazoncli
    image: amazon/aws-cli:2.22.2
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ] 
  dnsPolicy: Default
  serviceAccount: datadog-cluster-agent
EOF

Apply the manifest to test from the cluster's namespace:

$ kubectl apply -f ./pod.yaml

Confirm that the pod identity association is working for the namespace and service account:

$ kubectl exec -n datadog --stdin --tty aws-debug -- /bin/sh
sh-4.2#  aws sts get-caller-identity
...# Confirm that identity is actually the pod
sh-4.2# aws ec2 describe-instances
...# See the json returning all ec2 instances in region

Describe what you expected:

Auto-detection of the EKS cluster name requires the ec2:DescribeInstances permission: https://docs.datadoghq.com/containers/guide/kubernetes-cluster-name-detection/

EKS Pod Identity authenticates with IAM at the pod boundary and is an alternative to the IRSA approach for using IAM roles inside pods. I would expect the datadog-cluster-agent to pick up IAM credentials from the environment token set by EKS Pod Identity.
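
For context on what "the environment token" refers to: EKS Pod Identity injects the AWS_CONTAINER_CREDENTIALS_FULL_URI and AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE variables into the container, whereas IRSA injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE. The following is a minimal sketch for checking which of the two a container sees. The function name aws_credential_source is hypothetical and is not part of the agent or the AWS CLI, and the check only inspects variables; it does not verify that credentials can actually be fetched.

```shell
# Hypothetical helper: report which AWS credential mechanism the
# current environment advertises, based only on the injected variables.
aws_credential_source() {
  if [ -n "${AWS_CONTAINER_CREDENTIALS_FULL_URI:-}" ] &&
     [ -n "${AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE:-}" ]; then
    # Variables injected for EKS Pod Identity
    echo "eks-pod-identity"
  elif [ -n "${AWS_ROLE_ARN:-}" ] &&
       [ -n "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    # Variables injected for IRSA
    echo "irsa"
  else
    echo "none"
  fi
}
```

Defined and called inside the aws-debug shell, aws_credential_source would be expected to print eks-pod-identity when the association is in effect for the pod.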

Steps to reproduce the issue:

  1. Create EKS cluster with EKS pod identity enabled: https://docs.aws.amazon.com/eks/latest/userguide/pod-id-agent-setup.html
  2. Set an IAM role with the following pod identity association:

namespace: datadog
serviceAccount: datadog-cluster-agent
IAM trust relationship:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:TagSession",
                "sts:AssumeRole"
            ]
        }
    ]
}

IAM permission policies:

{
    "Statement": [
        {
            "Action": [
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceStatus"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}
  3. Deploy the Datadog Helm chart 3.83.0 with the following values.yml:
datadog:
  apiKeyExistingSecret: datadog
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
    instrumentation:
      skipKPITelemetry: true
  clusterAgent:
    replicas: 2
    createPodDisruptionBudget: true
  processAgent:
    enabled: true
  orchestratorExplorer:
    enabled: true
  confd:
    disk.yaml: |-
      init_config:
      instances:
        - use_mount: false
          file_system_exclude:
            - autofs$
          mount_point_exclude:
            - /proc/sys/fs/binfmt_misc
            - /host/proc/sys/fs/binfmt_misc
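
The warning in the logs recommends setting the cluster name manually. As a possible stopgap while auto-detection fails, the chart's datadog.clusterName value could be added to the same values.yml. This fragment is a sketch only: my-eks-cluster is a placeholder rather than a value from this report, and the workaround was not part of the reproduction.

```yaml
datadog:
  # Placeholder; replace with the actual EKS cluster name.
  clusterName: my-eks-cluster
```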

Additional environment details (Operating System, Cloud provider, etc):

Datadog:

helm chart: 3.83.0
agent: Cluster Agent 7.58.0 - Commit: cf39839 - Serialization version: v5.0.130 - Go version: go1.22.7

EKS:

Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.2-eks-7f9249a

This could be similar to #29916, where the environment variables required for IRSA were ignored during initialization of the EC2 client.

@clamoriniere
Contributor

Hi @sarcasticadmin

Currently, the Agent and Cluster-Agent do not yet support the EKS Pod Identity feature. However, this is on our radar, and we’ll make sure to update the community once we’ve made progress on implementing this functionality. We appreciate your understanding and patience in the meantime!

@sarcasticadmin sarcasticadmin changed the title [BUG] datadog-cluster-agent auto-detect failure with using EKS pod identity [BUG] datadog-cluster-agent auto-detect failure when using EKS pod identity Dec 24, 2024