Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config option to deploy custom elastic-agents as test services #786

Merged
merged 30 commits into from
May 17, 2022

Conversation

marc-gr
Copy link
Contributor

@marc-gr marc-gr commented Apr 11, 2022

For some integrations eg: auditbeat or winlogbeat dependant ones. Would be useful to define custom agents. This adds the ability to define them as a system test service and enroll them during the tests instead of the elastic-package stack ones.

Example test service definition:

version: '2.3'
services:
  auditd-agent:
    hostname: auditd-agent
    image: "docker.elastic.co/beats/elastic-agent-complete:8.2.0-SNAPSHOT"
    pid: host
    cap_add:
      - AUDIT_CONTROL
      - AUDIT_READ
    user: root
    healthcheck:
      test: "elastic-agent status"
      retries: 180
      interval: 1s
    environment:
      FLEET_ENROLL: "1"
      FLEET_INSECURE: "1"
      FLEET_URL: "http://fleet-server:8220"

Example service test config:

service: auditd-agent
custom_agent: true
data_stream:
  vars:
    audit_rules:
      - "-a always,exit -F arch=b64 -S execve,execveat -k exec"
    preserve_original_event: true

@marc-gr marc-gr requested a review from mtojek April 11, 2022 07:41
@elasticmachine
Copy link
Collaborator

elasticmachine commented Apr 11, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-05-17T09:42:06.702+0000

  • Duration: 27 min 54 sec

Test stats 🧪

Test Results
Failed 0
Passed 706
Skipped 0
Total 706

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link
Collaborator

elasticmachine commented May 4, 2022

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (30/30) 💚
Files 64.22% (70/109) 👍
Classes 58.278% (88/151) 👍
Methods 47.333% (284/600) 👎 -0.221
Lines 32.207% (2601/8076) 👎 -0.076
Conditionals 100.0% (0/0) 💚

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks cleaner to me than the previous one.

Could you add an example test package to test/packages that uses this approach?

@marc-gr marc-gr marked this pull request as ready for review May 4, 2022 15:17
@marc-gr marc-gr requested a review from jsoriano May 4, 2022 15:17
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, I added some comments, the only blocking one would be to decide how we want to configure the custom agent. Allowing any kind of docker-compose configuration may complicate maintainability of elastic-package, see my comment about this.

case "agent":
dockerComposeYMLPath := filepath.Join(serviceDeployerPath, "docker-compose.yml")
if _, err := os.Stat(dockerComposeYMLPath); err != nil {
return nil, errors.Wrap(err, "can't find expected file docker-compose.yml")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Include the path where the file was expected to be found.

Makefile Outdated Show resolved Hide resolved
environment:
FLEET_ENROLL: "1"
FLEET_INSECURE: "1"
FLEET_URL: "http://fleet-server:8220"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, this makes me think that we might need some way to pass variables to these configs. These connection settings depend on the stack and how it is started. For example in #789 I am changing these options, and this would break scenarios with these environment variables.

An option can be to create a different configuration file, expected for example in _dev/deploy/agent/config.yml, that is used to "patch" a base agent configuration. So for example in this case, it could be something like this:

image: "docker.elastic.co/beats/elastic-agent-complete:8.2.0"
pid: host
cap_add:
  - AUDIT_CONTROL
  - AUDIT_READ
user: root

This way every package using this deployer only needs to configure the minimal relevant set of settings. And we can control the settings needed for enrollment, or other settings that may be dependent of elastic-package stack.

Another middle-ground option could be to use env_file here to include an environment file that would be generated by the deployer, and that includes this kind of environment variables needed by agents to connect with the stack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're referring to this issue (Extend "profiles" with local patches).

Do you think that we need a proposal to sketch the final look and then iterate on that? Maybe we should focus on this specific issue. It might be tricky if we want "patches" to be backward compatible with older stacks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider this deployer a separate thing to possible general patches for profiles or the stack subcommand. I think that something like this "agent" deployer has value on itself. Even if later we consider more advanced configurations for profiles or the stack subcommand.

The problem I see with general local patches for elastic-package stack is that they may become a source of many reproducibility problems. You may patch the stack to work on one package, and later you start working on a different package and something unexpectedly doesn't work. Or it may be difficult to know or remember that a certain package needs a patched stack.
This could cause a new set of problems similar to the ones with the packages contained in the registry depending on where elastic-package stack up was executed (#599).
And as you mention it may be difficult to support patches with multiple versions of the stack in a general way.

I think it is fine to look for a way to start/patch specialized agents on test time as is being done in this PR. These agents are disposable and developers have more awareness on when they are being started. Packages could also select the version of the agent to use, limiting the problems of supporting multiple versions. Variants could help to test with multiple versions or configurations if there are differences.

And if some day we also have stack-level patches, I think it would be ok if the patches are different to the ones used for this "agent" deployer, at the end they are different things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to look for a way to start/patch specialized agents on test time as is being done in this PR.

What about unpatching? Did you plan for this too or is it like the Kubernetes agent, once installed it stays there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to look for a way to start/patch specialized agents on test time as is being done in this PR.

What about unpatching? Did you plan for this too or is it like the Kubernetes agent, once installed it stays there.

No need to unpatch. Current implementation starts this patched agent as a docker compose service, and destroys it on tear down. So it doesn't stay. I think this is a good approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If I understand it correctly, please @marc-gr correct me if I am wrong 🙂 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct 👍

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach :)

Added a couple of comments about implementation details.

@marc-gr marc-gr requested a review from jsoriano May 16, 2022 09:48
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! I left a few comments, but maybe it's better to sync over zoom to explain the technical decisions. Up to you, Marc :)

Makefile Outdated Show resolved Hide resolved
environment:
- FLEET_ENROLL=1
- FLEET_INSECURE=1
- FLEET_URL=http://fleet-server:8220
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this file, isn't FLEET_TOKEN_POLICY_NAME not required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would only be needed if we want it to be enrolled to the current stack policy, but we are just interested in it during the test, so the policy will change right after for the test one, running the config useless.

Copy link
Contributor

@mtojek mtojek May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right! Could you please drop a comment there in case somebody else will come here?

ExtraArgs: []string{"--build", "-d"},
}
err = p.Up(opts)
if err != nil {
return nil, errors.Wrap(err, "could not boot up service using Docker Compose")
}

// Connect service network with stack network (for the purpose of metrics collection)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to move docker.ConnectToNetwork before checking if the service is healthy? Do you think that it won't break if the service doesn't boot up in time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding it makes no difference for the other cases, and in this one is a requirement (healthcheck only passes once enrolled, and needs to be in the same network to be enrolled). Do you think of any specific scenarios where this might be troublesome?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but the compose.go code is generic, so it may affect also other services like kind or docker based.

You could verify the behavior by simulating a broken Docker service, for instance: a container that immediately fails setup.


cd, err := newDockerComposeServiceDeployer(
append(ymlPaths, path),
appConfig.StackImageRefs(install.DefaultStackVersion).AsEnv(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that it won't be possible to use other stack versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other stacks could be used by adding the image option in the custom-agent.yml file or using service variants.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: does it mean that it's possible to boot up Elastic stack 8.1 and the agent 8.3? I'm wondering if we need a helper to select the proper stack. See kind deployer

}

for k, v := range cv {
bc.Services["custom-agent"][k] = v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify a bit what is the purpose of the processing here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we take all the custom options from custom-agent.yml and use them to override the base config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the base config one from ~/.elastic-agent?


cd, err := newDockerComposeServiceDeployer(
append(ymlPaths, path),
appConfig.StackImageRefs(install.DefaultStackVersion).AsEnv(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: does it mean that it's possible to boot up Elastic stack 8.1 and the agent 8.3? I'm wondering if we need a helper to select the proper stack. See kind deployer

return "", errors.Wrap(err, "marshal custom-agent config")
}

tf, err := os.CreateTemp("", dockerCustomAgentName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: There is ~/.elastic-package/tmp directory and probably also a function to reach out to it.

}

func createCustomAgentYaml(cfgPath string) (string, error) {
bc := struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you please replace shortcut-named vars with something meaningful? I admit that I got lost, cv, cb, bc, .. :)

This is useful if you need different capabilities than the provided by the
`elastic-agent` used by the `elastic-package stack` command.

`custom-agent.yml`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the custom-agent.yml will be part of a package, we need to cover it with package-spec. You will need to open one more PR.

Comment on lines +135 to +137
if len(outCtxt.Ports) > 0 {
outCtxt.Port = outCtxt.Ports[0]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't remember if this condition is still required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we keep it then the port can be referenced from the test config and that can be useful in some cases, not a hard requirement though so we can remove it if there is any concern

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants