chore(opentelemetry-audit): adds copy of existing otel plugin
This is an initial commit that simply copies over the existing otel plugin,
because the otel-audit plugin will be built on top of it.

---------

Signed off by: Simon Olander ([email protected])
olandr committed Dec 19, 2024
1 parent c8fbfb1 commit 97213c7
Showing 35 changed files with 16,420 additions and 0 deletions.
95 changes: 95 additions & 0 deletions audit-opentelemetry/README.md
@@ -0,0 +1,95 @@
---
title: OpenTelemetry
---

Learn more about the **OpenTelemetry** Plugin. Use it to enable the ingestion, collection and export of telemetry signals (logs and metrics) for your Greenhouse cluster.

The main terminology used in this document is explained in [core-concepts](https://cloudoperators.github.io/greenhouse/docs/getting-started/core-concepts).

## Overview

OpenTelemetry is an observability framework and toolkit for creating and managing telemetry data such as metrics, logs and traces. Unlike other observability tools, OpenTelemetry is vendor- and tool-agnostic, meaning it can be used with a variety of observability backends, including open source tools such as _OpenSearch_ and _Prometheus_.

The focus of the plugin is to provide easy-to-use configurations for common use cases of receiving, processing and exporting telemetry data in Kubernetes. Storage and visualization of that data are intentionally left to other tools.

Components included in this Plugin:

- [Operator](https://opentelemetry.io/docs/kubernetes/operator/)
- [Collector](https://github.com/open-telemetry/opentelemetry-collector)
- [Receivers](https://github.com/open-telemetry/opentelemetry-collector/blob/main/receiver/README.md)
- [Filelog Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver)
- [k8sevents Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8seventsreceiver)
- [journald Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/journaldreceiver)
- [prometheus/internal](https://opentelemetry.io/docs/collector/internal-telemetry/)
- [OpenSearch Exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/opensearchexporter)

## Architecture

![OpenTelemetry Architecture](img/otel-arch.png)

## Note

We intend to add more configuration over time, and contributions of your own configurations are highly appreciated. If you discover bugs or want to add functionality to the plugin, feel free to create a pull request.

## Quick Start

This guide provides a quick and straightforward way to use **OpenTelemetry** as a Greenhouse Plugin on your Kubernetes cluster.

**Prerequisites**

- A running and Greenhouse-onboarded Kubernetes cluster. If you don't have one, follow the [Cluster onboarding](https://cloudoperators.github.io/greenhouse/docs/user-guides/cluster/onboarding) guide.
- For logs, an OpenSearch instance to store them. If you don't have one, reach out to your observability team to get access to one.
- To gather metrics, you **must** have a Prometheus instance in the onboarded cluster for storage and for managing Prometheus-specific CRDs. If you do not have an instance, install the [kube-monitoring](https://cloudoperators.github.io/greenhouse/docs/reference/catalog/kube-monitoring) Plugin first.

**Step 1:**

You can install the `OpenTelemetry` package in your cluster manually with [Helm](https://helm.sh/docs/helm/helm_install), or let the Greenhouse platform manage its lifecycle for you automatically. For the latter, you can either:
1. Go to the Greenhouse dashboard and select the **OpenTelemetry** plugin from the catalog. Specify the cluster and the required option values.
2. Create and specify a `Plugin` resource in your Greenhouse central cluster according to the [examples](#examples).
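
As a rough illustration of the second option, a `Plugin` resource might look like the sketch below. This is a minimal sketch, not a confirmed manifest: the resource name, namespace, cluster name, secret name and the `pluginDefinition` value are hypothetical placeholders, and the `apiVersion` and `spec` field layout are assumed from the general Greenhouse `Plugin` API and should be checked against the Greenhouse documentation. Only the option names are taken from the [Configuration](#configuration) table.

```yaml
# Sketch only: all names below are hypothetical placeholders.
apiVersion: greenhouse.sap/v1alpha1        # assumed Greenhouse API group/version
kind: Plugin
metadata:
  name: opentelemetry                      # hypothetical resource name
  namespace: my-organization               # hypothetical organization namespace
spec:
  pluginDefinition: opentelemetry          # assumed PluginDefinition name
  clusterName: my-cluster                  # hypothetical onboarded cluster
  optionValues:
    - name: openTelemetry.cluster          # option names from the Configuration table
      value: my-cluster
    - name: openTelemetry.region
      value: my-region
    - name: openTelemetry.openSearchLogs.endpoint
      valueFrom:
        secret:                            # assumed secret reference layout
          name: opentelemetry-opensearch   # hypothetical Secret
          key: endpoint
```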

**Step 2:**

The package deploys the OpenTelemetry Operator, which manages the collectors and the auto-instrumentation of workloads. By default, the package includes a configuration for collecting metrics and logs. The log collector currently processes data from the [preconfigured receivers](#overview):
- Files via the Filelog Receiver
- Kubernetes Events from the Kubernetes API server
- Journald events from systemd journal
- its own metrics

You can disable the collection of logs by setting `openTelemetry.logsCollector.enabled` to `false`. Likewise, you can disable the collection of metrics by setting `openTelemetry.metricsCollector.enabled` to `false`.
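
For a manual Helm installation, these switches could be set in a values file roughly as follows. This is a sketch that assumes the key names listed in the [Configuration](#configuration) table and used in the chart's `ci/test-values.yaml`; here logs stay enabled and metrics are turned off.

```yaml
# Sketch of a values override; key names follow the Configuration table.
openTelemetry:
  logsCollector:
    enabled: true     # keep the standard logs configuration
  metricsCollector:
    enabled: false    # turn off the standard metrics configuration
```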

Based on the backend selection, the telemetry data will be exported to the backend.

**Step 3:**

Greenhouse regularly performs integration tests that are bundled with **OpenTelemetry**. These provide feedback on whether all the necessary resources are installed and continuously up and running. You will find messages about this in the plugin status and also in the Greenhouse dashboard.

## Configuration

| Name | Description | Type | Required |
| ---- | ----------- | ---- | -------- |
| `openTelemetry.logsCollector.enabled` | Activates the standard configuration for logs | bool | `false` |
| `openTelemetry.metricsCollector.enabled` | Activates the standard configuration for metrics | bool | `false` |
| `openTelemetry.openSearchLogs.username` | Username for OpenSearch endpoint | secret | `false` |
| `openTelemetry.openSearchLogs.password` | Password for OpenSearch endpoint | secret | `false` |
| `openTelemetry.openSearchLogs.endpoint` | Endpoint URL for OpenSearch | secret | `false` |
| `openTelemetry.region` | Region label for logging | string | `false` |
| `openTelemetry.cluster` | Cluster label for logging | string | `false` |
| `openTelemetry.prometheus.additionalLabels` | Label selector for Prometheus resources to be picked up by the operator | map | `false` |
| `openTelemetry.prometheus.rules.additionalRuleLabels` | Additional labels for PrometheusRule alerts | map | `false` |
| `openTelemetry.prometheus.serviceMonitor.enabled` | Activates the service-monitoring for the Logs Collector | bool | `false` |
| `openTelemetry.prometheus.podMonitor.enabled` | Activates the pod-monitoring for the Logs Collector | bool | `false` |
| `openTelemetry.prometheus.rules.create` | Enables PrometheusRule resources to be created | bool | `false` |
| `openTelemetry.prometheus.rules.disabled` | List of PrometheusRules to disable | list | `false` |
| `openTelemetry.prometheus.rules.labels` | Labels for PrometheusRules | map | `false` |
| `openTelemetry.prometheus.rules.annotations` | Annotations for PrometheusRules | map | `false` |
| `opentelemetry-operator.admissionWebhooks.certManager.enabled` | Activate to use the CertManager for generating self-signed certificates | bool | `false` |
| `opentelemetry-operator.admissionWebhooks.autoGenerateCert.enabled` | Activate to use Helm to create self-signed certificates | bool | `false` |
| `opentelemetry-operator.admissionWebhooks.autoGenerateCert.recreate` | Activate to recreate the cert after a defined period (certPeriodDays default is 365) | bool | `false` |
| `opentelemetry-operator.kubeRBACProxy.enabled` | Activate to enable Kube-RBAC-Proxy for OpenTelemetry | bool | `false` |
| `opentelemetry-operator.manager.prometheusRule.defaultRules.enabled` | Activate to enable default rules for monitoring the OpenTelemetry Manager | bool | `false` |
| `opentelemetry-operator.manager.prometheusRule.enabled` | Activate to enable rules for monitoring the OpenTelemetry Manager | bool | `false` |
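
To illustrate how the Prometheus rule options fit together, the sketch below creates the PrometheusRule resources but disables two individual alerts by name. The alert names are the ones defined in the chart's `alerts/` files; the label key and value are hypothetical placeholders.

```yaml
# Sketch of a values override for the Prometheus rule options.
openTelemetry:
  prometheus:
    additionalLabels:
      key1: value1            # hypothetical label
    rules:
      create: true            # create PrometheusRule resources
      disabled:               # alert names to leave out
        - FilelogRefusedLogs
        - HighCPUUsage
```

Each alert in the chart is wrapped in a check of the form `if not (has "<AlertName>" .Values.openTelemetry.prometheus.rules.disabled)`, so listing a name under `disabled` omits that rule from the rendered output.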

### Examples

TBD
6 changes: 6 additions & 0 deletions audit-opentelemetry/chart/Chart.lock
@@ -0,0 +1,6 @@
dependencies:
- name: opentelemetry-operator
  repository: https://open-telemetry.github.io/opentelemetry-helm-charts
  version: 0.74.2
digest: sha256:57237ba6e2a9e5b1962673d796e355e3151833a23fb801215c19db0fc8df2df9
generated: "2024-11-21T12:18:48.753329+01:00"
20 changes: 20 additions & 0 deletions audit-opentelemetry/chart/Chart.yaml
@@ -0,0 +1,20 @@
# SPDX-FileCopyrightText: 2024 SAP SE or an SAP affiliate company and Greenhouse contributors
# SPDX-License-Identifier: Apache-2.0

apiVersion: v2
appVersion: v0.114.0
name: opentelemetry-operator
version: 0.6.1
description: OpenTelemetry Operator Helm chart for Kubernetes
icon: https://raw.githubusercontent.com/cncf/artwork/a718fa97fffec1b9fd14147682e9e3ac0c8817cb/projects/opentelemetry/icon/color/opentelemetry-icon-color.png
type: application
maintainers:
  - name: timojohlo
  - name: kuckkuck
  - name: viennaa
sources:
  - https://github.com/cloudoperators/greenhouse-extensions
dependencies:
  - name: opentelemetry-operator
    repository: https://open-telemetry.github.io/opentelemetry-helm-charts
    version: 0.74.2
41 changes: 41 additions & 0 deletions audit-opentelemetry/chart/alerts/collector-alerts.yaml
@@ -0,0 +1,41 @@
groups:
  - name: collector-alerts
    rules:
{{- if not (has "FilelogRefusedLogs" .Values.openTelemetry.prometheus.rules.disabled) }}
      - alert: FilelogRefusedLogs
        expr: sum(rate(otelcol_receiver_refused_log_records_total{receiver=~"filelog"}[1m])) > 0
        for: 5m
        labels:
          severity: warning
          runbook_url: https://github.com/cloudoperators/greenhouse-extensions/tree/main/opentelemetry/playbooks/FilelogRefusedLogs.md
          {{- include "plugin.additionalRuleLabels" . | nindent 10 }}
        annotations:
          summary: Logs are not successfully pushed into the filelog-receiver
          description: Filelog receiver is increasingly rejecting logs
{{- end }}

{{- if not (has "ReceiverRefusedMetric" .Values.openTelemetry.prometheus.rules.disabled) }}
      - alert: ReceiverRefusedMetric
        expr: sum(rate(otelcol_receiver_refused_metric_points_total{}[1m])) > 0
        for: 5m
        labels:
          severity: warning
          runbook_url: https://github.com/cloudoperators/greenhouse-extensions/tree/main/opentelemetry/playbooks/ReceiverRefusedMetric.md
          {{- include "plugin.additionalRuleLabels" . | nindent 10 }}
        annotations:
          summary: Some metric points have been refused by receiver
          description: Maybe collector has received non standard metric points or it reached some limits
{{- end }}

{{- if not (has "HighCPUUsage" .Values.openTelemetry.prometheus.rules.disabled) }}
      - alert: HighCPUUsage
        expr: max(rate(otelcol_process_cpu_seconds{}[1m])*100) > 90
        for: 5m
        labels:
          severity: warning
          runbook_url: https://github.com/cloudoperators/greenhouse-extensions/tree/main/opentelemetry/playbooks/HighCPUUsage.md
          {{- include "plugin.additionalRuleLabels" . | nindent 10 }}
        annotations:
          summary: High max CPU usage
          description: Collector need to scale up
{{- end }}
28 changes: 28 additions & 0 deletions audit-opentelemetry/chart/alerts/operator-alerts.yaml
@@ -0,0 +1,28 @@
groups:
  - name: operator-alerts
    rules:
{{- if not (has "ReconcileErrors" .Values.openTelemetry.prometheus.rules.disabled) }}
      - alert: ReconcileErrors
        expr: rate(controller_runtime_reconcile_total{controller="opentelemetrycollector",result="error"}[5m]) > 0
        for: 5m
        labels:
          severity: warning
          runbook_url: https://github.com/cloudoperators/greenhouse-extensions/tree/main/opentelemetry/playbooks/ReconcileErrors.md
          {{- include "plugin.additionalRuleLabels" . | nindent 10 }}
        annotations:
          summary: OpenTelemetryCollector Reconciliation
          description: Reconciliation errors for opentelemetrycollector are increasing
{{- end }}

{{- if not (has "WorkqueueDepth" .Values.openTelemetry.prometheus.rules.disabled) }}
      - alert: WorkqueueDepth
        expr: rate(controller_runtime_reconcile_total{controller="opentelemetrycollector",result="error"}[5m]) > 0
        for: 5m
        labels:
          severity: warning
          runbook_url: https://github.com/cloudoperators/greenhouse-extensions/tree/main/opentelemetry/playbooks/WorkqueueDepth.md
          {{- include "plugin.additionalRuleLabels" . | nindent 10 }}
        annotations:
          summary: WorkqueueDepth is increasing
          description: Check manager logs for reasons why this might happen
{{- end }}
Binary file not shown.
60 changes: 60 additions & 0 deletions audit-opentelemetry/chart/ci/test-values.yaml
@@ -0,0 +1,60 @@
# SPDX-FileCopyrightText: 2024 SAP SE or an SAP affiliate company and Greenhouse contributors
# SPDX-License-Identifier: Apache-2.0

opentelemetry-operator:
  crds:
    create: false
  admissionWebhooks:
    create: true
    failurePolicy: 'Ignore'
    certManager:
      enabled: false
    autoGenerateCert:
      enabled: true
      recreate: false
  manager:
    collectorImage:
      repository: ghcr.io/cloudoperators/opentelemetry-collector-contrib
      tag: main
    image:
      repository: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator
      tag: v0.114.1
    deploymentAnnotations:
      vpa-butler.cloud.sap/update-mode: Auto
    prometheusRule:
      enabled: false
      defaultRules:
        enabled: false
    serviceMonitor:
      enabled: false
  kubeRBACProxy:
    enabled: false

openTelemetry:
  openSearchLogs:
    endpoint: test
    username: test
    password: test
  cluster: test
  region: test
  logsCollector:
    enabled: true
  metricsCollector:
    enabled: false
  prometheus:
    additionalLabels:
      key1: value1
      key2: value2

    rules:
      create: true
      disabled:
        - FilelogRefusedLogs

testFramework:
  enabled: false
  image:
    registry: ghcr.io
    repository: cloudoperators/greenhouse-extensions-integration-test
    tag: main
  imagePullPolicy: IfNotPresent