Extended Agent telemetry histogram details #32343

iglendd · 2024-12-18T14:06:27Z

What does this PR do?

Extends the Agent telemetry histogram details. Specifically:

- Adds the previously omitted, implicit `+Inf` bucket value to a histogram's payload.
- Adds the histogram's p75, p95, and p99 values (each expressed as the upper bound of the matching bucket).

Motivation

Make Agent telemetry histograms more accurate and easier to use.

Describe how you validated your changes

Unit tests have been added.

Possible Drawbacks / Trade-offs

Additional Notes

agent-platform-auto-pr · 2024-12-18T14:59:53Z

Package size comparison

Comparison with ancestor 0744b78e72154436a2b6a533abb5c80be831eea5

Diff per package

package	diff	status	size	ancestor	threshold
datadog-agent-amd64-deb	1.42MB	⚠️	1272.28MB	1270.86MB	140.00MB
datadog-iot-agent-amd64-deb	0.00MB	⚠️	113.20MB	113.20MB	10.00MB
datadog-dogstatsd-amd64-deb	0.00MB	✅	78.32MB	78.32MB	10.00MB
datadog-heroku-agent-amd64-deb	1.42MB	⚠️	527.86MB	526.45MB	70.00MB
datadog-agent-x86_64-rpm	1.42MB	⚠️	1281.51MB	1280.09MB	140.00MB
datadog-agent-x86_64-suse	1.42MB	⚠️	1281.51MB	1280.09MB	140.00MB
datadog-iot-agent-x86_64-rpm	0.00MB	⚠️	113.27MB	113.26MB	10.00MB
datadog-iot-agent-x86_64-suse	0.00MB	⚠️	113.26MB	113.26MB	10.00MB
datadog-dogstatsd-x86_64-rpm	0.00MB	✅	78.40MB	78.40MB	10.00MB
datadog-dogstatsd-x86_64-suse	0.00MB	✅	78.40MB	78.40MB	10.00MB
datadog-agent-arm64-deb	-0.00MB	✅	1005.02MB	1005.02MB	140.00MB
datadog-iot-agent-arm64-deb	0.00MB	⚠️	108.67MB	108.67MB	10.00MB
datadog-dogstatsd-arm64-deb	0.00MB	✅	55.59MB	55.59MB	10.00MB
datadog-agent-aarch64-rpm	-0.00MB	✅	1014.24MB	1014.24MB	140.00MB
datadog-iot-agent-aarch64-rpm	0.00MB	⚠️	108.74MB	108.74MB	10.00MB

Decision

⚠️ Warning

agent-platform-auto-pr · 2024-12-18T15:03:15Z

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv aws.create-vm --pipeline-id=51783104 --os-family=ubuntu

Note: This applies to commit 9d011bd

agent-platform-auto-pr · 2024-12-19T23:44:11Z

Uncompressed package size comparison

Comparison with ancestor 272716faa23e812a822c030ab28437f9a49957fa

Diff per package

package	diff	status	size	ancestor	threshold
datadog-heroku-agent-amd64-deb	0.03MB	⚠️	505.21MB	505.17MB	70.00MB
datadog-agent-x86_64-rpm	0.03MB	⚠️	1200.09MB	1200.06MB	140.00MB
datadog-agent-x86_64-suse	0.03MB	⚠️	1200.09MB	1200.06MB	140.00MB
datadog-agent-aarch64-rpm	0.03MB	⚠️	944.35MB	944.33MB	140.00MB
datadog-agent-amd64-deb	0.03MB	⚠️	1190.80MB	1190.77MB	140.00MB
datadog-agent-arm64-deb	0.03MB	⚠️	935.08MB	935.06MB	140.00MB
datadog-iot-agent-amd64-deb	0.01MB	⚠️	113.35MB	113.34MB	10.00MB
datadog-iot-agent-x86_64-rpm	0.01MB	⚠️	113.42MB	113.41MB	10.00MB
datadog-iot-agent-x86_64-suse	0.01MB	⚠️	113.42MB	113.41MB	10.00MB
datadog-iot-agent-arm64-deb	0.00MB	✅	108.81MB	108.81MB	10.00MB
datadog-iot-agent-aarch64-rpm	0.00MB	✅	108.88MB	108.88MB	10.00MB
datadog-dogstatsd-amd64-deb	0.00MB	✅	78.57MB	78.57MB	10.00MB
datadog-dogstatsd-x86_64-rpm	0.00MB	✅	78.65MB	78.65MB	10.00MB
datadog-dogstatsd-x86_64-suse	0.00MB	✅	78.65MB	78.65MB	10.00MB
datadog-dogstatsd-arm64-deb	0.00MB	✅	55.77MB	55.77MB	10.00MB

Decision

⚠️ Warning

cit-pr-commenter · 2024-12-20T00:10:01Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: b2f04adc-c5fc-4b0d-b912-6ceeab7a557f

Baseline: 272716f
Comparison: 9d011bd
Diff

Optimization Goals: ✅ No significant changes detected

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	quality_gate_idle	memory utilization	+0.80	[+0.76, +0.83]	1	Logs bounds checks dashboard
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	+0.47	[-0.22, +1.15]	1	Logs
➖	file_tree	memory utilization	+0.46	[+0.33, +0.59]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	+0.19	[-0.58, +0.97]	1	Logs
➖	file_to_blackhole_1000ms_latency_linear_load	egress throughput	+0.16	[-0.31, +0.62]	1	Logs
➖	file_to_blackhole_0ms_latency_http1	egress throughput	+0.13	[-0.72, +0.99]	1	Logs
➖	file_to_blackhole_300ms_latency	egress throughput	+0.06	[-0.58, +0.70]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	+0.01	[-0.11, +0.13]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	-0.00	[-0.01, +0.01]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	-0.02	[-0.73, +0.68]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	-0.05	[-0.95, +0.86]	1	Logs
➖	file_to_blackhole_0ms_latency_http2	egress throughput	-0.10	[-1.01, +0.80]	1	Logs
➖	quality_gate_logs	% cpu utilization	-0.22	[-3.47, +3.03]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	-0.45	[-0.54, -0.37]	1	Logs bounds checks dashboard
➖	file_to_blackhole_1000ms_latency	egress throughput	-0.66	[-1.45, +0.13]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	-1.21	[-1.27, -1.14]	1	Logs

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	links
✅	file_to_blackhole_0ms_latency	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency	memory_usage	10/10
✅	file_to_blackhole_0ms_latency_http1	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency_http1	memory_usage	10/10
✅	file_to_blackhole_0ms_latency_http2	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency_http2	memory_usage	10/10
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10
✅	file_to_blackhole_1000ms_latency_linear_load	memory_usage	10/10
✅	file_to_blackhole_100ms_latency	lost_bytes	10/10
✅	file_to_blackhole_100ms_latency	memory_usage	10/10
✅	file_to_blackhole_300ms_latency	lost_bytes	10/10
✅	file_to_blackhole_300ms_latency	memory_usage	10/10
✅	file_to_blackhole_500ms_latency	lost_bytes	10/10
✅	file_to_blackhole_500ms_latency	memory_usage	10/10
✅	quality_gate_idle	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_logs	lost_bytes	10/10
✅	quality_gate_logs	memory_usage	10/10

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
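
The three criteria above can be read as a single predicate. The following is a minimal sketch of that reading, not the Regression Detector's implementation; the `experiment` type, its field names, and `isRegression` are hypothetical, and the two sample rows are copied from the table above.

```go
package main

import (
	"fmt"
	"math"
)

// experiment holds one row of the results table. Field names are hypothetical.
type experiment struct {
	name         string
	deltaMeanPct float64 // estimated Δ mean %
	ciLow        float64 // lower bound of the 90% confidence interval
	ciHigh       float64 // upper bound of the 90% confidence interval
	erratic      bool    // true if the configuration marks the experiment "erratic"
}

// effectSizeTolerance is the |Δ mean %| threshold stated in the report.
const effectSizeTolerance = 5.0

// isRegression applies the three criteria: the effect is large enough,
// the confidence interval excludes zero, and the experiment is not erratic.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.deltaMeanPct) >= effectSizeTolerance
	excludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && excludesZero && !e.erratic
}

func main() {
	rows := []experiment{
		{name: "tcp_syslog_to_blackhole", deltaMeanPct: -1.21, ciLow: -1.27, ciHigh: -1.14},
		{name: "quality_gate_logs", deltaMeanPct: -0.22, ciLow: -3.47, ciHigh: 3.03},
	}
	for _, r := range rows {
		fmt.Printf("%s: regression=%v\n", r.name, isRegression(r))
	}
}
```

Under this reading, `tcp_syslog_to_blackhole` has an interval that excludes zero but falls short of the 5.00% tolerance, so it is not flagged.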

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.

hestonhoffman

Couple suggestions

releasenotes/notes/agent-tel-extend-histogr-6e2da94e63edcaf8.yaml

dustmop · 2024-12-20T20:29:08Z

comp/core/agenttelemetry/impl/agenttelemetry.go

 		}
+
+		// For regular metric (and for HISTOGRAM +Inf bucket which follows the last bucket)
+		keyNames = append(keyNames, keyName)


Should this be inside the for loop that starts at line 260? If metrics has 2 items (for example) and neither has tags, then keyName gets set to the metricName both iterations of the loop, and then appended to keyNames twice.

iglendd · 2024-12-21

Indeed, @dustmop, that would be a problem. Visually it looks that way, but if you look carefully, the line is inside the loop. Still, thank you for noticing it; I came to the same conclusion when looking at the diff here.

At any rate, I have used this as an opportunity to improve the code a bit more and write a few more unit tests. Thank you again.
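
The scenario raised at the start of this thread (two metrics without tags, each iteration appending the same key) can be reproduced in isolation. The sketch below is not the Agent's code: the `metric` type, `collectKeyNames`, and the `name:tags` key format are hypothetical and exist only to show the shape of the loop being discussed.

```go
package main

import (
	"fmt"
	"strings"
)

// metric is a hypothetical, simplified stand-in for one telemetry sample.
type metric struct {
	tags []string
}

// collectKeyNames appends one key name per loop iteration, mirroring the
// placement of the append call discussed in the review. The key format
// ("name" or "name:tag1,tag2") is an assumption made for this example.
func collectKeyNames(metricName string, metrics []metric) []string {
	var keyNames []string
	for _, m := range metrics {
		keyName := metricName
		if len(m.tags) > 0 {
			keyName = metricName + ":" + strings.Join(m.tags, ",")
		}
		// Inside the loop: runs once per metric, tagged or not.
		keyNames = append(keyNames, keyName)
	}
	return keyNames
}

func main() {
	// Two untagged metrics produce the same key name twice.
	fmt.Println(collectKeyNames("foo", []metric{{}, {}}))

	// Distinct tag sets produce distinct key names.
	fmt.Println(collectKeyNames("foo", []metric{
		{tags: []string{"env:prod"}},
		{tags: []string{"env:dev"}},
	}))
}
```

Whether a repeated key is a problem depends on how the caller consumes `keyNames`, which is the part of the larger usage the thread leaves open.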

Co-authored-by: Heston Hoffman <[email protected]>

dustmop · 2024-12-23T20:19:49Z

comp/core/agenttelemetry/impl/agenttelemetry.go

 		}
+
+		// For regular metric (and for HISTOGRAM +Inf bucket which follows the last bucket)
+		keyNames = append(keyNames, keyName)


Sorry, I'm confused. I was saying it should not be inside the loop. Or rather, that this line being inside the for loop could cause a bug, from my understanding of the logic.

However, if that's what's intended, then perhaps it isn't a problem, and I just don't understand the larger usage.

dustmop · 2024-12-23T20:21:53Z

comp/core/agenttelemetry/impl/agenttelemetry.go

 		}
+
+		// For regular metric (and for HISTOGRAM +Inf bucket which follows the last bucket)
+		keyNames = append(keyNames, keyName)


The additional tests help to cover the change in functionality, and they seem to be working as intended, so this is all good by me.

iglendd · 2024-12-23T21:33:47Z

Thank you, @dustmop. Sorry for adding to the confusion. I have added more comments, which should help clarify it. Indeed, there are two possibilities (with tags and without), each doubled for histograms. See my last commit for the comments.

iglendd · 2024-12-23T21:33:57Z

/merge

dd-devflow · 2024-12-23T21:34:11Z

Devflow running: `/merge`

View all feedbacks in Devflow UI.

2024-12-23 21:34:09 UTC ℹ️ MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2024-12-24 01:34:10 UTC ⚠️ MergeQueue: This merge request was unqueued

[email protected] unqueued this merge request: It did not become mergeable within the expected time

iglendd · 2024-12-24T13:43:21Z

/merge

dd-devflow · 2024-12-24T13:43:29Z

Devflow running: `/merge`

View all feedbacks in Devflow UI.

2024-12-24 13:43:29 UTC ℹ️ MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2024-12-24 15:42:07 UTC ℹ️ MergeQueue: merge request added to the queue

The median merge time in main is 34m.

2024-12-24 15:51:22 UTC ❌ MergeQueue: This merge request was updated

This PR is rejected because it was updated

iglendd · 2024-12-24T13:43:39Z

/merge

dd-devflow · 2024-12-24T13:43:44Z

Devflow running: `/merge`

View all feedbacks in Devflow UI.

2024-12-24 13:43:44 UTC ❌ MergeQueue

PR already in the queue with status waiting

estherk15

Non blocking suggestion

releasenotes/notes/agent-tel-extend-histogr-6e2da94e63edcaf8.yaml

Co-authored-by: Esther Kim <[email protected]>

iglendd · 2024-12-24T15:51:37Z

/merge

dd-devflow · 2024-12-24T15:51:45Z

Devflow running: `/merge`

View all feedbacks in Devflow UI.

2024-12-24 15:51:45 UTC ℹ️ MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2024-12-24 19:51:46 UTC ⚠️ MergeQueue: This merge request was unqueued

[email protected] unqueued this merge request: It did not become mergeable within the expected time

iglendd · 2024-12-24T23:23:50Z

/merge

dd-devflow · 2024-12-24T23:24:00Z

Devflow running: `/merge`

View all feedbacks in Devflow UI.

2024-12-24 23:24:00 UTC ℹ️ MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2024-12-25 00:21:23 UTC ℹ️ MergeQueue: merge request added to the queue

The median merge time in main is 34m.

2024-12-25 00:56:25 UTC ℹ️ MergeQueue: This merge request was merged

iglendd added this to the 7.62.0 milestone Dec 18, 2024

github-actions bot added the team/agent-shared-components and long review (PR is complex, plan time to review it) labels Dec 18, 2024

Extended Agent telemetry histogram details

d9675d3

- Added to a histogram's payload previously omitted and implicit `+Inf` bucket value
- Added histogram's p75, p95 and p99 values (expressed as the upper-bound for the matching bucket).

iglendd force-pushed the len.gamburg/agent-tel-extend-histogr branch from 46574c7 to d9675d3 Compare December 19, 2024 23:15

iglendd added the qa/done (QA done before merge and regressions are covered by tests) label Dec 19, 2024

iglendd marked this pull request as ready for review December 19, 2024 23:50

iglendd requested review from a team as code owners December 19, 2024 23:50

iglendd requested a review from jeremy-hanna December 19, 2024 23:50

hestonhoffman reviewed Dec 20, 2024

releasenotes/notes/agent-tel-extend-histogr-6e2da94e63edcaf8.yaml (outdated)

releasenotes/notes/agent-tel-extend-histogr-6e2da94e63edcaf8.yaml (outdated)

dustmop reviewed Dec 20, 2024

dustmop requested review from dustmop and removed request for jeremy-hanna December 20, 2024 20:47

iglendd and others added 4 commits December 20, 2024 21:32

Update releasenotes/notes/agent-tel-extend-histogr-6e2da94e63edcaf8.yaml

fdea8f7

Co-authored-by: Heston Hoffman <[email protected]>

Update releasenotes/notes/agent-tel-extend-histogr-6e2da94e63edcaf8.yaml

60acd41

Co-authored-by: Heston Hoffman <[email protected]>

Few more small improvements and unit tests

f2c96a4

Fix linter-discovered issues

f808c2f

dustmop approved these changes Dec 23, 2024

Added a few more comments to clarify an algorithm

aeda05b

Merge branch 'main' into len.gamburg/agent-tel-extend-histogr

1da9496

estherk15 approved these changes Dec 24, 2024

releasenotes/notes/agent-tel-extend-histogr-6e2da94e63edcaf8.yaml (outdated)

Update releasenotes/notes/agent-tel-extend-histogr-6e2da94e63edcaf8.yaml

4dcfc06

Co-authored-by: Esther Kim <[email protected]>

Merge branch 'main' into len.gamburg/agent-tel-extend-histogr

9d011bd

dd-mergequeue bot merged commit 561fc3e into main Dec 25, 2024
222 checks passed

dd-mergequeue bot deleted the len.gamburg/agent-tel-extend-histogr branch December 25, 2024 00:56
