
Downstream metrics improvements #160

Merged

merged 4 commits into v25.1.x-pre on Dec 24, 2024

Conversation

StephanDollberg
Member

Downstream:

With some adjustments for all the other metric changes we have.

The list of registered metrics that each `metric_groups_impl` stores is
represented by a `metric_id` per registered metric. `metric_id` is quite
a large struct, as it contains the full name and label set, so this list
causes a measurable memory overhead.

Instead, just store a shared_ptr to the registration itself, which we can
use to reference the `metric_id`, saving a lot of memory per registration
(16 bytes vs. 136+ bytes).

(cherry picked from commit 05b20c8)
Each registered metric stores a shared pointer to the metrics impl so
that the metrics impl stays alive while there are still metrics
registered.

This adds 16 bytes of overhead per `registered_metric`. To reduce that,
we move the shared_ptr into `metric_groups_impl`, which is alive as long
as the registered metrics are.

(cherry picked from commit c8d0881)
The major memory user in the metric subsystem is the per-series label
sets.

This is for two reasons:
 - The labels are stored multiple times in different places: twice as
   part of `registered_metric` (one current label set and one original
   for relabeling; these are often identical), once as the lookup key in
   the values map, and a final copy in the metadata copied for scraping.
 - Many of the label sets are duplicated across metric series. For
   example, one io-queue class/mountpoint/iogroup combination yields 15
   series, all with the same labels. The same is true for other metrics
   such as reactor or scheduler metrics. In Redpanda we have an even
   more extreme pattern where per-partition metrics create more than 60
   series with the same label set.

One idea to improve this would be to intern the individual label
strings. In theory this would give improvements on top of removing the
duplication described above, since most label keys and values are
duplicated even when the full label sets differ (e.g.: shard).

On a more detailed look, however, this wouldn't be very effective. Most
labels are fairly short, e.g. looking at some seastar and Redpanda
metrics:

```
io_queue...{class="raft",iogroup="0",mountpoint="none",shard="0"}
redpanda...{namespace="kafka",partition="9985",shard="2",topic="foobar"}
```

Each of the label keys and values is shorter than 16 bytes and hence
fits into the seastar short-string optimization (SSO), so each string
only takes 16 bytes. In the above example maybe one of the eight strings
exceeds the SSO length (e.g.: mountpoint and topic), but those are in
the minority.

Interning a string in less than 16 bytes is not trivial (deletion needs
to be thread-safe, as the metadata might be copied to different shards
at scrape time), so the gain would actually not be that large.

There is further hidden overhead: labels are stored in a `std::map` for
fast lookup, and each node in the map has ~32 bytes of overhead for the
node struct. So even with zero-overhead interning we could only save
about 2x in the most likely case.

Hence this patch takes the strategy of interning the whole label set
(the `std::map`). This means we also intern the map overhead and still
get an overall reduction as described above. For the actual interning we
store the labels `std::map` in an `ss::shared_ptr`, which in turn is
stored in a `std::set` for lookup when adding new metrics.

In Redpanda this patch reduces the metrics memory usage by a factor of
3-4, saving gigabytes of memory at high partition counts (yes, we
likely have too many metrics).

This also makes `update_metrics_if_needed` faster, as less copying is
needed, and it improves cache locality at scrape time, making scraping
cheaper as well (a yet-to-be-committed microbenchmark by Travis Downs
shows a ~10% improvement).

Once all other shared_ptr references have gone out of scope we need to
delete the entry from the interning set. We do this scan in
`update_metrics_if_needed`. The overhead is minimal, as the set doesn't
actually get very big, and the extra work is compensated by
`update_metrics_if_needed` being cheaper, as mentioned above. Ideally we
could do some deleter tricks, but that might just overcomplicate
things.

(cherry picked from commit 44d7b8f)
Use a separate metadata structure for the metadata shared in
`get_value`.

This allows some memory savings, as not all metadata is needed at
scrape time.

It also allows us to make the structure non-copyable, which prevents
unsafe use of the interned labels.

(cherry picked from commit 308d2df)
@StephanDollberg StephanDollberg changed the title Stephan/downstream metrics improvements Downstream metric improvements Dec 19, 2024
@StephanDollberg StephanDollberg changed the title Downstream metric improvements Downstream metrics improvements Dec 19, 2024
@StephanDollberg StephanDollberg merged commit 886d078 into v25.1.x-pre Dec 24, 2024
0 of 14 checks passed