-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metrics: Internalize metric label sets #2561
metrics: Internalize metric label sets #2561
Conversation
I'm reviewing the code; so far, I just wanted some clarification. My concern is about what happens during metrics reporting when you can get preemptions. Maybe that's not an issue |
Yeah that shouldn't be an issue (obviously let me know if you see one). The shared_ptr should avoid any thread safety issues and the SP type is const explicitly to avoid accidental modification once internalised. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it looks good and takes a good approach. My comments are around code documentation; that would help to make it clearer.
I agree that, theoretically, we could have dropped the GC and based the deletion on the pointer self-counter, but it's not worth the extra complexity. We don't expect that metrics changing and deletion will happen a lot.
bool enabled; | ||
skip_when_empty should_skip_when_empty; | ||
}; | ||
|
||
class internalized_holder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add class documentation? the name does not reflect what you are doing and what it does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some docs. Let me know whether that makes sense. Obviously also happy to rename if you think a different name would be better.
@@ -234,7 +272,7 @@ public: | |||
}; | |||
|
|||
using register_ref = shared_ptr<registered_metric>; | |||
using metric_instances = std::map<labels_type, register_ref>; | |||
using metric_instances = std::map<internalized_holder, register_ref, std::less<>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need the std::less<> or is that a leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's needed yes. It enables heterogenuous lookupand hence we can use the extra operator< overloads defined for internalized_holder
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, you're keeping the existing type, if we were to change it to unordered map it would be better to do that seperately.
} | ||
|
||
const register_ref& at(const labels_type& l) const { | ||
return _instances.at(l); | ||
const register_ref& at(const internalized_labels_ref& l) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it confusing, you get a const ref to internalized_labels_ref and then you call internalized_holder contractor who takes an object (because it uses the std::move on it) and at the end, you are just looking for the entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes just using labels_type
would be better here. The problem is that std::map::at
only supports heterogenuous lookup as part of C++26 [1][2].
As a workaround we could just do it manually and do a find
first and if not found throw a std::out_of_range
exception ourselves?
[1] https://stackoverflow.com/a/75329629/893693
[2] https://en.cppreference.com/w/cpp/container/map/at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes it more confusing is that you create an object implicitly by passing a const reference to a method that gets an object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but this isn't really that different to what happens when you do:
std::map<std::string, int> m;
m.at("hello are you there");
That will also create and allocate a temporary std::string
that is only used during lookup.
As suggested above we can work around it though by simply implementing at
ourselves.
3a8bcbe
to
167f724
Compare
@amnonh can you measure the improvement for ScyllaDB? |
@@ -42,6 +42,8 @@ namespace metrics { | |||
namespace impl { | |||
|
|||
using labels_type = std::map<sstring, sstring>; | |||
using internalized_labels_ref = std::shared_ptr<const labels_type>; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BEGIN FROWN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
END FROWN
register_ref& operator[](const labels_type& l) { | ||
return _instances[l]; | ||
register_ref& operator[](const internalized_labels_ref& l) { | ||
return _instances[internalized_holder(l)]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change the public API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is all in the impl
namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Some comments:
|
Let me answer those three together. I probably spoke too much about threadsafety in the commit message and hence caused some confusion. So in the current patch we don't actually do any cross-shard interning at this point. Hence the shard label gets interned as any other label (it gets added to the label set pretty early). The thread safety bit I was talking about comes from the metadata handling which gets copied across shards at scrape time. This is how an internalized label can live on two different shards temporarily. This is where we are already using SPs. Using a foreign_ptr would be feasible but would make the API/types quite a bit more complicated. Interning across shards is certainly another step that could be taken. Note that after this change the memory usage taken by the label set (per shard) as part of the overall metrics memory usage goes down 20-30% so it's not like further improving that bit has a massive effect. It would also require a change to the public API I think as that is currently not async. Further while it would probably help for reactor etc. metrics for user metrics it's likely that high cardinality series are probably different across shards in other labels than the shard. E.g.: in redpanda for the high cardinality series the partition would be different across shards so you wouldn't get any advantage there. So it's not immediately clear the added complexity would be worth it.
Yes this would be another memory optimization that could applied. Would need to see how it affects query time though. As mentioned above the improvements aren't massive at that point so this would need to be considered as scrapping is also quite expensive. |
I don't see std::shared_ptr in the metrics code now.
Could we, perhaps, capture the local owner of all the metrics and pass it around as a foreign_ptr? It would have to be immutable for that to work.
We could (with non-negligible effort) have a fiber to intern after the fact.
I think it would improve scrape time too. We could add a private interface between metrics and prometheus that exposes the labels as a span. Operating on cold node-based data structures is slow. Meanwhile, please see if you can drop the dependency on std::shared_ptr. I don't have a suggestion but I do want to prevent its encroachment. |
167f724
to
eef0ddd
Compare
Yes you are totally right of course. I always forget about the seastar version when I see an unqualified So in that case we can actually just replace the There is the risk of someone just copying out one of the In any case I have added two further commits. The first just tightens up the API which makes these accidental copies a bit harder and more explicit (in fact this found an unneeded copy in the protobuf prom impl). The second commit goes the full way and makes the cross-shard metadata code use individual foreign_ptrs. As suggested above this complicates the API quite a bit and is also more expensive in case the foreign references are the last copies. I'd be happy to drop the both of these commits or just the latter. Let me know what you think. |
include/seastar/core/metrics_api.hh
Outdated
@@ -21,6 +21,7 @@ | |||
|
|||
#pragma once | |||
|
|||
#include "seastar/core/shared_ptr.hh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use <> includes, not "" includes.
return lhs < rhs.labels(); | ||
} | ||
inline bool operator<(const internalized_holder& lhs, const internalized_holder& rhs) { | ||
return lhs.labels() < rhs.labels(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can be replaces with a transparent comparator (https://en.cppreference.com/w/cpp/container/map/find variants 3, 4) for the key.
But why use an ordered map in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, these enable heterogeneous lookup.
@@ -356,6 +395,8 @@ struct config { | |||
sstring hostname; | |||
}; | |||
|
|||
using internalized_set = std::set<internalized_holder, std::less<>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a candidate for unordered_set. But since you need ordered compares for the other container, we might as well keep this as is.
src/core/scollectd.cc
Outdated
@@ -19,6 +19,7 @@ | |||
* Copyright (C) 2014 Cloudius Systems, Ltd. | |||
*/ | |||
|
|||
#include "seastar/core/shared_ptr.hh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<> includes
@@ -113,11 +113,9 @@ public: | |||
: _group(std::move(group)), _name( | |||
std::move(name)), _labels(labels) { | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These structs aren't entirely cheap so be explicit if a copy is needed.
Further we don't want to accidentally copy them in the metadata struct
when on another shard.
I understand and agree with the spirit of this, but it's swimming against the C++ current, where value types are expected to have working copy constructors.
include/seastar/core/metrics_api.hh
Outdated
}; | ||
|
||
using metric_id = metric_id_t<internalized_labels_ref>; | ||
using foreign_metric_id = metric_id_t<foreign_ptr<internalized_labels_ref>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we'd want a foreign metric_id. Isn't it too low-level? On remote shards we work with large collections of metrics, not individual metrics ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is just how the structures are right now (I assume for reuse of existing datastructure).
The metadata list that is shared across shards for each metric family (metric_metadata_fifo
) is a list of metric_info
which in turn contains a metric_id
. The element that is mainly used are the labels from the metric_id
as that is what effectively makes the individual metric series for a metric family. But there is other fields that are used on a per series basis such as skip_when_empty
which potentially can be enabled/disabled on a per series basis depending on relabel config.
Not all fields are used though so this can be stripped down a bit by not reusing the same datastructures. The shared labels would be in there either way though. If we were to go with the approach described above of using a foreign_ptr per internalized label then it might slightly simplify that approach a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not even per internalized label set, one foreign_ptr per the entire registered metric set.
Regarding cross-shard safety, I think the best approach is to make shard data immutable and have metrics registration clone the data structure, install the metric, and replace the root pointer. Caveat 1: I don't have a clear mental picture of how it currently works so it may be unnecessary/wrong |
Sorry I don't fully understand this. So internalized labels are immutable. There is no risk of anybody modifying them. The "risk" I described in my previous message is someone (accidentally) copying the I see three ways to address this potential risk:
The current three commits reflect each of those three steps. |
First a disclaimer that I'm not well versed in the metrics code so anything I say could be nonsense.
I'm talking about immutablizing the entire metadata structure for all metrics. The idea is that metadata changes very rarely, so we can spend more time when modifying it to fully clone + add, then install the new structure. A foreign_ptr to that immutable structure is safe to traverse across shards.
Doesn't values_reference only refer to values, not the metadata? Maybe I'm wrong and it's already as I'm describing.
If we were using Rust then it would make sense to do it. It also makes sense in C++, but in my experience removing copy-ability makes things worse elsewhere. Since this is a dedicated type (not generic) and its use is heavily tied to cross-shard work, perhaps it does merit this special copy behavior. We could rationalize this by saying the type has the semantics of unique_ptr (not copyable) and the sharing is an implementation detail and not exposed to its users.
I think this is overkill. foreign_ptr:s aren't free.
Thanks for the detailed explanation. |
Gotcha, yes that is effectively what happens today (minus it being actually const but that's something we can add).
Yeah it does contain the metadata. The name is a bit confusing I guess :) Relevant types:
I didn't really like the whole |
Ah, and we generate it on-demand in update_metrics_if_needed(), rather than during construction. |
@avikivity - while we have your attention on metrics, what do you estimate the use fraction to be for scollectd vs prometheus for scylladb users? E.g., to help evaluate the effectiveness of changes that improve only the prometheus side. |
0:1 |
The major memory user in the metric subsystem is the per series label sets. This is for two reasons: - The labels are stored multiple times in different places: Twice as part of `registered_metric` (one current labels and one original one for relabeling - these are often just the same), one copy as the lookup key in the values map and another final copy in the metadata copy for scraping - Lots of the labels are actually duplicated across metric series. For example for one io-queue class/mountpoint/iogroup combination you get 15 series all with the same labels. Similar is true for other metrics like reactor or scheduler metrics. In Redpanda we have an even more extreme pattern where per partition metrics create more than 60 series with the same label set. One idea to improve this would be to intern the individual label strings. In theory this would give improvements on top of the duplication described above. Most label keys and values are duplicated even if not all labels are the same (e.g.: shard). On a more detailed look this however wouldn't be very effective. Most labels are fairly short, e.g. looking at some seastar and redpanda metrics: ``` io_queue...{class="raft",iogroup="0",mountpoint="none",shard="0"} redpanda...{namespace="kafka",partition="9985",shard="2",topic="foobar"} ``` Each of the label key and values are shorter than 16 bytes and hence fit into the seastar short string optimization. Hence, each string only takes 16 bytes. In the above example maybe one of the eight strings might leave the SSO length (e.g.: mountpoint and topic) but those are in the minority. Interning the string in less than 16 bytes is not trivial so the gain would actually not be that large (deletion needs to be thread safe as the metadata might be copied to different shards at scrape time). There is further hidden overhead. Labels are stored in a `std::map` for fast lookup. Each node in the map has a ~32 byte overhead for the node struct. So even at a zero overhead internalisation we could only save about 2x in the most likely case. Hence this patch takes the strategy of internalizing the whole labels set (`std::map`). This means that we also internalize map overhead and still get an overall reduction as described above. For the actual internalization we store the labels `std::map` in a `ss:shared_ptr`. It's stored in a `std::set` for lookup when adding new metrics. In Redpanda this patch reduces the metrics memory usage by a factor of 3-4 and hence saving gigabytes of memory at high partition counts (yes we likey have too many metrics). This also makes `update_metrics_if_needed` faster as less copying is needed and also increases cache locality at scrape time making that cheaper as well (a yet to be committed microbench by Travis Downs shows a ~10% improvement). Once all other shared_ptr references have gone out of scope we need to delete the entry out of the internalization set. We do this scan in `update_metrics_if_needed`. The overhead is minimal as the set doesn't actually get super big and extra overhead is compensated by `update_metrics_if_needed` being cheaper as mentioned above. Ideally we could do some deleter tricks but this might just be overcomplicating things.
Use a separate metadata structure for the metadata shared in `get_value`. This allows some memory savings as not all metadata is needed at scrape time. Also allows us to make the structure non-copyable and hence prevents unsafe use of the internalized labels.
eef0ddd
to
308d2df
Compare
I have pushed a change now that uses a separate datastructure. I think this is a lot cleaner like this. Unfortunately, we can't strip that much of it for now because of scollectd. |
Is anyone still using scollectd? |
I heard from Avi that no, zero people are using scollectd integration in seastar. Seems like a PR to remove support would probably be accepted easily then? |
I'm removing it from Scylla these days. |
It's no longer used. See scylladb#2561 (comment) and following. Remove it which will allow simplifying some parts of the shared metrics backend.
scylladb/scylladb#17356 - in the process of passing CI. |
I saw build failures compiling scylladb with this series, but all from unit test that reach into the impl namespace. |
It's no longer used. See scylladb#2561 (comment) and following. Remove it which will allow simplifying some parts of the shared metrics backend.
The major memory user in the metric subsystem is the per series label
sets.
This is for two reasons:
part of
registered_metric
(one current labels and one original onefor relabeling - these are often just the same), one copy as the
lookup key in the values map and another final copy in the metadata
copy for scraping
example for one io-queue class/mountpoint/iogroup combination you get
15 series all with the same labels. Similar is true for other metrics
like reactor or scheduler metrics. In Redpanda we have an even more
extreme pattern where per partition metrics create more than 60
series with the same label set.
One idea to improve this would be to intern the individual label
strings. In theory this would give improvements on top of the
duplication described above. Most label keys and values are duplicated
even if not all labels are the same (e.g.: shard).
On a more detailed look this however wouldn't be very effective. Most
labels are fairly short, e.g. looking at some seastar and redpanda
metrics:
Each of the label key and values are shorter than 16 bytes and hence fit
into the seastar short string optimization. Hence, each string only
takes 16 bytes. In the above example maybe one of the eight strings
might leave the SSO length (e.g.: mountpoint and topic) but those are in
the minority.
Interning the string in less than 16 bytes is not trivial so the gain
would actually not be that large (deletion needs to be thread safe as
the metadata might be copied to different shards at scrape time).
There is further hidden overhead. Labels are stored in a
std::map
forfast lookup. Each node in the map has a ~32 byte overhead for the node
struct. So even at a zero overhead internalisation we could only save
about 2x in the most likely case.
Hence this patch takes the strategy of internalizing the whole labels
set (
std::map
). This means that we also internalize map overhead andstill get an overall reduction as described above. For the actual
internalization we store the labels
std::map
in astd::shared_ptr
(for cross shared safety). It's stored in a
std::set
for lookup whenadding new metrics.
In Redpanda this patch reduces the metrics memory usage by a
factor of 3-4 and hence saving gigabytes of memory at high partition
counts (yes we likey have too many metrics).
This also makes
update_metrics_if_needed
faster as less copying isneeded and also increases cache locality at scrape time making that cheaper
as well (a yet to be committed microbench by Travis Downs shows a ~10%
improvement).
Once all other shared_ptr references have gone out of scope we need to
delete the entry out of the internalization set. We do this scan in
update_metrics_if_needed
. The overhead is minimal as the set doesn'tactually get super big and extra overhead is compensated by
update_metrics_if_needed
being cheaper as mentioned above. Ideally wecould do some deleter tricks but this is being complicated by thread
safety issues in case the last reference is not on the origin shard.