Add metric for command processor latency #1529

Open
wants to merge 15 commits into main
Changes from 7 commits
@@ -86,6 +86,26 @@ public interface JsonApiMetricsConfig {
@WithDefault("index.usage.count")
String indexUsageCounterMetrics();

@NotBlank
@WithDefault("command.processor.latency")
String commandProcessorLatencyMetrics();

/**
* The minimum value that the meter is expected to observe for the command processor latency, in
* milliseconds.
*/
@Positive
@WithDefault("100")
int minExpectedCommandProcessorLatency();

/**
* The maximum value that the meter is expected to observe for the command processor latency, in
* milliseconds.
*/
@Positive
@WithDefault("15000")
int maxExpectedCommandProcessorLatency();

/** List of values that can be used as value for metrics sort_type. */
enum SortType {
// Uses vector search sorting for document resolution
@@ -29,6 +29,7 @@
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
@@ -110,9 +111,17 @@ public <T extends Command, U extends SchemaObject> Uni<CommandResult> processCom
.onItem()
.invoke(
result -> {
Tags tags = getCustomTags(commandContext, command, result);
// add metrics
sample.stop(meterRegistry.timer(jsonApiMetricsConfig.metricsName(), tags));
Tags complexTags = getCustomTags(commandContext, command, result);
Tags simpleTags = getSimpleTags(command);
// add command metrics with complex tags
long durationNs =
sample.stop(meterRegistry.timer(jsonApiMetricsConfig.metricsName(), complexTags));
// add command latency histogram with simple tags, reusing the duration returned by
// sample.stop() so the two metrics observe identical values
Timer.builder(jsonApiMetricsConfig.commandProcessorLatencyMetrics())
.tags(simpleTags)
.register(meterRegistry)
.record(durationNs, TimeUnit.NANOSECONDS);

if (isCommandLevelLoggingEnabled(result, false)) {
logger.info(buildCommandLog(commandContext, command, result));
@@ -262,6 +271,13 @@ private <T extends SchemaObject> Tags getCustomTags(
return tags;
}

private Tags getSimpleTags(Command command) {
Tag commandTag = Tag.of(jsonApiMetricsConfig.command(), command.getClass().getSimpleName());
String tenant = dataApiRequestInfo.getTenantId().orElse(UNKNOWN_VALUE);
Tag tenantTag = Tag.of(tenantConfig.tenantTag(), tenant);
return Tags.of(commandTag, tenantTag);
}

private JsonApiMetricsConfig.SortType getVectorTypeTag(Command command) {
int filterCount = 0;
if (command instanceof Filterable fc && fc.filterClause() != null) {
@@ -308,6 +324,22 @@ public DistributionStatisticConfig configure(
.build()
.merge(config);
}

// reduce the number of buckets by setting the min and max expected values to avoid the high
// cardinality problem in Grafana
if (id.getName().startsWith(jsonApiMetricsConfig.commandProcessorLatencyMetrics())) {
return DistributionStatisticConfig.builder()
.percentiles(0.5, 0.90, 0.95, 0.99)
.percentilesHistogram(true)
.minimumExpectedValue(
TimeUnit.MILLISECONDS.toNanos(
jsonApiMetricsConfig.minExpectedCommandProcessorLatency())) // 0.1 seconds
.maximumExpectedValue(
TimeUnit.MILLISECONDS.toNanos(
jsonApiMetricsConfig.maxExpectedCommandProcessorLatency())) // 15 seconds
.build()
.merge(config);
}
Contributor Author:
To address the potential high cardinality issue in Grafana, I adjusted the bucket range by increasing the minimum value (from 0.001 sec to 0.1 sec) and decreasing the maximum value (from 30 sec to 15 sec). This change reduces the number of buckets from 69 to 36.
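
For reference, a minimal standalone sketch (not part of this PR) of how the bucket count for a given range could be checked locally with Micrometer's PercentileHistogramBuckets helper; the BucketCountCheck class is hypothetical and the double-valued minimum/maximum overloads assume a recent Micrometer version:

import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;
import io.micrometer.core.instrument.distribution.PercentileHistogramBuckets;
import java.util.concurrent.TimeUnit;

public class BucketCountCheck {
  public static void main(String[] args) {
    // Same range as the defaults above: 100 ms .. 15 s, expressed in nanoseconds.
    DistributionStatisticConfig config =
        DistributionStatisticConfig.builder()
            .percentilesHistogram(true)
            .minimumExpectedValue((double) TimeUnit.MILLISECONDS.toNanos(100))
            .maximumExpectedValue((double) TimeUnit.MILLISECONDS.toNanos(15_000))
            .build()
            .merge(DistributionStatisticConfig.DEFAULT);
    // Prints how many histogram buckets Micrometer would publish for this range.
    System.out.println(PercentileHistogramBuckets.buckets(config).size());
  }
}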

Contributor:
While this makes sense wrt cardinality, 0.1 sec is 100 msec, which is quite coarse. This might be ok when trying to find the slowest queries, but we need to make sure it is useful for us in general. Could you check what using 0.05 (for example) would give for the number of buckets?

Contributor Author:
Changed to 0.05, which gives 41 buckets. The metrics will look like:

# HELP command_processor_latency_seconds  
# TYPE command_processor_latency_seconds histogram
command_processor_latency_seconds{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",quantile="0.5",} 0.012058624
command_processor_latency_seconds{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",quantile="0.9",} 0.200802304
command_processor_latency_seconds{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",quantile="0.95",} 0.200802304
command_processor_latency_seconds{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",quantile="0.99",} 0.200802304
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.05",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.050331646",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.055924051",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.061516456",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.067108864",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.089478485",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.111848106",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.134217727",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.156587348",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.178956969",} 3.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.20132659",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.223696211",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.246065832",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.268435456",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.357913941",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.447392426",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.536870911",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.626349396",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.715827881",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.805306366",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.894784851",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="0.984263336",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="1.073741824",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="1.431655765",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="1.789569706",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="2.147483647",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="2.505397588",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="2.863311529",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="3.22122547",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="3.579139411",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="3.937053352",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="4.294967296",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="5.726623061",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="7.158278826",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="8.589934591",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="10.021590356",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="11.453246121",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="12.884901886",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="14.316557651",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="15.0",} 4.0
command_processor_latency_seconds_bucket{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",le="+Inf",} 4.0
command_processor_latency_seconds_count{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",} 4.0
command_processor_latency_seconds_sum{command="InsertOneCommand",module="sgv2-jsonapi",tenant="unknown",} 0.253723748

The numbers are from my local laptop, so they are not representative; I will deploy this to dev to see whether 0.05 makes sense and whether Grafana can work with it.
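
As a quick local sanity check, a minimal hypothetical sketch (not part of this PR) that registers a timer under the new metric name with the simple tags and records one observation, using Micrometer's SimpleMeterRegistry; the LatencyTimerSmokeCheck class and the tag values are illustrative only:

import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.concurrent.TimeUnit;

public class LatencyTimerSmokeCheck {
  public static void main(String[] args) {
    SimpleMeterRegistry registry = new SimpleMeterRegistry();
    Timer timer =
        Timer.builder("command.processor.latency")
            .tags("command", "InsertOneCommand", "tenant", "unknown")
            .register(registry);
    // Record one observation of 120 ms, as the command processor would after sample.stop().
    timer.record(120, TimeUnit.MILLISECONDS);
    System.out.println(timer.count());                     // 1
    System.out.println(timer.totalTime(TimeUnit.SECONDS)); // 0.12
  }
}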

return config;
}
};
@@ -88,6 +88,43 @@ public void metrics() throws Exception {
assertThat(line).contains("module=\"sgv2-jsonapi\"");
});
});

List<String> commandLatencyMetrics =
metrics
.lines()
.filter(
line ->
line.startsWith("command_processor_latency")
&& line.contains("CountDocumentsCommand")
&& !line.startsWith("command_processor_latency_seconds_bucket")
&& !line.contains("quantile"))
.toList();
assertThat(commandLatencyMetrics)
.satisfies(
lines -> {
assertThat(lines.size()).isEqualTo(3);
lines.forEach(
line -> {
assertThat(line).contains("command=\"CountDocumentsCommand\"");
assertThat(line).contains("tenant=\"test-tenant\"");
assertThat(line).contains("module=\"sgv2-jsonapi\"");
assertThat(line).doesNotContain("sort_type");
assertThat(line).doesNotContain("error");
assertThat(line).doesNotContain("error_code");
assertThat(line).doesNotContain("error_class");
assertThat(line).doesNotContain("vector_enabled");
});
});
List<String> commandLatencyHistogram =
metrics
.lines()
.filter(line -> line.startsWith("command_processor_latency_seconds_bucket"))
.toList();
assertThat(commandLatencyHistogram)
.satisfies(
lines -> {
assertThat(lines.size()).isNotZero();
});
}

@Test