Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow using custom Recorder and add option to always record spans #158

Merged
merged 4 commits into from
Jul 3, 2020

Conversation

drolando
Copy link
Contributor

This will let me implement "firehose mode", similar to what we have in
py_zipkin.

Basically what I need is a way to call 2 different Reporters:

  • one gets called only when sampled is True
  • one gets called every time (this is the firehose recorder)
    This cannot be accomplished with just one Reporter, because once we're
    inside the reporter we've lost any info on whether the trace was sampled
    or not.

I can almost do this in zipkin-php, there are only 2 things missing.

  • if sampled is False, we create a NoopSpan. So I need an extra flag to
    inform the tracer that I always want to create a RealSpan
  • the Recorder is hardcoded inside the Tracer class, so I cannot pass my
    own.

Once those 2 things are supported, I can write my own Recorder subclass
that calls the right Reporters as needed with minimal changes to the
core zipkin-php code.

cc @adriancole

@codefromthecrypt
Copy link
Member

in Brave we solve this with sampledLocal internal flag

https://github.com/openzipkin/brave/blob/169d93ac71fce759fb872636576d539c6c3b5123/brave/src/main/java/brave/propagation/SamplingFlags.java#L89-L102

then the zipkin "recorder" would look at if it is supposed to act as default (only send ones remote sampled), or wildcard send (alwaysSampleLocal)
https://github.com/openzipkin/zipkin-reporter-java/blob/master/brave/src/main/java/zipkin2/reporter/brave/ZipkinSpanHandler.java#L147

Note: something slightly different what you are doing vs alwaysSampleLocal used in secondary sampling.. in secondary sampling how netflix use it, they add a comma-separated tag "sampled_keys" so that their firehose proxy can decide what to do vs decide in-process. Ex. if that tag includes "b3" they know it was sampled because of b3 vs wildcard or some overlay.

https://github.com/openzipkin-contrib/zipkin-secondary-sampling/blob/master/docs/design.md#the-sampled_keys-tag
https://github.com/openzipkin-contrib/zipkin-secondary-sampling/blob/master/brave/src/test/java/brave/secondary_sampling/BasicUsageTest.java#L86-L90

If making the decision in-process you don't need to add a tag, just route it to another sender based on logic you deploy.

@drolando
Copy link
Contributor Author

Doing something similar here would mean adding a new sampledLocal flag to TraceContext for each span. Then in the reporter I can loop through them and send all the ones that have sampledLocal=true to firehose, correct?

@drolando
Copy link
Contributor Author

And I see you have the same alwaysReportSpans flag too

@codefromthecrypt
Copy link
Member

yeap

sampledlocal is true when sampled is true or if alwaysReportSpans is set. (later you can get fancy and conditionally set sampledLocal based on http headers)

alwaysReportSpans is a tracer-scoped flag where sampledLocal is a span/context scoped flag

@jcchavezs
Copy link
Contributor

@drolando thanks for this. I am reluctant to expose the Recorder to the user, mostly because I always saw it as an internal type and making it customizable would create some weird API surface (starting by the fact that we pass the reporter as a parameter already and it is supposed to be included in the recorder aswell). I'd prefer the approach @adriancole suggests by using the sampledLocal flag.

@adriancole I got a question, to summarize what we could do is:

  1. Include the Context Flags or at leas the sampledLocal in the MutableSpan (so the reporter has access to sampledLocal, not sure which option is the best)
  2. Create the SpanHandler interface whose default class only forwards the remote sampled spans (notice this can enable better things like only forward remote sampled spans with an error or certain duration)
  3. Let the Reporter to decide how to multiplex the spans based on the sampledLocal flags?

@drolando drolando force-pushed the support_firehose branch 3 times, most recently from 9271bde to dabaf31 Compare June 15, 2020 22:16
$span->finish();

$tracer->flush();
$spans = $reporter->flush();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand why we are using another word. Ex alwaysReportSpans would make sense to have reporter report. alwaysEmitSpans would makes sense to have emitter emit :)

here's a related post on naming the TL;DR; being that we make enough words at this point https://publicobject.com/2020/06/06/synonyms-are-bad/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah that's a better name

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use case is valid.. made a comment about naming (bikeshed but I think worth thinking for a sec)

regardless, probably copying some of the doc from here might help!

https://github.com/openzipkin/zipkin-reporter-java/blob/master/brave/src/main/java/zipkin2/reporter/brave/ZipkinSpanHandler.java#L109-L127

@drolando
Copy link
Contributor Author

@jcchavezs any comment on this? Does it look good?

@jcchavezs
Copy link
Contributor

jcchavezs commented Jun 24, 2020 via email

@@ -156,7 +163,8 @@ public static function createFromContext(TraceContext $context, Endpoint $localE
$context->getSpanId(),
$context->isDebug(),
$context->isShared(),
$localEndpoint
$localEndpoint,
$context->isSampled()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps one slight thing here is that in brave the "handler" has 2 arguments. since it has the context as one of the args, we don't need to also copy isSampled to the span.

  @Override public boolean end(TraceContext context, MutableSpan span, Cause cause) {
    if (!alwaysReportSpans && !Boolean.TRUE.equals(context.sampled())) return true;
--snip--

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying to change this, just mentioning a reference. it is fine to also copy it to the span, just it should be immutable (appears to be the case!)

@drolando drolando force-pushed the support_firehose branch from b4c1854 to 98adc64 Compare July 1, 2020 22:52
@drolando
Copy link
Contributor Author

drolando commented Jul 1, 2020

isSampled in the recording span would always be Boolean by the time it is reported. No need to be nullable.

This isn't always true, at least in tests. If I create the context with TraceContext::createAsRoot(DefaultSamplingFlags::createAsEmpty()); then isSampled is null.

@jcchavezs
Copy link
Contributor

jcchavezs commented Jul 3, 2020

This isn't always true, at least in tests. If I create the context with TraceContext::createAsRoot(DefaultSamplingFlags::createAsEmpty()); then isSampled is null.

Yes but when making it a span then the sampling decision happens. Anyways I am going to merge this and then investigate that and rebase onto #163.

EDIT: I need some input. As I mentioned when we turn a context with null sampled into span, we force to make the sampling decision and theoretically it will never happen that the reporter receives an unsampled MutableSpan. However that needs the tests for MutableSpan to be aware of this (hence not using null sampled contexts). This can be fixed with some documentation or just accept null sampled (which is not too terrible), how do you feel about that @adriancole ?

private function __construct(
string $traceId,
?string $parentId,
string $spanId,
bool $debug,
bool $shared,
Endpoint $localEndpoint
Endpoint $localEndpoint,
?bool $isSampled = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also move this before endpoint as we changed the order already.

@@ -85,4 +86,58 @@ private function randomBool()
{
return (bool) mt_rand(0, 1);
}

public function testAlwaysEmitSpans()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also change this test into alwaysReportSpans.

@jcchavezs jcchavezs merged commit b5b5fdd into openzipkin:master Jul 3, 2020
@jcchavezs
Copy link
Contributor

Thanks for the great work @drolando !

@codefromthecrypt
Copy link
Member

on the sampled null part. Indeed currently in brave a span must have a sampling decision, but since the context is independent of the span, and it could nave a null decision, then we have to allow it null. At some point we wanted to have "incognito mode" or "pass through" where you don't interfere with anything in the context, but let it be passed across. Especially in that case it could pass around null , but even now it is possible even if not normal.

*/
public function isSampled(): bool
{
return $this->isSampled === true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a question here @adriancole @drolando: what we are saying here is whether this is sampled or not (and then dispatch the span to different reporters accordingly) but what about debug? if something is sampled=false but debug=true then we will dispatch it to the local sampled reporter, isn't?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it is better we change this into shouldRecord which also checks the debug flag?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Giving a second thought, I think it is responsability of the user to decide what to do when sampled=false and debug=true.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Jul 5, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants