Spans are mixed on concurrent async calls on a Hapi server #483

scabfos · 2020-02-07T11:36:59Z

We use zipkin-js to observe our Hapi server and its dependencies and we noticed that when we get multiple concurrent calls, some of the spans are getting assigned to the wrong traces.

We were working on this issue with @jcchavezs and we created a PR with an example to prove with a simple Hapi project that this is in fact happening.

As you see in the images, when doing multiple concurrent calls, spans are getting assigned to the wrong one.

Steps to reproduce

Clone this PR: feat: added hapi example zipkin-js-example#28
Start zipkin local server docker run -d -p 9411:9411 openzipkin/zipkin
Run yarn && node index.js
Run ab -c10 -n10 http://127.0.0.1:3000/
Check zipkin
🤷‍♂

If you have any idea or questions you can reach us at @lussn @alexfdz @javigomez
Thank you!

The text was updated successfully, but these errors were encountered:

jcchavezs · 2020-02-09T08:40:20Z

Thanks for this @scabfos, as you may know zipkin is a volunteering project and reporting an issue with a breaking example is truly appreciated.

I managed to make it work but I am not sure how to proceed in terms of implementing the fix. The main issue here is that hapi instrumentation works in terms of listeners instead of middleware. If you look at express instrumentation you will see the following:

  return function zipkinExpressMiddleware(req, res, next) {
     ...
     tracer.scoped(() => {
        ...
        next()
      })
  }

This means that everything in next() happens scoped. In hapi case, we subscribe to events onRequest, onPreHandler, onPreResponse but we can't actually wrap with a scope what happens in the handler. If we change the code in the example from:

server.route({
    method: 'GET',
    path: '/',
    config: {
      pre: [
        {
          method: () => {
              return doCall()
          },
          assign: 'users',
        },

into

server.route({
    method: 'GET',
    path: '/',
    config: {
      pre: [
        {
          method: (req) => {
            return tracer.scoped(function() {
              tracer.setId(req._trace_id) // reasigning the traceId in the scope from https://github.com/openzipkin/zipkin-js/blob/master/packages/zipkin-instrumentation-hapi/src/hapiMiddleware.js#L28
              return doCall()
            })
          },
          assign: 'users',
        },

Then everything works as expected.

I am not necessarily an expert in JS but one alternative I see here is that we wrap the whole server and then on server.route we wrap the methods with the scope. This is going to change the way we use instrumentation as we won't use the plugin declaration anymore (it will happen underneath). Explicit wrapping is something for example this library does, hence I am afraid wrapping the whole server is the only alternative I can think of.

I am friendly pinging @hueniverse to see if there is a way to wrap the methods in hapi through plugins cc @adriancole

codefromthecrypt · 2020-02-09T15:59:36Z

what @jcchavezs said makes sense. We cannot effectively instrument a listener model, unless there is shared state between the listeners.

For example, if it is 100% the case that onX is followed by onY or onZ, and between onX and onY or onZ, user code is executed.. you can use tracer.setId(span) in onX, and revert it in onX or onY.

anuraaga · 2020-02-10T04:25:29Z

It looks like there is a plugins object where plugins can put request state they need (though this is JavaScript, we could just set any attribute if we really wanted :-) )

https://github.com/hapijs/hapi/blob/master/API.md#-requestplugins

I think this should give the shared state between listeners we need, can't we put the trace context in?

codefromthecrypt · 2020-02-10T04:30:16Z

one thing to note about the state issue is we are talking about the state around scope. ex which scope to revert to.. ex tracer.setId is similar to java currenttracecontext.newScope anyway sounds like we have options which is good!

…

On Sun, Feb 9, 2020, 8:25 PM Anuraag Agrawal ***@***.***> wrote: It looks like there is a plugins object where plugins can put request state they need (though this is JavaScript, we could just set any attribute if we really wanted :-) ) https://github.com/hapijs/hapi/blob/master/API.md#-requestplugins I think this should give the shared state between listeners we need, can't we put the trace context in? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#483?email_source=notifications&email_token=AAAPVV4NKV4FWPOWI3T7GT3RCDJLXA5CNFSM4KRNCIA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELHF2LI#issuecomment-583949613>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAPVV3YJVDCJEB2ZAJ3CULRCDJLXANCNFSM4KRNCIAQ> .

jcchavezs · 2020-02-10T14:20:36Z

what jcchavezs said makes sense. We cannot effectively instrument a listener model, unless there is shared state between the listeners.
For example, if it is 100% the case that onX is followed by onY or onZ, and between onX and onY or onZ, user code is executed.. you can use tracer.setId(span) in onX, and revert it in onX or onY.

@adriancole we currently do that but id does not work when using promises (this is what I noticed) because you can't wrap the actual handler execution with a scope.

It looks like there is a plugins object where plugins can put request state they need (though this is JavaScript, we could just set any attribute if we really wanted :-) )
https://github.com/hapijs/hapi/blob/master/API.md#-requestplugins
I think this should give the shared state between listeners we need, can't we put the trace context in?

@anuraaga we do that already, see https://github.com/openzipkin/zipkin-js/blob/master/packages/zipkin-instrumentation-hapi/src/hapiMiddleware.js#L28

one thing to note about the state issue is we are talking about the state
around scope. ex which scope to revert to.. ex tracer.setId is similar to
java currenttracecontext.newScope

@adriancole I am not sure about this statement. As I said the problem is that eventhough we set the traceId before and after the handler, the scope inside the handler (when using promises) could be anything so I'd say the only way to solve this is wrap the handler with a scope. Maybe I am wrong but after debugging this issue for several hours those are the conclusions I arrived to.

codefromthecrypt · 2020-02-10T17:13:10Z

hard for me to drive-by recommend anything. I mainly meant that tracer.setId(req._trace_id) type of thing is the only thing available in callback. scoped is not helpful there.

anuraaga · 2020-02-11T07:15:18Z

Found that lifecycle methods are called in order here

https://github.com/hapijs/hapi/blob/master/lib/request.js#L455

Indeed, because the result is always awaited, it will be interleaved with other requests.

That being said, it seems that hapi supports multiple server decorators since 17.1.0

hapijs/hapi@307a8d0

Looking at our instrumentation we wanted to use this but couldn't due to the limit of one. Now that it isn't an issue, can we use that to decorate handlers? Not sure if we have a min version requirement that doesn't work with that though.

zipkin-js/packages/zipkin-instrumentation-hapi/src/hapiMiddleware.js

Line 32 in 6abcd04

    
           // server.decorate seems a better choice for scoping the handler with the current trace ID

jcchavezs · 2020-02-14T15:22:15Z

@anuraaga @adriancole I don't see how the decorate could help us with the wrapping. If the intention is that we can override the server.routes I tried that with no success, one can not decorate existing functions, only add new ones. Maybe I am missing something?

I opened an issue in hapi repo, I hope we can get better insights in there: hapijs/hapi#4049

hueniverse · 2020-03-14T23:02:57Z

What does it mean to wrap a handler or a pre? What happens when code runs inside the scope() wrapper?

cur3n4 · 2020-05-13T09:07:20Z

I am encountering this limitation also when trying to use Continuation Local Storage to set some correlation id (similar to zipkin).
I can decorate the handler, but in my case it would be better to be able to span this across more than one lifecycle event (i.e. from onRequest to response).

I believe this might have been possible before the toolkit signals were added to the lifecycle methods signature (<=16), as one could have done something like this on the onRequest listener:

namespace.run(() => reply.continue())

cur3n4 · 2020-05-13T12:17:19Z

By the way we are currently decorating all the handlers in one go as we start the server:

server.table().forEach(route => {
    const matchingRoute = server.match(route.method, route.path);
    matchingRoute.settings.handler = loggingContextDecorator(matchingRoute.settings.handler);
});

And the loggingContextDecorator does something like:

function decorate(handler: Lifecycle.Method) {
    return async (request: Request, ...args) => {
        
        return ns.runAndReturn(async () => {
            ns.set(SOME_KEY, someValue);
    
            try {
                return await handler.apply(this, [request, ...args])
            } catch (err) {
                throw err;
            }
        })
    };
}

export function loggingContextDecorator(handler: Lifecycle.Method | object) {
    if (typeof handler  === 'function') {
        return decorate(handler as Lifecycle.Method)
    } else {
        // TODO deal with objects
    }
    return handler;
}

scabfos mentioned this issue Feb 7, 2020

feat: added hapi example openzipkin/zipkin-js-example#28

Closed

jcchavezs mentioned this issue Feb 14, 2020

How to wrap a handler/pre-handler hapijs/hapi#4049

Closed

jcchavezs mentioned this issue Oct 9, 2020

isOptional and Related checks are wrong and breaks compatibility between versions #518

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spans are mixed on concurrent async calls on a Hapi server #483

Spans are mixed on concurrent async calls on a Hapi server #483

scabfos commented Feb 7, 2020

jcchavezs commented Feb 9, 2020

codefromthecrypt commented Feb 9, 2020

anuraaga commented Feb 10, 2020

codefromthecrypt commented Feb 10, 2020 via email

jcchavezs commented Feb 10, 2020

codefromthecrypt commented Feb 10, 2020

anuraaga commented Feb 11, 2020 •

edited

Loading

jcchavezs commented Feb 14, 2020

hueniverse commented Mar 14, 2020

cur3n4 commented May 13, 2020

cur3n4 commented May 13, 2020

Spans are mixed on concurrent async calls on a Hapi server #483

Spans are mixed on concurrent async calls on a Hapi server #483

Comments

scabfos commented Feb 7, 2020

Steps to reproduce

jcchavezs commented Feb 9, 2020

codefromthecrypt commented Feb 9, 2020

anuraaga commented Feb 10, 2020

codefromthecrypt commented Feb 10, 2020 via email

jcchavezs commented Feb 10, 2020

codefromthecrypt commented Feb 10, 2020

anuraaga commented Feb 11, 2020 • edited Loading

jcchavezs commented Feb 14, 2020

hueniverse commented Mar 14, 2020

cur3n4 commented May 13, 2020

cur3n4 commented May 13, 2020

anuraaga commented Feb 11, 2020 •

edited

Loading