Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spans are mixed on concurrent async calls on a Hapi server #483

Open
scabfos opened this issue Feb 7, 2020 · 11 comments
Open

Spans are mixed on concurrent async calls on a Hapi server #483

scabfos opened this issue Feb 7, 2020 · 11 comments

Comments

@scabfos
Copy link

scabfos commented Feb 7, 2020

We use zipkin-js to observe our Hapi server and its dependencies and we noticed that when we get multiple concurrent calls, some of the spans are getting assigned to the wrong traces.

We were working on this issue with @jcchavezs and we created a PR with an example to prove with a simple Hapi project that this is in fact happening.

Screen Shot 2020-02-07 at 11 57 56 AM
Screen Shot 2020-02-07 at 12 00 32 PM

As you see in the images, when doing multiple concurrent calls, spans are getting assigned to the wrong one.

Steps to reproduce

  1. Clone this PR: feat: added hapi example zipkin-js-example#28
  2. Start zipkin local server docker run -d -p 9411:9411 openzipkin/zipkin
  3. Run yarn && node index.js
  4. Run ab -c10 -n10 http://127.0.0.1:3000/
  5. Check zipkin
  6. 🤷‍♂

If you have any idea or questions you can reach us at @lussn @alexfdz @javigomez
Thank you!

@jcchavezs
Copy link
Contributor

Thanks for this @scabfos, as you may know zipkin is a volunteering project and reporting an issue with a breaking example is truly appreciated.

I managed to make it work but I am not sure how to proceed in terms of implementing the fix. The main issue here is that hapi instrumentation works in terms of listeners instead of middleware. If you look at express instrumentation you will see the following:

  return function zipkinExpressMiddleware(req, res, next) {
     ...
     tracer.scoped(() => {
        ...
        next()
      })
  }

This means that everything in next() happens scoped. In hapi case, we subscribe to events onRequest, onPreHandler, onPreResponse but we can't actually wrap with a scope what happens in the handler. If we change the code in the example from:

server.route({
    method: 'GET',
    path: '/',
    config: {
      pre: [
        {
          method: () => {
              return doCall()
          },
          assign: 'users',
        },

into

server.route({
    method: 'GET',
    path: '/',
    config: {
      pre: [
        {
          method: (req) => {
            return tracer.scoped(function() {
              tracer.setId(req._trace_id) // reasigning the traceId in the scope from https://github.com/openzipkin/zipkin-js/blob/master/packages/zipkin-instrumentation-hapi/src/hapiMiddleware.js#L28
              return doCall()
            })
          },
          assign: 'users',
        },

Then everything works as expected.

I am not necessarily an expert in JS but one alternative I see here is that we wrap the whole server and then on server.route we wrap the methods with the scope. This is going to change the way we use instrumentation as we won't use the plugin declaration anymore (it will happen underneath). Explicit wrapping is something for example this library does, hence I am afraid wrapping the whole server is the only alternative I can think of.

I am friendly pinging @hueniverse to see if there is a way to wrap the methods in hapi through plugins cc @adriancole

@codefromthecrypt
Copy link
Member

what @jcchavezs said makes sense. We cannot effectively instrument a listener model, unless there is shared state between the listeners.

For example, if it is 100% the case that onX is followed by onY or onZ, and between onX and onY or onZ, user code is executed.. you can use tracer.setId(span) in onX, and revert it in onX or onY.

@anuraaga
Copy link
Contributor

It looks like there is a plugins object where plugins can put request state they need (though this is JavaScript, we could just set any attribute if we really wanted :-) )

https://github.com/hapijs/hapi/blob/master/API.md#-requestplugins

I think this should give the shared state between listeners we need, can't we put the trace context in?

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Feb 10, 2020 via email

@jcchavezs
Copy link
Contributor

what jcchavezs said makes sense. We cannot effectively instrument a listener model, unless there is shared state between the listeners.
For example, if it is 100% the case that onX is followed by onY or onZ, and between onX and onY or onZ, user code is executed.. you can use tracer.setId(span) in onX, and revert it in onX or onY.

@adriancole we currently do that but id does not work when using promises (this is what I noticed) because you can't wrap the actual handler execution with a scope.

It looks like there is a plugins object where plugins can put request state they need (though this is JavaScript, we could just set any attribute if we really wanted :-) )
https://github.com/hapijs/hapi/blob/master/API.md#-requestplugins
I think this should give the shared state between listeners we need, can't we put the trace context in?

@anuraaga we do that already, see https://github.com/openzipkin/zipkin-js/blob/master/packages/zipkin-instrumentation-hapi/src/hapiMiddleware.js#L28

one thing to note about the state issue is we are talking about the state
around scope. ex which scope to revert to.. ex tracer.setId is similar to
java currenttracecontext.newScope

@adriancole I am not sure about this statement. As I said the problem is that eventhough we set the traceId before and after the handler, the scope inside the handler (when using promises) could be anything so I'd say the only way to solve this is wrap the handler with a scope. Maybe I am wrong but after debugging this issue for several hours those are the conclusions I arrived to.

@codefromthecrypt
Copy link
Member

hard for me to drive-by recommend anything. I mainly meant that tracer.setId(req._trace_id) type of thing is the only thing available in callback. scoped is not helpful there.

@anuraaga
Copy link
Contributor

anuraaga commented Feb 11, 2020

Found that lifecycle methods are called in order here

https://github.com/hapijs/hapi/blob/master/lib/request.js#L455

Indeed, because the result is always awaited, it will be interleaved with other requests.

That being said, it seems that hapi supports multiple server decorators since 17.1.0

hapijs/hapi@307a8d0

Looking at our instrumentation we wanted to use this but couldn't due to the limit of one. Now that it isn't an issue, can we use that to decorate handlers? Not sure if we have a min version requirement that doesn't work with that though.

// server.decorate seems a better choice for scoping the handler with the current trace ID

@jcchavezs
Copy link
Contributor

@anuraaga @adriancole I don't see how the decorate could help us with the wrapping. If the intention is that we can override the server.routes I tried that with no success, one can not decorate existing functions, only add new ones. Maybe I am missing something?

I opened an issue in hapi repo, I hope we can get better insights in there: hapijs/hapi#4049

@hueniverse
Copy link

What does it mean to wrap a handler or a pre? What happens when code runs inside the scope() wrapper?

@cur3n4
Copy link

cur3n4 commented May 13, 2020

I am encountering this limitation also when trying to use Continuation Local Storage to set some correlation id (similar to zipkin).
I can decorate the handler, but in my case it would be better to be able to span this across more than one lifecycle event (i.e. from onRequest to response).

I believe this might have been possible before the toolkit signals were added to the lifecycle methods signature (<=16), as one could have done something like this on the onRequest listener:

namespace.run(() => reply.continue())

@cur3n4
Copy link

cur3n4 commented May 13, 2020

By the way we are currently decorating all the handlers in one go as we start the server:

server.table().forEach(route => {
    const matchingRoute = server.match(route.method, route.path);
    matchingRoute.settings.handler = loggingContextDecorator(matchingRoute.settings.handler);
});

And the loggingContextDecorator does something like:

function decorate(handler: Lifecycle.Method) {
    return async (request: Request, ...args) => {
        
        return ns.runAndReturn(async () => {
            ns.set(SOME_KEY, someValue);
    
            try {
                return await handler.apply(this, [request, ...args])
            } catch (err) {
                throw err;
            }
        })
    };
}

export function loggingContextDecorator(handler: Lifecycle.Method | object) {
    if (typeof handler  === 'function') {
        return decorate(handler as Lifecycle.Method)
    } else {
        // TODO deal with objects
    }
    return handler;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants