Debugging using breakpoints causes any further event processing from happening #22

Closed
bobjana opened this issue May 5, 2017 · 7 comments

Comments

@bobjana
Collaborator

bobjana commented May 5, 2017

Muon Version: 0.0.5
Language: Java

Services in Use: photon, newton

As soon as a consumer of newton gets into a state that causes an exception to be thrown while processing events coming from the eventstore (such as an event subscription), the eventstore gets into a state where further submissions cannot be processed, failing with the following type of exception:

“Failed to persist domain event **Event(...):A timeout occurred, the remote service did not send a response”

To reproduce:

  • Run the photon-sample app in debug mode. Place a breakpoint in, say, CreateTaskCommand.execute() l.41
  • Run TaskSpecification
  • Upon hitting the breakpoint, wait about 10 seconds before resuming
  • Remove breakpoint
  • Re-run TaskSpecification
@bobjana bobjana added the bug label May 5, 2017
@bobjana bobjana modified the milestone: 0.0.8 May 11, 2017
@daviddawson daviddawson modified the milestones: 0.0.8, 0.0.10 May 15, 2017
@daviddawson
Member

Caused by muoncore/muon-java#60

@daviddawson
Member

Should be fixed in d28ae3e

@bobjana could you retest and verify?

@bobjana bobjana modified the milestone: 0.0.10 Jun 22, 2017
@CamW
Collaborator

CamW commented Jun 22, 2017

Newton: 0.0.10-SNAPSHOT (20170622.114215-22)
Photon-lite: 0.0.5

I put a breakpoint in a command's executeAndReturnEvents method. (In this instance it was on a call to EventSourceRepository.load, but I'm fairly sure I've seen it happen in other places too.) Hitting the breakpoint, waiting a few seconds, and then resuming results in the following error being thrown several times:

java.lang.NullPointerException
	at io.muoncore.codec.json.JsonOnlyCodecs.decode(JsonOnlyCodecs.java:38)
	at io.muoncore.protocol.event.client.EventClientProtocol.lambda$new$0(EventClientProtocol.java:56)
	at io.muoncore.channel.impl.TimeoutChannel$2.lambda$send$0(TimeoutChannel.java:73)
	at io.muoncore.transport.client.RingBufferLocalDispatcher.route(RingBufferLocalDispatcher.java:404)
	at io.muoncore.transport.client.RingBufferLocalDispatcher$SingleThreadTask.run(RingBufferLocalDispatcher.java:297)
	at io.muoncore.transport.client.RingBufferLocalDispatcher$3.onEvent(RingBufferLocalDispatcher.java:171)
	at io.muoncore.transport.client.RingBufferLocalDispatcher$3.onEvent(RingBufferLocalDispatcher.java:168)
	at reactor.jarjar.com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

@daviddawson
Member

Fixed in muoncore/stack-event@c61b939

@bobjana bobjana changed the title from "Subsequent events aren't being processed following an exception within stream subscription processing" to "Debugging using breakpoints causes any further event processing from happening" Jun 23, 2017
@bobjana
Collaborator Author

bobjana commented Jun 23, 2017

Updated the summary to reflect the actual cause now that we have a better handle on this one. Will retest soon

@daviddawson
Member

daviddawson commented Jun 24, 2017

To be clear on what's happening, and what the above NPE and subsequent fix do:
All the messages and events in muon/newton are dispatched on a single ringbuffer dispatch thread. When you put a breakpoint in and block that thread for a significant amount of time, the keep-alive system notices that messages have not flowed for a while (5s) and shuts down the channel.

This is the root cause of things stopping when you hit a breakpoint in this code. A more fully featured solution would handle the thread management of this portion of newton independently of the underlying event dispatch.
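To make the mechanism concrete, here is a minimal sketch in plain Java. It does not use muon or newton code: the class and method names are hypothetical, a single-thread `ExecutorService` stands in for the ringbuffer dispatch thread, and a `Thread.sleep` stands in for a thread paused at a breakpoint. It only illustrates why a keep-alive message queued behind a blocked handler cannot be delivered in time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in muon or newton.
class KeepAliveSketch {

    /**
     * Returns true if a keep-alive ping queued behind a blocking handler
     * is NOT delivered within the keep-alive window.
     */
    static boolean keepAliveExpires(long handlerBlockMillis, long keepAliveMillis)
            throws InterruptedException {
        // Stand-in for the single dispatch thread: tasks run strictly one at a time.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        CountDownLatch pingDelivered = new CountDownLatch(1);
        try {
            // Application handler running on the dispatch thread; the sleep
            // simulates sitting at a breakpoint.
            dispatcher.execute(() -> {
                try {
                    Thread.sleep(handlerBlockMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // The keep-alive ping is queued behind the blocked handler.
            dispatcher.execute(pingDelivered::countDown);
            // The watchdog gives up if the ping has not arrived within the window.
            return !pingDelivered.await(keepAliveMillis, TimeUnit.MILLISECONDS);
        } finally {
            dispatcher.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocked handler, keep-alive expired: " + keepAliveExpires(500, 100));
        System.out.println("fast handler, keep-alive expired: " + keepAliveExpires(0, 1000));
    }
}
```

The timings are scaled down from the 5s window mentioned above so the sketch finishes quickly; the ordering argument is the same at any scale.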

This is now improved by the fix mentioned above. You can see the current state by putting a breakpoint in ChangeTaskDescriptionCommand.execute and running TaskSpecification. It will run and then halt in the right place; if you wait for 10s and then release, you will see the log:

i.m.channel.impl.KeepAliveChannel        : Connection has failed to stay alive, last message was received 10149ms ago, sending failure to protocol level: shared-channel
2017-06-24 19:05:48.187  INFO 27726 --- [      channel-4] i.m.t.s.client.SharedSocketRoute         : Shutting down shared-route due to channel failure
2017-06-24 19:05:48.187 DEBUG 27726 --- [      channel-4] i.m.t.s.client.SharedSocketRouter        : Removing shared-route to service photonlite
2017-06-24 19:05:48.188 DEBUG 27726 --- [ amqp-channel-1] i.m.e.a.r.RabbitMq09QueueListener        : Queue listener is cancelled:newton-sample-receive-72c6e929-e63c-4500-b696-e562d0eebf2a

These lines appear as the underlying communication channels shut down due to a keep-alive failure.
This is the same issue that was picked up a while ago in muoncore/muon-java#40

This is not a complete fix.

The solution is to isolate muon message dispatch (which needs to be a ringbuffer for performance reasons) from application code, which needs to be debuggable and doesn't have the same performance constraints per se. I will analyse the thread management in commands specifically and identify what is causing the freezing behaviour.
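One possible shape for that isolation, sketched in plain Java under stated assumptions: the class and method names are hypothetical, executors stand in for the real dispatch machinery, and this is not the change that was made in muon or newton. The dispatch thread only hands the application handler to a separate worker and returns, so the keep-alive ping behind it is still delivered even while the handler is blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in muon or newton.
class IsolatedDispatchSketch {

    /**
     * Returns true if the keep-alive ping is delivered within the window
     * even though the application handler blocks for longer than that.
     */
    static boolean keepAliveSurvives(long handlerBlockMillis, long keepAliveMillis)
            throws InterruptedException {
        // Stand-in for the single dispatch thread.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        // Separate thread for application code; a single worker keeps events in order.
        ExecutorService appWorker = Executors.newSingleThreadExecutor();
        CountDownLatch pingDelivered = new CountDownLatch(1);
        try {
            // The dispatch thread hands the handler off and returns immediately.
            dispatcher.execute(() -> appWorker.execute(() -> {
                try {
                    Thread.sleep(handlerBlockMillis); // simulated breakpoint
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
            // The ping is no longer stuck behind the blocked handler.
            dispatcher.execute(pingDelivered::countDown);
            return pingDelivered.await(keepAliveMillis, TimeUnit.MILLISECONDS);
        } finally {
            dispatcher.shutdownNow();
            appWorker.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("keep-alive survived: " + keepAliveSurvives(500, 200));
    }
}
```

The trade-off is an extra hand-off per event on the application side, which fits the point above that application code does not share the dispatcher's performance constraints.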

@bobjana
Collaborator Author

bobjana commented Jun 26, 2017

Resolved !!!!

@bobjana bobjana closed this as completed Jun 26, 2017