STORM-166:Nimbus HA solution based on Zookeeper #61

yveschina · 2014-04-04T07:54:55Z

The Nimbus HA feature is quite important for our application running on the Storm cluster. We've been working on the problem for some time, and we now have a solution that is not perfect but should be good enough to use.

1. Nimbus servers can now register themselves in Zookeeper. They perform a leader election using "InterProcessMutex" against Zookeeper to ensure that only one Nimbus is responsible for launching and monitoring topologies.

2. Every Nimbus server runs a timer that checks for topology code that does not exist on its local disk. It downloads any locally missing topology code from the Nimbus leader through the Thrift RPC, just as Supervisors do. With this feature, any number of Nimbus servers can be launched throughout the cluster.

3. StormSubmitter, Supervisor, non-leader Nimbus and Storm UI are now able to find and connect to the Nimbus leader via Zookeeper. A Nimbus leadership table has also been added to the Storm UI main page to show each Nimbus's leader-election state and its host.
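
To make points 1 and 3 concrete, here is a minimal in-memory sketch of the election and leader lookup. It is not the code in this pull request: `FakeZooKeeper`, `Nimbus`, `find_leader` and `leadership_table` are hypothetical names, and the mutex is modelled as a non-blocking try-acquire, whereas the real "InterProcessMutex" blocks until the lock is available.

```python
import threading


class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper (hypothetical, not a real client)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._lock_holder = None   # models the election mutex
        self._registered = set()   # models one registration node per Nimbus

    def register(self, host):
        with self._guard:
            self._registered.add(host)

    def unregister(self, host):
        # Models session loss: the registration and any held mutex vanish.
        with self._guard:
            self._registered.discard(host)
            if self._lock_holder == host:
                self._lock_holder = None

    def try_acquire(self, host):
        # At most one host holds the mutex; the current holder may re-acquire.
        with self._guard:
            if self._lock_holder is None:
                self._lock_holder = host
            return self._lock_holder == host

    def leader(self):
        with self._guard:
            return self._lock_holder

    def hosts(self):
        with self._guard:
            return sorted(self._registered)


class Nimbus:
    def __init__(self, zk, host):
        self.zk = zk
        self.host = host
        zk.register(host)

    def campaign(self):
        """Return True if this Nimbus is (now) the leader."""
        return self.zk.try_acquire(self.host)

    def shutdown(self):
        self.zk.unregister(self.host)


def find_leader(zk):
    """Lookup a submitter, Supervisor or UI would do before connecting."""
    return zk.leader()


def leadership_table(zk):
    """Rows for a UI leadership table: (host, is_leader)."""
    leader = zk.leader()
    return [(host, host == leader) for host in zk.hosts()]
```

In this model a standby Nimbus only becomes leader after the current leader's registration disappears, which is the property the election is meant to guarantee.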

PS: Parts of the Nimbus election implementation take @Frostman's solution as a reference (link: nathanmarz/storm#422).
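
The periodic code sync from point 2 boils down to a set difference followed by downloads. The sketch below is only an illustration under assumptions: `missing_topologies` and `sync_once` are hypothetical names, the local disk is modelled as a dict, and `download` stands in for the Thrift call to the leader.

```python
def missing_topologies(active_ids, local_ids):
    """Topology ids that are active in the cluster but absent locally."""
    return sorted(set(active_ids) - set(local_ids))


def sync_once(active_ids, local_store, download):
    """One timer tick: fetch every missing topology's code into local_store.

    `download` is a callable taking a topology id and returning its code;
    here it is a placeholder for the RPC to the Nimbus leader.
    """
    fetched = []
    for topo_id in missing_topologies(active_ids, local_store):
        local_store[topo_id] = download(topo_id)
        fetched.append(topo_id)
    return fetched
```

A timer on each non-leader Nimbus would call `sync_once` repeatedly. A second call with nothing missing downloads nothing, so the tick is cheap in the steady state.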

revans2 · 2014-04-07T16:11:08Z

storm-core/src/clj/backtype/storm/ui/core.clj

-            {:text "Nimbus uptime" :attr {:class "tip right"
-                                          :title (:nimbus-uptime tips)}}
+            {:text "Nimbus leader uptime" :attr {:class "tip right"
+                                          :title (:nimbus-leader-uptime tips)}}


Please actually define an entry in tips for ':nimbus-leader-uptime'

I will fix the tips soon

revans2 · 2014-04-07T16:57:45Z

I have done a quick pass through the code. It looks like there are several places where the code is leaking connections to ZK. I am also concerned about the extra load this may place on ZK. ZK is already the bottleneck for the scalability of the cluster, and a new client connecting causes a write operation to the cluster. Having every client make a connection is bad, but not a deal-breaker. However, also having the existing daemons constantly make new connections to ZK feels like it will cause a lot of scalability issues.
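
To illustrate the difference between the two connection patterns, here is a small sketch with a counting stand-in client. `ConnectionStats`, `FakeZkClient`, `poll_leaky` and `poll_shared` are hypothetical names, and nothing here talks to a real ZooKeeper.

```python
class ConnectionStats:
    """Counts connections opened and closed by the fake client."""

    def __init__(self):
        self.opened = 0
        self.closed = 0

    @property
    def leaked(self):
        return self.opened - self.closed


class FakeZkClient:
    """Stand-in for a ZooKeeper client; each instance models one connection."""

    def __init__(self, stats):
        self._stats = stats
        self._closed = False
        stats.opened += 1

    def read_leader(self):
        return "nimbus-a"  # fixed placeholder value

    def close(self):
        if not self._closed:
            self._closed = True
            self._stats.closed += 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


def poll_leaky(stats, times):
    # Anti-pattern: a new connection per lookup, never closed.
    return [FakeZkClient(stats).read_leader() for _ in range(times)]


def poll_shared(stats, times):
    # One connection reused for every lookup and closed when done.
    with FakeZkClient(stats) as client:
        return [client.read_leader() for _ in range(times)]
```

With five lookups, the leaky variant opens five connections and closes none, while the shared variant opens one and closes it.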

revans2 · 2014-04-07T16:58:42Z

conf/defaults.yaml

@@ -38,7 +38,6 @@ storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"
 storm.messaging.transport: "backtype.storm.messaging.netty.Context"

 ### nimbus.* configs are for the master
-nimbus.host: "localhost"


If nimbus.host is not used any more, we should either deprecate it or just remove it.

The nimbus.host-related config should be removed throughout the source code. I will fix this.
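
For the "deprecate" option, one possible shape is a check that warns when a retired key is still present in a user's config instead of silently ignoring it. This is a sketch under assumptions: `DEPRECATED_KEYS` and `check_deprecated` are hypothetical and not part of Storm.

```python
import warnings

# Hypothetical registry of retired keys and the reason shown to the user.
DEPRECATED_KEYS = {
    "nimbus.host": "the Nimbus leader is looked up in ZooKeeper instead",
}


def check_deprecated(conf):
    """Warn about deprecated keys found in conf and return their names."""
    found = []
    for key, reason in DEPRECATED_KEYS.items():
        if key in conf:
            warnings.warn(
                f"{key} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,
            )
            found.append(key)
    return found
```

Removing the key outright is simpler, but a warning gives users with an old storm.yaml a hint about why the setting no longer has any effect.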

ptgoetz · 2014-04-10T15:33:18Z

As @revans2 alluded to, some of the challenges with code distribution (supervisor downloads from nimbus) will be alleviated by using bittorrent for topology distribution.

I have a branch that switches to using bittorrent for code distribution, but I've held off on submitting a pull request because there's a bug with multi-lang topologies that I haven't had time to track down yet (the resource directory gets deleted).

I'll submit the pull request for reference with the caveat that it shouldn't be merged until that bug is fixed.

Here's the original pull request:

nathanmarz/storm#629

ptgoetz · 2014-04-11T13:34:04Z

FYI... aforementioned issue with multi-lang has been fixed.

d2r · 2014-07-10T21:13:57Z

@yveschina, any update on this PR? Looks like it needs an up-merge to master.

ptgoetz · 2014-07-24T00:47:37Z

@yveschina any update on the concerns raised?

yveschina · 2014-09-18T07:39:25Z

@d2r @ptgoetz I've updated the pull request according to the conversations with @revans2. Should we consider closing this pull request? Is there any plan to merge this pull request or another solution for Nimbus HA?

ptgoetz · 2014-10-01T22:36:54Z

@yveschina The main concern I have is with catastrophic failure of a nimbus node during code distribution. I'm not sure it's acceptable to force users to resubmit a topology in that event.

I'm working with @Parth-Brahmbhatt on a similar solution that involves a pluggable code distribution interface (either bittorrent or a distributed FS) that will also be compatible with the security work being done (e.g. code distribution backed by a secure HDFS).

More details of that work are available in the JIRA, and we will be posting a much more detailed design doc in the future.

For the time being, let's keep this pull request open.
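
As a rough picture of what a pluggable code distribution interface could look like, here is a minimal sketch. It is not the design from the JIRA: `CodeDistributor` and `InMemoryDistributor` are hypothetical names, and the in-memory class only stands in for a real backend such as bittorrent or a distributed FS.

```python
from abc import ABC, abstractmethod


class CodeDistributor(ABC):
    """Hypothetical plug-in point for storing and fetching topology code."""

    @abstractmethod
    def upload(self, topology_id, code):
        """Store the code for a topology."""

    @abstractmethod
    def download(self, topology_id):
        """Return the code for a topology, or raise KeyError if unknown."""

    @abstractmethod
    def exists(self, topology_id):
        """Return True if code for the topology is available."""


class InMemoryDistributor(CodeDistributor):
    """Stand-in backend; a real one would replicate outside any one Nimbus."""

    def __init__(self):
        self._blobs = {}

    def upload(self, topology_id, code):
        self._blobs[topology_id] = bytes(code)

    def download(self, topology_id):
        return self._blobs[topology_id]

    def exists(self, topology_id):
        return topology_id in self._blobs
```

The point of the indirection is that, with a replicated backend, the code would remain fetchable after the Nimbus that accepted the submission fails, so the topology would not need to be resubmitted.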

d2r · 2015-10-08T18:26:11Z

@yveschina @ptgoetz , it seems #354 has been merged for STORM-166. Should this pull request be closed? (Did we address everything this pull request fixes?)

ptgoetz · 2015-10-08T21:03:25Z

@d2r Yes, I think this pull request can be closed.

nimbus ha solution for issue STORM-166

4f12373

revans2 reviewed Apr 7, 2014
yveschina changed the title ~~nimbus ha solution for issue STORM-166~~ STORM-166:Nimbus HA solution based on Zookeeper Sep 11, 2014

yveschina added 7 commits September 11, 2014 16:54

add apache header and missing tips in ui

4a4176f

Merge remote-tracking branch 'apache/master'

91f3764

Conflicts: storm-core/src/clj/backtype/storm/config.clj storm-core/src/clj/backtype/storm/thrift.clj storm-core/src/clj/backtype/storm/ui/core.clj

change curator-recipse from netflix to apache

c7f47d7

rollback changes not related with nimbus ha

2cb3616

rollback changes not related with nimbus ha

a4b3b3f

add nimbus summary to ui

4812f28

Merge branch 'master' of https://github.com/yveschina/incubator-storm…

912cf59

….git

Merge remote-tracking branch 'apache/master'

68db8d0

knusbaum pushed a commit to knusbaum/incubator-storm that referenced this pull request Feb 11, 2015

Merge pull request apache#61 from derekd/derekd-fix-nimbus-topoconf-a…

41d8df7

…utho Correct authorization check in nimbus methods

asfgit closed this in 26f966c Oct 16, 2015
