Tweeter is a sample service that demonstrates how easy it is to run a Twitter-like service on DC/OS.
Capabilities:
- Stores tweets in Cassandra
- Streams tweets to Kafka as they come in
- Real time tweet analytics with Spark and Zeppelin
To run the demo script (cli_script.sh
), the following pieces of software are expected to be available:
- dcos cli
- cypress
Cypress is a nodejs package that executes UI tests. This is used by the demo script to submit a tweet and verify it is displayed within the Tweeter UI.
To install on OSX, perform the following from within this project's directory:
$ brew update
$ brew install node
$ npm install -g cypress-cli
$ cypress install
You can run the UI tests separately with cypress run
.
The demo script has a section that creates the JSON file ci-conf.json
. This file is read by the cypress to determine the URL of the DC/OS cluster, the URL of the tweeter application, and the log in credentials to use. Without this file the UI tests will fail.
Example:
{
"tweeter_url": "52.xx.xx.xx:10000",
"url": "http://my-cool-demo.us-west-2.elb.amazonaws.com/",
"username": "admin",
"password": "password"
}
You'll need a DC/OS cluster with one public node and at least five private nodes and the DC/OS CLI locally installed.
cli_script.sh
in this repository can be utilized to setup a tweeter demo
cluster automatically and provide some random traffic, however the GUI experience with Zeppelin analytics and DC/OS feature presentation must still be done by hand:
- Install Zeppelin from the GUI using the default values
- Log into the Tweeter UI at http://[elb_hostname] and post a sample tweet
- Start the tweeter load job from the CLI using the command dcos/bin/dcos marathon app add post-tweets.json
- Kill one of the Tweeter containers in Marathon and show that the Tweeter is still up and tweets are still flowing in
- Log into Zeppelin using the https interface at https://[master_ip]/service/zeppelin
- Click Import note and import tweeter-analytics.json from the Tweeter repo clone you made locally
- Open the newly loaded "Tweeter Analytics" Notebook
- Run the Load Dependencies step to load the required libraries into Zeppelin
- Run the Spark Streaming step, which reads the tweet stream from Zookeeper, and puts them into a temporary table that can be queried using SparkSQL - this spins up the Zeppelin spark context so you can show them the increased utilization on the dashboard
- Next, run the Top Tweeters SQL query, which counts the number of tweets per user, using the table created in the previous step
- The table updates continuously as new tweets come in, so re-running the query will produce a different result every time
Run Tweeter against an EE cluster, but do not start Zeppelin or post tweets
$ bash cli_script.sh --infra --url http://my.dcos.url
This combination of options will not actually run the demo but stop and prompt you with the appropriate CLI command to do to run the demo. Note, by specifying -oss without a DCOS_AUTH_TOKEN set, the dummy CI auth token will be used
$ bash cli_script.sh --step --manual --oss --url http://my.dcos.url
The steps below are applicable for Open DC/OS, when it does not have a super set. The auth token we set below
$ dcos auth login
Please go to the following link in your browser:
http://54.70.182.15/login?redirect_uri=urn:ietf:wg:oauth:2.0:oob
Enter authentication token:
If you wish to access this token again later, use the cli command:
dcos config show core.dcos_acs_token
export DCOS_AUTH_TOKEN=<TOKEN_FROM_PREVIOUS_STEP>
$ ./cli_script.sh --oss --url http://52.70.182.15
$ dcos package install marathon-lb
$ dcos package install cassandra --cli
$ dcos package install kafka --cli
$ dcos package install zeppelin
Wait until the Kafka and Cassandra services are healthy. You can check their status with:
$ dcos kafka connection
...
$ dcos cassandra connection
...
Edit the HAPROXY_0_VHOST
label in tweeter.json
to match your public ELB hostname. Be sure to remove the leading http://
and the trailing /
For example:
{
"labels": {
"HAPROXY_GROUP": "external",
"HAPROXY_0_VHOST": "brenden-7-publicsl-1dnroe89snjkq-221614774.us-west-2.elb.amazonaws.com"
}
}
Launch three instances of Tweeter on Marathon using the config file in this repo:
$ dcos marathon app add tweeter.json
The service talks to Cassandra via node-0.cassandra.mesos:9042
, and Kafka via broker-0.kafka.mesos:9557
in this example.
Traffic is routed to the service via marathon-lb. Navigate to http://<public_elb>
to see the Tweeter UI and post a Tweet.
Post a lot of Shakespeare tweets from a file:
dcos marathon app add post-tweets.json
This will post more than 100k tweets one by one, so you'll see them coming in steadily when you refresh the page. Take a look at the Networking page on the UI to see the load balancing in action.
Next, we'll do real-time analytics on the stream of tweets coming in from Kafka.
Navigate to Zeppelin at https://<master_public_ip>/service/zeppelin/
, click Import note
and import tweeter-analytics.json
. Zeppelin is preconfigured to execute Spark jobs on the DCOS cluster, so there is no further configuration or setup required.
Run the Load Dependencies step to load the required libraries into Zeppelin. Next, run the Spark Streaming step, which reads the tweet stream from Zookeeper, and puts them into a temporary table that can be queried using SparkSQL. Next, run the Top Tweeters SQL query, which counts the number of tweets per user, using the table created in the previous step. The table updates continuously as new tweets come in, so re-running the query will produce a different result every time.
NOTE: if /service/zeppelin is showing as Disconnected (and hence can’t load the notebook), make sure you're using HTTPS instead of HTTP, until this PR gets merged. Alternatively, you can use marathon-lb. To do this, add the following labels to the Zeppelin service and restart:
HAPROXY_0_VHOST = [elb hostname]
HAPROXY_GROUP = external
You can get the ELB hostname from the CCM “Public Server” link. Once Zeppelin restarts, this should allow you to use that link to reach the Zeppelin GUI in “connected” mode.
You'll need Ruby and a couple of libraries on your local machine to hack on this service. If you just want to run the demo, you don't need this.
Using Homebrew, install rbenv
, a Ruby version manager:
$ brew update
$ brew install rbenv
Run this command and follow the instructions to setup your environment:
$ rbenv init
To install the required Ruby version for Tweeter, run from inside this repo:
$ rbenv install
Then install the Ruby package manager and Tweeter's dependencies. From this repo run:
$ gem install bundler
$ bundle install