Clone the git repo into your home folder /home/vs_pj
git clone https://github.com/citmp2015/server_config.git
Download Hadoop into your home folder /home/vs_pj/
wget http://apache.mirror.digionline.de/hadoop/core/stable/hadoop-2.7.1.tar.gz
Extract it
tar xzf hadoop-2.7.1.tar.gz
Rename the folder
mv hadoop-2.7.1 hadoop
Overwrite the .bashrc file with the one from the git repo and also place the .bash_aliases file in your home folder /home/vs_pj/
cp /home/vs_pj/server_config/.bashrc /home/vs_pj/
cp /home/vs_pj/server_config/.bash_aliases /home/vs_pj/
Reload the .bashrc
source /home/vs_pj/.bashrc
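For orientation, the repo's .bashrc is expected to define the environment variables used in the commands below; roughly a sketch like this (the exact values come from the repo file):
export HADOOP_HOME=/home/vs_pj/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export FLINK_HOME=/home/vs_pj/flink
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$FLINK_HOME/bin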
Copy all config files from the hadoop subfolder of the git repo to /home/vs_pj/hadoop/etc/hadoop/
cp /home/vs_pj/server_config/hadoop/* /home/vs_pj/hadoop/etc/hadoop/
Create hadoop pid dir
mkdir /home/vs_pj/hadoop_pid
Change the permissions so that only the vs_pj user and group can access the folder
chmod 770 /home/vs_pj/hadoop_pid
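You can verify the result with
ls -ld /home/vs_pj/hadoop_pid
The permissions column should read drwxrwx--- (full access for user and group, none for others).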
Create hadoop log dir
mkdir /home/vs_pj/hadoop_log
Create the namenode dir on the master machine (asok05 or asok13)
mkdir /home/vs_pj/hadoop_namenode
Create the datanode dir on the slave machines
mkdir /home/vs_pj/hadoop_datanode
The slaves config file must be modified by hand. Comment/uncomment the slaves you want.
nano /home/vs_pj/hadoop/etc/hadoop/slaves
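The format is one hostname per line; a leading # disables a host. For example (the hostnames here are purely illustrative):
asok06
asok07
#asok08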
You also have to configure core-site.xml manually. Uncomment the master namenode (asok05 or asok13).
nano /home/vs_pj/hadoop/etc/hadoop/core-site.xml
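The relevant entry is the fs.defaultFS property; after uncommenting the asok05 variant it would look roughly like this (the port is an assumption, keep whatever the repo's file specifies):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://asok05:9000</value>
</property>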
Fix some server pre-configuration. The following overwrites the broken JAVA_HOME setting (writing to /etc/profile.d requires root):
sudo cp /home/vs_pj/server_config/java.sh /etc/profile.d/java.sh
Before you can start Hadoop you have to format HDFS. Attention: all data in your (not yet existing) HDFS will be lost! Run this only on your namenode (asok05 or asok13)
$HADOOP_HOME/bin/hdfs namenode -format <clustername>
Then start the Hadoop namenode (asok05 or asok13)
$HADOOP_HOME/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Start your Hadoop datanodes (the other asoks)
$HADOOP_HOME/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
Once the configuration is done you can also start all HDFS processes at once
$HADOOP_HOME/sbin/start-dfs.sh
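To verify which Hadoop daemons are running on a machine, use jps (ships with the JDK); on the namenode you should see a NameNode process, on the slaves a DataNode process.
jps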
You may have to accept the ECDSA key fingerprints during this process.
If you get the error message "xxx.xxx.xxx.xxx: JAVA_HOME is not set and could not be found.", make sure that the /home/vs_pj/hadoop/etc/hadoop/hadoop-env.sh file exports the correct path for JAVA_HOME.
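The export in hadoop-env.sh should look like this (the JDK path is only an example, adjust it to the Java installation on the servers):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64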
If you want to stop HDFS
$HADOOP_HOME/sbin/stop-dfs.sh
Check your namenode in the browser (XY is the last octet of the namenode server IP: asok05 -> 34; asok13 -> 42)
http://130.149.248.XY:50070/dfshealth.html#tab-overview
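Alternatively, you can query the cluster state from the shell; this prints the HDFS capacity and the connected datanodes:
$HADOOP_HOME/bin/hdfs dfsadmin -report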
Download Flink into your home folder /home/vs_pj/
wget http://apache.lauf-forum.at/flink/flink-0.9.1/flink-0.9.1-bin-hadoop27.tgz
Extract it
tar xzf flink-0.9.1-bin-hadoop27.tgz
Rename the folder
mv flink-0.9.1 flink
Copy all config files from the flink subfolder of the git repo to /home/vs_pj/flink/conf/
cp /home/vs_pj/server_config/flink/* /home/vs_pj/flink/conf/
The slaves config file must be modified by hand, just like the Hadoop one above. Comment/uncomment the slaves you want.
nano /home/vs_pj/flink/conf/slaves
You also have to configure the flink-conf.yaml file by hand. Comment/uncomment the jobmanager.rpc.address you want in line 25 or line 28.
nano /home/vs_pj/flink/conf/flink-conf.yaml
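After editing, exactly one jobmanager.rpc.address line should be active, e.g. (a sketch; the rest of the file comes from the repo):
jobmanager.rpc.address: asok05
#jobmanager.rpc.address: asok13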
Now the easy part
$FLINK_HOME/bin/start-cluster.sh
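Here too, jps shows whether everything came up: you should see a JobManager process on the master and a TaskManager process on every slave.
jps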
If you want to start the streaming mode
$FLINK_HOME/bin/start-cluster-streaming.sh
If you want to use the Flink web client you have to start it separately
$FLINK_HOME/bin/start-webclient.sh
Stop it with
$FLINK_HOME/bin/stop-webclient.sh
Likewise you can stop Flink
$FLINK_HOME/bin/stop-cluster.sh
And the same for streaming mode
$FLINK_HOME/bin/stop-cluster-streaming.sh
Check the jobmanager web interface (XY is again the last octet of the server IP: asok05 -> 34; asok13 -> 42)
http://130.149.248.XY:8081
Check the web client
http://130.149.248.XY:8080
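As a final smoke test you can submit one of the bundled examples; with no arguments WordCount runs on built-in sample data (the jar name below is an assumption, check the examples folder of your Flink distribution):
$FLINK_HOME/bin/flink run $FLINK_HOME/examples/flink-java-examples-0.9.1-WordCount.jar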