Looking back at last year's nightmare: fresh out of college, I switched into IT and chased the big-data trend, only to land with Yang Yong, a fraud who misled his students, bragged all day, and understood nothing of his own course. Three and a half months wasted. I gritted my teeth and taught myself Hadoop clustering, wrote a fair number of documents, and built up my ability to learn independently. After joining a company I was lucky enough to move into JavaWeb development, slowly climbing up from code-monkey, aiming to eventually do real big-data work.
Lately I've wanted to new a better version of myself, so I'm starting to blog and record the little things, hoping to return a future I won't regret. Today is day one, and I'm posting the CM and manual Hadoop cluster installs from when my self-study drive was at its peak.
Installing CDH5 via CM
1. Run CM_cleanup_cluster.sh to clean up the cluster (not part of Hadoop; just a helper script for this exercise).
2. Configure the network
2.1 Change the hostname:
vi /etc/sysconfig/network
vi /etc/hosts
2.2 Configure /etc/hosts (IP first, then hostname):
192.168.20.3 elephant
192.168.20.4 monkey
192.168.20.5 tiger
192.168.20.6 lion
192.168.20.7 horse
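Every node needs the same hosts file. A minimal sketch for pushing it out from one machine, assuming SSH access and passwordless sudo on the nodes (your lab environment may already provide a script for this):
for host in monkey tiger lion horse; do scp /etc/hosts $host:/tmp/hosts; ssh $host sudo mv /tmp/hosts /etc/hosts; done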
3. Install Hadoop:
3.1 Run Install_CDH5.sh
3.2 In /etc/cloudera-scm-agent/config.ini, point the agent at the CM server by changing
server_host=localhost
to
server_host=lion
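The same change can be applied non-interactively with sed (a sketch; double-check the file on your nodes first):
sudo sed -i 's/^server_host=localhost/server_host=lion/' /etc/cloudera-scm-agent/config.ini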
3.3 If the other machines do not have Hadoop installed yet, run:
copy_CM_agent_config.sh
4. On lion only:
Install the database: sudo yum install --assumeyes cloudera-manager-server-db-2
Start the database: sudo service cloudera-scm-server-db-2 start
Start the CM server: sudo service cloudera-scm-server start
Check the database status: sudo service cloudera-scm-server-db-2 status
Check the CM server status: sudo service cloudera-scm-server status
5. Start the CM agent
sudo service cloudera-scm-agent start
Check its status:
sudo service cloudera-scm-agent status
Watch the log:
sudo tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.log
6. Open a browser and go to http://lion:7180 (username: admin, password: admin)
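If the page will not load, first confirm the server is actually listening on port 7180, for example (assuming curl is available on lion):
curl -I http://lion:7180
sudo netstat -tlnp | grep 7180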
Manually installing a Hadoop cluster
This installation uses five machines, named elephant, horse, tiger, monkey, and lion. Install strictly according to the planned daemon layout: NameNode on elephant, ResourceManager on horse, SecondaryNameNode on tiger, JobHistoryServer on monkey, and a DataNode and NodeManager on every node (lion runs only a DataNode and a NodeManager).
I. Check that the five machines are clean
1. First make sure each machine can ping the others.
2. Before installing, check with jps that no Hadoop processes are running; if any are, stop the services:
$ sudo service hadoop-hdfs-namenode stop
$ sudo service hadoop-hdfs-secondarynamenode stop
$ sudo service hadoop-hdfs-datanode stop
$ sudo service hadoop-yarn-resourcemanager stop
$ sudo service hadoop-yarn-nodemanager stop
$ sudo service hadoop-mapreduce-historyserver stop
3. Check for logs, and delete any that exist:
sudo ls /var/log/hadoop-*/*
sudo rm -rf /var/log/hadoop-*/*
4. First check whether any Hadoop packages are installed:
rpm -qa | grep hadoop
If any are found, remove them:
$ sudo yum remove --assumeyes hadoop-hdfs-secondarynamenode
$ sudo yum remove --assumeyes hadoop-yarn-resourcemanager
$ sudo yum remove --assumeyes hadoop-mapreduce-historyserver
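Re-run the query afterwards to confirm nothing Hadoop-related remains:
rpm -qa | grep hadoop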
II. Install the NameNode on elephant
1. Install the NameNode on elephant:
sudo yum install --assumeyes hadoop-hdfs-namenode
2. Create the following directories under the root directory:
sudo mkdir -p /hadoop-cluster/fsimage/nn1
sudo mkdir -p /hadoop-cluster/fsimage/nn2
3. After installing the NameNode, first copy the configuration files from /training_materials/admin/stubs/ into /etc/hadoop/conf. Then configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml as follows:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://elephant:8020</value>
</property>
</configuration>
hdfs-site.xml:
First create the directories under "/":
sudo mkdir -p /disk1/dfs/nn
sudo mkdir -p /disk2/dfs/nn
sudo mkdir -p /disk1/dfs/dn
sudo mkdir -p /disk2/dfs/dn
Open up permissions on disk1 and disk2:
sudo chmod -R 1777 /disk1
sudo chmod -R 1777 /disk2
Change the owning user and group:
sudo chown -R hdfs:hadoop /disk1/dfs/
sudo chown -R hdfs:hadoop /disk2/dfs/
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///disk1/dfs/nn,file:///disk2/dfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///disk1/dfs/dn,file:///disk2/dfs/dn</value>
</property>
</configuration>
4. Format HDFS as the hdfs user: sudo -u hdfs hdfs namenode -format
5. Start the NameNode
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-namenode status
Once it starts successfully, run sudo jps and check that the NameNode (NN) process is present. At this point every node can install a DataNode.
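Beyond jps, the NameNode can be sanity-checked from the command line, and its web UI in CDH5 normally listens on port 50070 (the port is an assumption; check your version's docs):
sudo -u hdfs hdfs dfsadmin -report
curl -I http://elephant:50070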
III. Install the ResourceManager on horse
1. Install the ResourceManager on horse:
sudo yum install --assumeyes hadoop-yarn-resourcemanager
2. After installing the ResourceManager, copy the configuration files from /training_materials/admin/stubs/ into /etc/hadoop/conf. Then configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml as follows:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://elephant:8020</value>
</property>
</configuration>
mapred-site.xml
vim /etc/hadoop/conf/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>monkey:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>monkey:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
</configuration>
yarn-site.xml
Before configuring it, first create the following directories under "/":
sudo mkdir -p /disk1/nodemgr/local
sudo mkdir -p /disk2/nodemgr/local
sudo mkdir -p /var/log/hadoop-yarn/containers
sudo mkdir -p /var/log/hadoop-yarn/apps
Open up permissions on the directories:
sudo chmod -R 1777 /disk1/nodemgr/local
sudo chmod -R 1777 /disk2/nodemgr/local
sudo chmod -R 1777 /var/log/hadoop-yarn/containers
sudo chmod -R 1777 /var/log/hadoop-yarn/apps
Change the owning user and group on the directories:
sudo chown -R yarn:yarn /disk1/nodemgr/
sudo chown -R yarn:yarn /disk2/nodemgr/
sudo chown -R yarn:yarn /var/log/hadoop-yarn/containers
sudo chown -R yarn:yarn /var/log/hadoop-yarn/apps
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>horse</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///disk1/nodemgr/local,file:///disk2/nodemgr/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/log/hadoop-yarn/containers</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
</configuration>
3. Start the service
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-resourcemanager status
Once it starts successfully, run sudo jps and check that the ResourceManager (RM) process is present. At this point every node can install a NodeManager.
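The ResourceManager web UI normally listens on port 8088, so reachability can be checked with (the port is an assumption for this CDH5 setup):
curl -I http://horse:8088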
IV. Install the SecondaryNameNode on tiger
1. Install the SecondaryNameNode on tiger:
sudo yum install --assumeyes hadoop-hdfs-secondarynamenode
2. After installing the SecondaryNameNode, copy the configuration files from /training_materials/admin/stubs/ into /etc/hadoop/conf. Then configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml as follows:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://elephant:8020</value>
</property>
</configuration>
hdfs-site.xml:
First create the directories under "/":
sudo mkdir -p /disk1/dfs/nn
sudo mkdir -p /disk2/dfs/nn
sudo mkdir -p /disk1/dfs/dn
sudo mkdir -p /disk2/dfs/dn
Open up permissions on disk1 and disk2:
sudo chmod -R 1777 /disk1
sudo chmod -R 1777 /disk2
Change the owning user and group:
sudo chown -R hdfs:hadoop /disk1/dfs/
sudo chown -R hdfs:hadoop /disk2/dfs/
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///disk1/dfs/nn,file:///disk2/dfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///disk1/dfs/dn,file:///disk2/dfs/dn</value>
</property>
</configuration>
3. Start the SecondaryNameNode
sudo service hadoop-hdfs-secondarynamenode start
sudo service hadoop-hdfs-secondarynamenode status
Once it starts successfully, run sudo jps and check that the SecondaryNameNode process is running.
V. Install the JobHistoryServer on monkey
Before starting the JHS, the required directories must first be created on HDFS (see the sketch before step 3 below).
1. Install the JobHistoryServer:
sudo yum install --assumeyes hadoop-mapreduce-historyserver
2. After installing the HistoryServer, copy the configuration files from /training_materials/admin/stubs/ into /etc/hadoop/conf. Then configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml as follows:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://elephant:8020</value>
</property>
</configuration>
hdfs-site.xml:
First create the directories under "/":
sudo mkdir -p /disk1/dfs/nn
sudo mkdir -p /disk2/dfs/nn
sudo mkdir -p /disk1/dfs/dn
sudo mkdir -p /disk2/dfs/dn
Open up permissions on disk1 and disk2:
sudo chmod -R 1777 /disk1
sudo chmod -R 1777 /disk2
Change the owning user and group:
sudo chown -R hdfs:hadoop /disk1/dfs/
sudo chown -R hdfs:hadoop /disk2/dfs/
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///disk1/dfs/nn,file:///disk2/dfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///disk1/dfs/dn,file:///disk2/dfs/dn</value>
</property>
</configuration>
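The note at the top of this section calls for creating directories on HDFS first, but does not list them. A sketch based on the standard CDH5 JobHistoryServer setup, under the /user staging directory configured in mapred-site.xml (treat the exact paths and ownership as assumptions and check them against your docs):
sudo -u hdfs hadoop fs -mkdir -p /user/history
sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history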
3. Start the HistoryServer
sudo service hadoop-mapreduce-historyserver start
sudo service hadoop-mapreduce-historyserver status
Once it starts successfully, run sudo jps and check that the JobHistoryServer process is running.
VI. Install the DataNode and NodeManager on every machine in the cluster
Important: be sure to install the hadoop-mapreduce package (step 3 below) before starting any NodeManager!
1. Install the DataNode
Note that core-site.xml and hdfs-site.xml must be configured. In the steps above, elephant, tiger, and monkey are already configured, so horse and lion need the same configuration. Then install and start:
sudo yum install --assumeyes hadoop-hdfs-datanode
sudo service hadoop-hdfs-datanode start
sudo service hadoop-hdfs-datanode status
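Once the DataNodes are running on all five machines, confirm they have registered with the NameNode:
sudo -u hdfs hdfs dfsadmin -report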
2. Install the NodeManager
Note that mapred-site.xml and yarn-site.xml must be configured. So far only horse has been configured, so elephant, tiger, monkey, and lion all need the same configuration as horse. Then install and start:
sudo yum install --assumeyes hadoop-yarn-nodemanager
sudo service hadoop-yarn-nodemanager start
sudo service hadoop-yarn-nodemanager status
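To confirm the NodeManagers have registered with the ResourceManager:
yarn node -list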
3. Install MapReduce
sudo yum install --assumeyes hadoop-mapreduce
With that, the whole Hadoop cluster is installed and related tests can be run; a quick smoke test is sketched below.
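For an end-to-end check, one of the bundled examples can be run; in CDH5 the examples jar usually sits under /usr/lib/hadoop-mapreduce (the path is an assumption; locate it with ls if it differs):
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100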
If errors come up while configuring the cluster, check the logs under /var/log.