It took me quite a while to get Hadoop installed, so here is a short record of the process.
Notes:
1. For this test the firewall is disabled on all three machines.
2. I am using the root account here; with a non-root account, watch out for permissions.
3. mapred-site.xml does not exist by default: cp mapred-site.xml.template mapred-site.xml
4. Use the configured hostnames in the Hadoop configuration files.
IP address        Hostname   Role
192.168.20.197    hd1        namenode
192.168.20.193    hd2        datanode
192.168.20.195    hd3        datanode
I. System setup
(All steps must be executed on every node.)
1. Set the hostname and IP address resolution
1) Change the hostname (hostname hd1 takes effect for the current session; /etc/sysconfig/network makes it persistent across reboots)
# hostname hd1
# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd1
2) Add the IP-to-hostname mappings
# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.20.197 hd1
192.168.20.193 hd2
192.168.20.195 hd3
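The same three mappings are needed on hd2 and hd3 as well. One option (a sketch, not necessarily how it was done originally) is to copy the file over from hd1 once it has been edited:
# scp /etc/hosts root@hd2:/etc/hosts
# scp /etc/hosts root@hd3:/etc/hosts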
3) Verify that it works
[root@hd1 ~]# ping hd2
PING hd2 (192.168.20.193) 56(84) bytes of data.
64 bytes from hd2 (192.168.20.193): icmp_seq=1 ttl=64 time=0.721 ms
64 bytes from hd2 (192.168.20.193): icmp_seq=1 ttl=63 time=1.25 ms (DUP!)
64 bytes from hd2 (192.168.20.193): icmp_seq=1 ttl=64 time=1.53 ms (DUP!)
64 bytes from hd2 (192.168.20.193): icmp_seq=1 ttl=63 time=2.79 ms (DUP!)
[root@hd1 ~]# ping hd3
PING hd3 (192.168.20.195) 56(84) bytes of data.
64 bytes from hd3 (192.168.20.195): icmp_seq=1 ttl=64 time=1.88 ms
64 bytes from hd3 (192.168.20.195): icmp_seq=1 ttl=63 time=2.18 ms (DUP!)
64 bytes from hd3 (192.168.20.195): icmp_seq=1 ttl=64 time=2.19 ms (DUP!)
64 bytes from hd3 (192.168.20.195): icmp_seq=1 ttl=63 time=2.19 ms (DUP!)
If the hosts can ping each other by name, resolution is working. (The DUP! lines are duplicate echo replies, a quirk of the virtualized network here; they can be ignored.)
2. Disable the firewall
# chkconfig iptables off
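Note that chkconfig iptables off only keeps the firewall from starting on the next boot. To stop it for the current session as well (assuming the CentOS 6 iptables service used here):
# service iptables stop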
3. Passwordless SSH login (see also my other post: http://blog.csdn.net/nuli888/article/details/51924390)
1) Generate the key pair. Log in to hd1 and append the generated id_rsa.pub (public key) to the authorized_keys file. Also log in to hd2 and hd3, generate id_rsa.pub there, and copy the contents of each node's id_rsa.pub into authorized_keys on hd1. Finally, scp authorized_keys from hd1 to the .ssh directory on hd2 and hd3.
Run the following on every machine:
cd .ssh
ssh-keygen -t rsa
Keep pressing Enter. When the command finishes it will have generated two files: id_rsa (the private key) and id_rsa.pub (the public key).
Copy the public keys of hd2 and hd3 over to hd1 so they can be merged there.
On hd2 run:
scp id_rsa.pub root@192.168.20.197:~/.ssh/id_rsa.pub_hd2
On hd3 run:
scp id_rsa.pub root@192.168.20.197:~/.ssh/id_rsa.pub_hd3
On hd1 run:
cat id_rsa.pub >> authorized_keys
cat id_rsa.pub_hd2 >> authorized_keys
cat id_rsa.pub_hd3 >> authorized_keys
2) scp authorized_keys to hd2 and hd3
scp authorized_keys root@192.168.20.193:~/.ssh/
scp authorized_keys root@192.168.20.195:~/.ssh/
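If SSH still prompts for a password afterwards, the permissions on ~/.ssh are a common culprit; a hedged fix to apply on each node:
# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/authorized_keys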
3) Verify that SSH login no longer asks for a password
Then test:
ssh hd2
ssh hd3
Enter the password the first time if prompted; after that it is no longer needed.
[root@hd2 ~]# ssh hd2
Last login: Fri Jul 15 21:35:52 2016 from hd1
[root@hd2 ~]# ssh hd3
Last login: Fri Jul 15 21:10:54 2016 from 192.168.20.24
II. Install the JDK and Hadoop, and set environment variables
1. Download the JDK and Hadoop packages
jdk-7u79-linux-x64.tar.gz
The official site is too slow; here is a Baidu Pan mirror: http://pan.baidu.com/share/link?shareid=2793927523&uk=1678158691&fid=117337971851932
hadoop-2.6.0.tar.gz download: http://apache.fayea.com/hadoop/common/hadoop-2.6.0/
2. Extract
# tar zxvf jdk-7u79-linux-x64.tar.gz
# tar zxvf hadoop-2.6.0.tar.gz
# mv hadoop-2.6.0 /opt/hadoop-2.6.0
# mv jdk1.7.0_79 /opt/jdk1.7.0
3. Set the environment variables. Log in as root, edit /etc/profile, and append the following:
# vi /etc/profile
export JAVA_HOME=/opt/jdk1.7.0
export HADOOP_HOME=/opt/hadoop-2.6.0
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_ROOT_LOGGER=DEBUG,console   # very verbose; this is why the start-dfs.sh output below is full of DEBUG lines
# source /etc/profile   # apply immediately
4. Verify the environment variables
# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
# hadoop
Usage: hadoop [--config confdir] COMMAND
III. Hadoop cluster configuration
1. Edit the Hadoop configuration files
# cd /opt/hadoop-2.6.0/etc/hadoop
1) hadoop-env.sh and yarn-env.sh: set JAVA_HOME
At first I assumed that, since JAVA_HOME was already set in /etc/profile, hadoop-env.sh and yarn-env.sh would pick it up and nothing more was needed. That turned out not to work in hadoop-2.6.0: start-all.sh failed with "hd1: Error: JAVA_HOME is not set and could not be found."
Find the JAVA_HOME line in both files and change it to the actual path, /opt/jdk1.7.0, as in the sketch below.
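A minimal sketch of the resulting line in both hadoop-env.sh and yarn-env.sh:
export JAVA_HOME=/opt/jdk1.7.0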
2) slaves
This file lists all the datanode hosts so that the namenode knows where they are.
# vi slaves
hd2
hd3
3) core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hd1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
4) hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hd1:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
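The directories referenced above (hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir) live on the local filesystem of each node. HDFS normally creates them when formatted and started, but creating them up front avoids permission surprises; a small precaution (not in the original steps), to run on every node:
# mkdir -p /usr/hadoop/tmp /usr/hadoop/dfs/name /usr/hadoop/dfs/data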
5) mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>hd1:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hd1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hd1:19888</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>http://hd1:9001</value>
</property>
</configuration>
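The job history addresses above only matter if the JobHistory server is actually running; it is not started by start-dfs.sh or start-yarn.sh. A hedged way to start it on hd1:
# sbin/mr-jobhistory-daemon.sh start historyserver
Also note that mapred.job.tracker is a leftover MRv1 property and is effectively ignored once mapreduce.framework.name is set to yarn.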
6) yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hd1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hd1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hd1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hd1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hd1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hd1:8088</value>
</property>
</configuration>
At this point the single node is basically configured, and you can already start it up to see the effect.
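One step worth calling out before the first start: the NameNode has to be formatted once, otherwise start-dfs.sh will not bring it up (run this on hd1 only, and only on a fresh cluster, since formatting wipes HDFS metadata):
# bin/hdfs namenode -format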
7) Start HDFS first
[root@hd1 hadoop-2.6.0]# sbin/start-dfs.sh
16/07/15 21:12:07 DEBUG util.Shell: setsid exited with exit code 0
16/07/15 21:12:08 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
......
16/07/15 21:12:26 DEBUG security.UserGroupInformation: hadoop login
16/07/15 21:12:26 DEBUG security.UserGroupInformation: hadoop login commit
16/07/15 21:12:26 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: root
16/07/15 21:12:26 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: root" with name root
16/07/15 21:12:26 DEBUG security.UserGroupInformation: User entry: "root"
16/07/15 21:12:26 DEBUG security.UserGroupInformation: UGI loginUser:root (auth:SIMPLE)
16/07/15 21:12:26 DEBUG security.UserGroupInformation: PrivilegedAction as:root (auth:SIMPLE) from:org.apache.hadoop.hdfs.tools.GetConf.run(GetConf.java:314)
8) Then start YARN
[root@hd1 hadoop-2.6.0]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.6.0/logs/yarn-root-resourcemanager-hd1.out
hd2: starting nodemanager, logging to /opt/hadoop-2.6.0/logs/yarn-root-nodemanager-hd2.out
hd3: starting nodemanager, logging to /opt/hadoop-2.6.0/logs/yarn-root-nodemanager-hd3.out
9) Verify that the daemons started
[root@hd1 ~]# jps
1716 Jps
1141 NameNode
1376 ResourceManager
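With the hdfs-site.xml above, a SecondaryNameNode process is also expected on hd1 once HDFS is fully up. On the datanodes the expected processes are DataNode and NodeManager; a quick check (a sketch):
# ssh hd2
# jps     # should list DataNode and NodeManager (plus Jps itself)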
10) To build out the cluster, clone the hd1 virtual machine into two more machines for hd2 and hd3.
Then change hd2's IP address to 192.168.20.193 and its hostname to hd2.
Then change hd3's IP address to 192.168.20.195 and its hostname to hd3.
Then restart the network on each clone: service network restart
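On each clone the static IP is typically changed in the interface configuration before restarting the network; a sketch assuming CentOS 6 and a device named eth0 (adjust to your setup), shown here for hd2:
# vi /etc/sysconfig/network-scripts/ifcfg-eth0
BOOTPROTO=static
IPADDR=192.168.20.193
NETMASK=255.255.255.0
The hostname is changed in /etc/sysconfig/network as in section I.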
IV. Testing
1. Start the whole cluster and verify it. Log in to the master, i.e. hd1:
# cd /opt/hadoop-2.6.0/sbin
# ./start-all.sh
2. The web UIs can now be reached in a browser:
http://192.168.20.197:8088/cluster/nodes
http://192.168.20.197:50070/dfshealth.html#tab-overview
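Beyond the web UIs, a simple smoke test is to run the bundled wordcount example against HDFS (a sketch; the input and output paths here are made up for illustration):
# cd /opt/hadoop-2.6.0
# bin/hdfs dfs -mkdir -p /input
# bin/hdfs dfs -put etc/hadoop/core-site.xml /input
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
# bin/hdfs dfs -cat /output/part-r-00000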