First Look at Hadoop: Notes from My First Hadoop Big-Data Platform Build

Date: 2022-02-22 21:24:32

Preface

      When I joined a new company, a senior colleague with ten years in ops handed me a job: one document, four virtual machines, and instructions to build a Hadoop platform. I felt the pressure immediately, but at least I had the document, so I gritted my teeth and took it on. What followed was a week of building, troubleshooting, and rebuilding, over and over. Today the cluster finally accepts file uploads. There are still some open issues, but I want to write everything down while it is fresh.

Rough architecture:

        Four virtual machines. namenode01 and namenode02 host the two HA NameNodes and the two YARN ResourceManagers; datanode01 and datanode02 host the DataNodes, and NodeManagers run on all four nodes. The three-node ZooKeeper ensemble and the JournalNodes live on namenode01, namenode02 and datanode01.

Problems encountered

       These are the common problems I ran into and the mistakes I made during the installation attempts. I am writing them down as a warning to myself: next time I build a cluster, these are the things to watch out for.

1: Hostnames. The first time around I only wrote IP-to-domain mappings in /etc/hosts and did not set the network-resolvable hostnames of the machines themselves, which caused trouble later in the build. The second attempt looks like this (see the sketch after the hosts listing for making each machine's own hostname match):

[bigdata@namenode01 ~]$  cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.22.3.110 namenode1-sit.qq.com namenode01
10.22.3.111 namenode2-sit.qq.com namenode02
10.22.4.110 datanode1-sit.qq.com datanode01
10.22.4.111 datanode2-sit.qq.com datanode02
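
Besides /etc/hosts, each machine's own hostname has to match the entry above. This setup looks like RHEL/CentOS 6 (judging by the chkconfig/service commands later), so a minimal sketch of what I mean, run as root on namenode01 and adjusted per node, would be:

hostname namenode01                                                  # set the running hostname
sed -i 's/^HOSTNAME=.*/HOSTNAME=namenode01/' /etc/sysconfig/network   # persist it across reboots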

2: Permissions. When I uploaded the software packages I paid no attention to file ownership, and the permissions ended up in a complete mess. With permissions that chaotic across all four machines, there is no hope of getting the services to start. So remember: whatever lands in a user's home directory should be owned by that user.
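
As a rough illustration of the "whoever owns the home directory owns the files" rule, this is the kind of cleanup I ended up running (a sketch; the users and paths assume the layout described later in this post):

# run as root on every node
chown -R bigdata:bigdata     /home/bigdata
chown -R zookeeper:zookeeper /home/zookeeper
chown -R yarn:yarn           /home/yarn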

3: For the three-node ZooKeeper ensemble, make sure each myid is set correctly. If you accidentally write the same number on two nodes you are in for an awkward time: the ZooKeeper config is so short that you tend to skim it, which makes the mistake surprisingly hard to spot.
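
A quick way to double-check the myid values across the ensemble without opening each file by hand (a sketch, assuming the dataDir /home/zookeeper/myid used later in this post and passwordless SSH between the nodes):

# run as the zookeeper user from any node
for h in namenode01 namenode02 datanode01; do
    echo -n "$h: "; ssh $h cat /home/zookeeper/myid/myid
done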

4: Go over the custom nameservice string in hdfs-site.xml several times. It appears in many places in that file, so it deserves extra attention.

5: If repeated formatting has left the cluster ID (CID) out of sync, fix the CID and restart. The CID lives in the current/VERSION file under the storage directories set in the HDFS config files.
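
Concretely, the clusterID sits in a current/VERSION file under both the NameNode and DataNode storage directories; when a re-format changes it on the NameNode, the DataNodes refuse to join. A sketch of how I would line them up, assuming the directories configured later in this post (dfs.name.dir=/home/bigdata/software/hadoop/namenode, dfs.data.dir=/data0/hdfs,/data1/hdfs):

# on the namenode: read the authoritative clusterID
grep clusterID /home/bigdata/software/hadoop/namenode/current/VERSION
# on each datanode: make the stored clusterID match, then restart the DataNode
vi /data0/hdfs/current/VERSION
vi /data1/hdfs/current/VERSION
hadoop-daemon.sh stop datanode && hadoop-daemon.sh start datanode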

6: For the block pool ID mismatch I honestly do not know what to do yet; the datanode log keeps throwing errors about it, which is frustrating. I will come back to this problem once I have leveled up.

Takeaways

       Even though the week was busy (I admit a lot of it was spent slacking off), I did learn a few things. My senior colleague told me that for passwordless login, a single public/private key pair can be reused across many accounts. I read up on it, generated one key pair, and uploaded it to each user's ~/.ssh directory (watch the permissions on these files: 600, or they simply will not work). The machine you log in from keeps the private key; on the machines you log in to, upload the public key and rename it to authorized_keys, or if that file already exists, append it with cat id_rsa.pub >> authorized_keys. Done this way, troubleshooting and management both become much easier. The other big gain is that I now understand the overall Hadoop architecture. Enough talk; time for the real work.

Build steps

1.1 First, update the hosts file on each node, as shown at the beginning:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.22.3.110 namenode1-sit.qq.com namenode01
10.22.3.111 namenode2-sit.qq.com namenode02
10.22.4.110 datanode1-sit.qq.com datanode01
10.22.4.111 datanode2-sit.qq.com datanode02

1.2 Many documents (including the one my colleague gave me) change the locale, switching the character set from "en_US.UTF-8" to "en_zh.UTF-8". I did not see the point, so I left it alone; whether you change it is up to you.

cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

1.3 Disable the firewall and SELinux. I will not go into the reasons; everyone knows them.

vi /etc/selinux/config
# change:
SELINUX=disabled

chkconfig iptables off
service iptables stop
service iptables status
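
One detail: SELINUX=disabled in /etc/selinux/config only takes effect after a reboot. To drop enforcement in the running session as well, you can additionally run (my addition, not in the original notes):

setenforce 0      # switch the running kernel to permissive mode immediately
getenforce        # verify (prints Permissive now, Disabled after a reboot)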

1.4 Disable abrtd. To be honest I am not entirely sure what this service is for (it is the automatic bug reporting daemon), but it is better to turn it off.

chkconfig abrtd off
service abrtd stop
service abrtd status
1.5 Disable core dumps. First check whether they are enabled; if not, there is nothing to do.

ulimit -c        # if the output is 0, core dumps are already disabled
ulimit -S -c 0   # disable

1.6 Raise the resource limits on all four machines. As a test environment this cluster will never see enough traffic or load for it to matter, so you could skip this, but a production environment cannot afford to be sloppy here, so do it anyway.

1.6.1 Edit /etc/security/limits.conf as shown below

[bigdata@datanode02 ~]$ cat /etc/security/limits.conf
# /etc/security/limits.conf
#
#Each line describes a limit for a user in the form:
#
#<domain> <type> <item> <value>
#
#Where:
#<domain> can be:
# - an user name
# - a group name, with @group syntax
# - the wildcard *, for default entry
# - the wildcard %, can be also used with %group syntax,
# for maxlogin limit
#
#<type> can have the two values:
# - "soft" for enforcing the soft limits
# - "hard" for enforcing hard limits
#
#<item> can be one of the following:
# - core - limits the core file size (KB)
# - data - max data size (KB)
# - fsize - maximum filesize (KB)
# - memlock - max locked-in-memory address space (KB)
# - nofile - max number of open files
# - rss - max resident set size (KB)
# - stack - max stack size (KB)
# - cpu - max CPU time (MIN)
# - nproc - max number of processes
# - as - address space limit (KB)
# - maxlogins - max number of logins for this user
# - maxsyslogins - max number of logins on the system
# - priority - the priority to run user process with
# - locks - max number of file locks the user can hold
# - sigpending - max number of pending signals
# - msgqueue - max memory used by POSIX message queues (bytes)
# - nice - max nice priority allowed to raise to values: [-20, 19]
# - rtprio - max realtime priority
#
#<domain> <type> <item> <value>
#

#* soft core 0
#* hard rss 10000
#@student hard nproc 20
#@faculty soft nproc 20
#@faculty hard nproc 50
#ftp hard nproc 0
#@student - maxlogins 4
#* soft nofile 10240
#* hard nofile 10240
#* soft nproc 11000
#* hard nproc 11000
# End of file
* - nproc 20480
* - nofile 32768
1.6.2 Edit /etc/security/limits.d/90-nproc.conf as shown below
[bigdata@datanode02 ~]$ cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*       -    nproc     20480
*       -    nofile    32768
1.7 Tune the kernel parameters in /etc/sysctl.conf as follows
[bigdata@datanode02 ~]$ cat /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1

# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

# Controls the default maxmimum size of a mesage queue
kernel.msgmnb = 65536

# Controls the maximum size of a message, in bytes
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

net.ipv4.ip_local_reserved_ports = 2181,2888,3772-3773,3888,6627,7000,8000,8021,8030-8033,8088-8089,8360,9000,9010-9011,9090,9160,9999,10009,10101-10104,11469,21469,24464,50010,50020,50030,50060,50070,50075,50090,60000,60010,60020,60030
net.ipv4.ip_local_port_range = 10000 65000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.core.somaxconn = 32768
vm.swappiness = 0
vm.overcommit_memory = 1
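
Editing /etc/sysctl.conf alone does not change the running kernel; apply the values with sysctl -p (or reboot):

sysctl -p   # reload /etc/sysctl.conf into the running kernel
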
1.8 Memory tuning: disable transparent hugepage defragmentation, as follows

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag' >> /etc/rc.local
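
You can verify that the setting took, and also look at the companion "enabled" switch, which many guides turn off as well (I only touched defrag here); a small check, assuming the RHEL 6 path used above:

cat /sys/kernel/mm/redhat_transparent_hugepage/defrag    # the [never] bracket should now sit on never
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled   # optional: many guides also set this to never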

1.9 Install the necessary packages

yum install gcc python perl php make   smartmontools  iotop -y

 

That completes the OS-level tuning on the individual nodes. (Note: every step above has to be done on every node, otherwise you will hit problems later.)

2 Preparation before the build

  Software: JDK 1.7 (available from the Oracle website), zookeeper-3.4.9.tar.gz and hadoop-2.7.3.tar.gz, all of which can be downloaded online.

2.1 Create the users

User name        Group            Hosts            Purpose
zookeeper(501)   zookeeper(501)   zk1, zk2, zk3    ZK
bigdata(500)     bigdata(500)     all              HDFS (NN, DN, JN)
yarn(600)        yarn(600)        all              YARN (RM, NM)

The corresponding commands: create the group first, then the user.

# groupadd bigdata -g 500 && useradd -g bigdata -u 500 bigdata
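For completeness, the same pattern for the other two users in the table above (UIDs/GIDs as listed; run as root on the relevant machines):

# groupadd zookeeper -g 501 && useradd -g zookeeper -u 501 zookeeper   # on the three ZK nodes
# groupadd yarn -g 600 && useradd -g yarn -u 600 yarn                  # on all nodes
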
2.2 Set up passwordless login for the bigdata, zookeeper and yarn users. (The core requirement: the two namenodes must be able to log in to each other, and the namenodes must be able to log in to the datanodes.) I use a single key pair for this; setting it up machine by machine would be tedious, so the key files are simply copied around.
$ ssh-keygen                            # generate the key pair; hit Enter at every prompt
$ cd ~/.ssh/                            # the keys live here; check what was generated
$ ls
id_rsa  id_rsa.pub  known_hosts         # id_rsa is the private key, id_rsa.pub the public key
$ cp -rp id_rsa.pub authorized_keys     # the file that holds the authorized public keys
$ ll
total 16
-rw-r--r-- 1 bigdata bigdata  402 Apr 24 15:08 authorized_keys
-rw------- 1 bigdata bigdata 1675 Apr 24 14:30 id_rsa
-rw-r--r-- 1 bigdata bigdata  402 Apr 24 14:30 id_rsa.pub
-rw-r--r-- 1 bigdata bigdata 1997 Apr 24 16:07 known_hosts
$ ll -d
drwx------ 2 bigdata bigdata 4096 Apr 24 15:09 .
$ cd ..
$ tar zcvf ss.tar.gz .ssh/              # pack up the .ssh directory
$ scp ss.tar.gz namenode02:             # copy the archive to namenode02's home directory

On namenode02, extract the archive and check that the permissions are correct:

[bigdata@namenode02 opt]$ tar zxvf ss.tar.gz
.ssh/
.ssh/id_rsa
.ssh/known_hosts
.ssh/id_rsa.pub
.ssh/authorized_keys
[bigdata@namenode02 opt]$ cd .ssh/
[bigdata@namenode02 .ssh]$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
[bigdata@namenode02 .ssh]$ ll
total 16
-rw-r--r-- 1 bigdata bigdata  402 Apr 24 15:08 authorized_keys
-rw------- 1 bigdata bigdata 1675 Apr 24 14:30 id_rsa
-rw-r--r-- 1 bigdata bigdata  402 Apr 24 14:30 id_rsa.pub
-rw-r--r-- 1 bigdata bigdata 1997 Apr 24 16:07 known_hosts
[bigdata@namenode02 .ssh]$ ll -d
drwx------ 2 bigdata bigdata 4096 Apr 24 15:09 .

Then test that namenode01 and namenode02 can log in to each other without a password:

[bigdata@namenode02 .ssh]$ ssh namenode01
Last login: Fri Apr 28 19:36:27 2017 from namenode1-sit.qq.com
[bigdata@namenode01 ~]$

Then move the private key out of the directory, pack the archive again, distribute it to the remaining hosts following the same steps, and test.
Finally, move the private key that was taken out back into ~/.ssh on namenode02, and you are done.

3 Distributing the packages and editing the configuration

3.1 Installing and configuring ZooKeeper

       ZooKeeper runs as a three-node HA ensemble, so it needs to be deployed on namenode01, namenode02 and datanode01. The three nodes are identical, so configure one and distribute the result to the other two.

3.1.1 On each ZK machine, create a software directory under the zookeeper user's home directory (/home/zookeeper). Extract the JDK and ZooKeeper packages into software and create the corresponding symlinks, as shown below.

[zookeeper@namenode01 ~]$ cd software
[zookeeper@namenode01 software]$ ll
total 261244
lrwxrwxrwx 1 zookeeper zookeeper 11 Dec 9 17:11 java -> jdk1.7.0_60
drwxr-xr-x 8 zookeeper zookeeper 4096 Mar 28 2016 jdk1.7.0_60
-rw-rw-r-- 1 zookeeper zookeeper 244776960 Oct 10 2016 jdk1.7.0_60.tar
lrwxrwxrwx 1 zookeeper zookeeper 15 Dec 9 17:09 zookeeper -> zookeeper-3.4.9
drwxr-xr-x 10 zookeeper zookeeper 4096 Aug 23 2016 zookeeper-3.4.9
-rw-rw-r-- 1 zookeeper zookeeper 22724574 Dec 8 16:00 zookeeper-3.4.9.tar.gz
[zookeeper@namenode01 software]$
3.1.2 Environment variables: append the following to the zookeeper user's .bashrc (make sure you edit the file in the home directory of the current user).
export JAVA_HOME=/home/zookeeper/software/java
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:.
export ZOOKEEPER_HOME=/home/zookeeper/software/zookeeper
export PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH

3.1.3 Edit the configuration file as shown below

[zookeeper@namenode01 ~]$ cat ~/software/zookeeper/conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/zookeeper/myid
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=namenode1-sit.qq.com:2888:3888
server.2=namenode2-sit.qq.com:2888:3888
server.3=datanode1-sit.qq.com:2888:3888
3.1.4 Pack the software directory, distribute it to the other nodes, extract it there and check the permissions.
tar czvf software.tar.gz software/
scp software.tar.gz namenode01:
scp .bashrc namenode01:

3.1.5 Create dataDir=/home/zookeeper/myid and write a myid file according to each server id.

[zookeeper@namenode01 ~]$ pwd
/home/zookeeper
[zookeeper@namenode01 ~]$ mkdir myid
vim myid/myid

Write the number 1 into myid on namenode01, and 2 and 3 on namenode02 and datanode01 respectively (see the sketch below); with that, ZooKeeper is configured.
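A quick way to write the files, matching the server.N lines in zoo.cfg (a sketch, run as the zookeeper user on each node):

echo 1 > /home/zookeeper/myid/myid    # on namenode01 (server.1)
echo 2 > /home/zookeeper/myid/myid    # on namenode02 (server.2)
echo 3 > /home/zookeeper/myid/myid    # on datanode01 (server.3)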

4 Installing and configuring HDFS. HDFS is the central piece of Hadoop.

4.1 Distribute the packages; the directory layout mirrors the ZooKeeper one, as shown below.

[bigdata@namenode01 software]$ ll 
total 452484
lrwxrwxrwx 1 bigdata bigdata 12 Dec 9 11:17 hadoop -> hadoop-2.7.3
drwxr-xr-x 13 bigdata bigdata 4096 Apr 26 10:57 hadoop-2.7.3
-rw-rw-r-- 1 bigdata bigdata 214092195 Dec 8 16:00 hadoop-2.7.3.tar.gz
-rw-r--r-- 1 bigdata bigdata 4464640 Apr 26 09:23 hadoop-native-64-2.7.0.tar
lrwxrwxrwx 1 bigdata bigdata 11 Dec 9 11:17 java -> jdk1.7.0_60
drwxr-xr-x 8 bigdata bigdata 4096 Mar 28 2016 jdk1.7.0_60
-rw-rw-r-- 1 bigdata bigdata 244776960 Oct 10 2016 jdk1.7.0_60.tar
4.2 Environment variables, as shown below
[bigdata@namenode01 ~]$ cat .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# User specific aliases and functions
export HADOOP_HOME=/home/bigdata/software/hadoop
export JAVA_HOME=/home/bigdata/software/java
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$HADOOP_HOME/bin/:$HADOOP_HOME/sbin/:$JAVA_HOME/bin/:$PATH

4.3 The HDFS core-site.xml, shown below. The values that were highlighted in my notes (in particular the fs.defaultFS nameservice name and the ZooKeeper quorum) are referenced again by later configuration, so keep them consistent.

[bigdata@namenode01 ~]$ cat /home/bigdata/software/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://suninglpc</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>namenode01:2181,namenode02:2181,datanode01:2181</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/bigdata/software/hadoop/tmp</value>
<final>true</final>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<final>true</final>
</property>
<property>
<name>fs.trash.interval</name>
<value>720</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
<name>hadoop.proxyuser.httpfsuser.hosts</name>
<value>datanode01</value>
</property>
<property>
<name>hadoop.proxyuser.httpfsuser.groups</name>
<value>bigdata</value>
</property>
<property>
<name>hadoop.proxyuser.flume.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.flume.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.alluxio.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.alluxio.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
</configuration>

4.4 The hdfs-site.xml configuration. The namenode and datanode versions differ only slightly, but the data-directory part does change. The namenode version is shown below; the only difference is a commented-out block, so you can finish the file on a namenode, copy it to the datanodes, and then remove the comment there.

[bigdata@namenode01 ~]$ cat software/hadoop/etc/hadoop/hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.nameservices</name>
<value>suninglpc</value>
</property>
<property>
<name>dfs.ha.namenodes.suninglpc</name>
<value>nn1,nn2</value>
</property>


<property>
<name>dfs.namenode.rpc-address.suninglpc.nn1</name>
<value>namenode01:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.suninglpc.nn2</name>
<value>namenode02:9000</value>
</property>

<property>
<name>dfs.namenode.http-address.suninglpc.nn1</name>
<value>namenode01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.suninglpc.nn2</name>
<value>namenode02:50070</value>
</property>

<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://namenode01:8485;namenode02:8485;datanode01:8485/suninglpc</value>
</property>

<property>
<name>dfs.client.failover.proxy.provider.suninglpc</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>

<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/bigdata/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/bigdata/software/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
<final>true</final>
</property>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.heartbeat.interval</name>
<value>1</value>
</property>

<property>
<name>dfs.name.dir</name>
<value>/home/bigdata/software/hadoop/namenode</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table. If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy.
</description>
<final>true</final>
</property>
<!--
<property>
<name>dfs.data.dir</name>
<value>/data0/hdfs,/data1/hdfs</value>
<final>true</final>
</property>
-->

<property>
<name>dfs.namenode.handler.count</name>
<value>40</value>
<final>true</final>
</property>

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>

<property>
<name>dfs.hosts.exclude</name>
<value>/home/bigdata/software/hadoop/etc/hadoop/exclude</value>
</property>


<property>
<name>dfs.blockreport.initialDelay</name>
<value>6</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>

 The datanode version of hdfs-site.xml. The only difference from the namenode version (the part highlighted in my notes) is that the dfs.data.dir block is active here.

[bigdata@datanode01 ~]$ cat software/hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.nameservices</name>
<value>suninglpc</value>
</property>
<property>
<name>dfs.ha.namenodes.suninglpc</name>
<value>nn1,nn2</value>
</property>


<property>
<name>dfs.namenode.rpc-address.suninglpc.nn1</name>
<value>namenode01:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.suninglpc.nn2</name>
<value>namenode02:9000</value>
</property>

<property>
<name>dfs.namenode.http-address.suninglpc.nn1</name>
<value>namenode01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.suninglpc.nn2</name>
<value>namenode02:50070</value>
</property>

<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://namenode01:8485;namenode02:8485;datanode01:8485/suninglpc</value>
</property>

<property>
<name>dfs.client.failover.proxy.provider.suninglpc</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>

<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/bigdata/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/bigdata/software/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
<final>true</final>
</property>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.heartbeat.interval</name>
<value>1</value>
</property>

<property>
<name>dfs.name.dir</name>
<value>/home/bigdata/software/hadoop/namenode</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table. If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy.
</description>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data0/hdfs,/data1/hdfs</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>40</value>
<final>true</final>
</property>

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>

<property>
<name>dfs.hosts.exclude</name>
<value>/home/bigdata/software/hadoop/etc/hadoop/exclude</value>
</property>


<property>
<name>dfs.blockreport.initialDelay</name>
<value>6</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
 

4.5 After finishing the above on namenode01, pack the whole software directory, send it to the other nodes, extract it and check permissions. On the datanodes, uncomment the dfs.data.dir block in hdfs-site.xml. Also create /data0/hdfs and /data1/hdfs on each datanode and make sure they are owned by bigdata; otherwise the DataNode will fail to start for lack of permissions.

tar czvf software.tar.gz  software
scp software.tar.gz namenode01:
scp .bashrc namenode01:
 

5 Configuring and distributing YARN

5.1 Distribute the package and create the symlinks as in 3.1, as shown below

[yarn@namenode01 software]$ ll
total 305164
lrwxrwxrwx 1 yarn yarn 12 Apr 24 16:35 hadoop -> hadoop-2.7.3
drwxr-xr-x 10 yarn yarn 4096 Apr 25 08:56 hadoop-2.7.3
-rw-r--r-- 1 root root 214092195 Apr 24 16:32 hadoop-2.7.3.tar.gz
lrwxrwxrwx 1 yarn yarn 11 Apr 24 16:34 java -> jdk1.7.0_60
drwxr-xr-x 8 yarn yarn 4096 Mar 28 2016 jdk1.7.0_60
-rw-r--r-- 1 root root 98381598 Apr 24 16:32 jdk1.7.0_60.tar.gz
[yarn@namenode01 software]$
5.2 Edit the environment variable file as shown below

[yarn@namenode01 ~]$  vim .bashrc 

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi

# User specific aliases and functions
export HADOOP_HOME=/home/yarn/software/hadoop
export JAVA_HOME=/home/yarn/software/java
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$HADOOP_HOME/bin/:$HADOOP_HOME/sbin/:$JAVA_HOME/bin/:$PATH
5.3 Configure YARN. The ResourceManagers that YARN relies on run only on namenode01 and namenode02, but a NodeManager runs on every node, so configure one node first and then distribute to the rest.

5.3.1 Configure /home/yarn/software/hadoop/etc/hadoop/yarn-site.xml as shown below; pay particular attention to the values that were highlighted in my notes (the RM hostnames and the ZooKeeper address).

[yarn@namenode01 ~]$ cat /home/yarn/software/hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>suninglpc</value>
</property>

<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>

<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>namenode1-sit.qq.com</value>
</property>

<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>namenode2-sit.qq.com</value>
</property>

<property>
<name>yarn.resourcemanager.zk-address</name>
<value>namenode1-sit.qq.com:2181,namenode2-sit.qq.com:2181,datanode1-sit.qq.com:2181</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data0/yarn, /data1/yarn</value>
</property>

<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/data0/logs, /data1/logs</value>
</property>

<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>

<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>namenode1-sit.qq.com:23140</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>namenode1-sit.qq.com:23130</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>namenode1-sit.qq.com:8088</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>namenode1-sit.qq.com:8031</value>
</property>

<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>namenode1-sit.qq.com:23141</value>
</property>

<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>namenode1-sit.qq.com:23142</value>
</property>

<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>namenode2-sit.qq.com:23140</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>namenode2-sit.qq.com:23130</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>namenode2-sit.qq.com:8088</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>namenode2-sit.qq.com:8031</value>
</property>

<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>namenode2-sit.qq.com:23141</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>namenode2-sit.qq.com:23142</value>
</property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>1814400</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>65536</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>16</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1500</value>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>65536</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/home/yarn/software/hadoop/etc/hadoop/fair-scheduler.xml</value>
</property>

</configuration>
5.3.2 Configure /home/yarn/software/hadoop/etc/hadoop/mapred-site.xml as shown below; only a few values need to change.

[yarn@namenode01 ~]$ cat /home/yarn/software/hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
<name>mapred.child.env</name>
<value>LD_LIBRARY_PATH=/home/yarn/software/hadoop/lib/native/libhadoop.so</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>namenode1-sit.qq.com:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>namenode1-sit.qq.com:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>

<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/user/yarn/mapreduce/history/done</value>
</property>

<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/user/yarn/mapreduce/history/done_intermediate</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1500</value>
<description>Physical memory limit for each Map task</description>
</property>

<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3000</value>
<description>Physical memory limit for each Reduce task</description>
</property>

<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1280m -Xms1280m -Xmn256m -XX:SurvivorRatio=6 -XX:MaxPermSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/yarn/hadoop/logs</value>
</property>


<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1280m -Xms1280m -Xmn256m -XX:SurvivorRatio=6 -XX:MaxPermSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/yarn/hadoop/logs</value>
</property>


<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1280m -Xms1280m -Xmn256m -XX:SurvivorRatio=6 -XX:MaxPermSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/yarn/hadoop/logs</value>
<final>true</final>
</property>

<property>
<name>mapreduce.job.counters.max</name>
<value>1000</value>
</property>
<!--property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_HOME/etc/hadoop,$HADOOP_HOME/share/hadoop/common/*,$HADOOP_HOME/share/hadoop/common/lib/*,$HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,$HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*</value>
</property-->
</configuration>

The following properties have to be changed on each node to that node's own domain name:
 <name>mapreduce.jobhistory.address</name>
<value>namenode1-sit.qq.com:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>namenode1-sit.qq.com:19888</value>

5.4 As in 4.5, pack the configuration and software, distribute them, and then adjust the per-node values.

tar czvf software.tar.gz  software/
scp software.tar.gz namenode01:
scp .bashrc namenode01:
ssh namenode02
vim /home/yarn/software/hadoop/etc/hadoop/mapred-site.xml

6 Starting the services

Before starting anything, run a few checks on the systems.

6.1.1 Passwordless login: the requirement is simply that the two namenodes can log in to each other and that a namenode can log in to every other node. Test it by hand; there are not many nodes yet.

6.1.2 System files: concatenate all the tuned files and compute an md5 checksum, then compare it across nodes (see the loop after the command).

cat /etc/sysconfig/i18n /etc/profile /etc/security/limits.conf /etc/security/limits.d/90-nproc.conf /etc/sysctl.conf /etc/pam.d/su /sys/kernel/mm/redhat_transparent_hugepage/defrag /etc/rc.local | md5sum
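To compare the checksum across all four nodes in one go, something like this works, assuming passwordless SSH from the node you run it on (my addition):

FILES="/etc/sysconfig/i18n /etc/profile /etc/security/limits.conf /etc/security/limits.d/90-nproc.conf /etc/sysctl.conf /etc/pam.d/su /sys/kernel/mm/redhat_transparent_hugepage/defrag /etc/rc.local"
for h in namenode01 namenode02 datanode01 datanode02; do
    echo -n "$h: "; ssh $h "cat $FILES | md5sum"
done
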
6.1.3 Check that the firewall and abrtd are off and the required packages are present
for i in iptables abrtd; do echo $i && service $i status; done
for i in gcc python perl php make; do echo -n "$i $(rpm -qa | grep $i | wc -l) "; which $i; done
6.1.4 Disk check

df -h
ll /data*
6.2 Start ZooKeeper. Each service must be started as its own user. Before starting, check that the ~/myid/myid file on each node contains the correct value (1, 2 or 3), then start ZooKeeper on the three nodes with the following commands:

$ zkServer.sh start
$ zkServer.sh status

If one node reports leader and the other two report follower, the ensemble started successfully.

6.3 Start HDFS. We treat namenode01 as the initial active NameNode.

6.3.1 Initialize the ZK directory on nn1

$ hdfs zkfc -formatZK
6.3.2 Start the JournalNode service on zk1, zk2 and zk3 (namenode01, namenode02 and datanode01)

$ hadoop-daemon.sh start journalnode
6.3.3 Format HDFS on nn1 and start the NameNode on namenode01

hadoop namenode -format suninglpc
hadoop-daemon.sh start namenode
The cluster name here (suninglpc) must match the nameservice configured in hdfs-site.xml (this is the custom string mentioned earlier).

6.3.4 On nn2, run the following to copy nn1's metadata, then start the NameNode on namenode02

hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
6.3.5 Check the state of namenode01 and namenode02. If both are standby, start ZKFC on the namenodes; if you want nn1 to become the initial active NN, start ZKFC on nn1 first and then on nn2.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hadoop-daemon.sh start zkfc
6.3.6 Start the DataNodes

hadoop-daemon.sh start datanode
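Once the DataNodes are up, a quick sanity check as the bigdata user shows whether they actually registered with the active NameNode:

hdfs dfsadmin -report   # should list the datanodes as live, with non-zero configured capacity
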
6.3.7 Create a few directories as a test (a small upload test follows below)
hadoop fs -mkdir -p /user/bigdata
hadoop fs -mkdir /tmp/
hadoop fs -chmod 777 /tmp/
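
Since the whole point of this exercise was getting file uploads to work, a small end-to-end test right after the directory creation (a sketch):

hadoop fs -put /etc/hosts /user/bigdata/
hadoop fs -ls /user/bigdata
hadoop fs -cat /user/bigdata/hosts   # should print the file back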

6.4 Start YARN

6.4.1 Start the two ResourceManagers: run the following on both RM nodes, then check their state.

yarn-daemon.sh start resourcemanager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

6.4.2 If everything is fine, start the NodeManagers; they run on all four nodes.

yarn-daemon.sh start nodemanager
6.4.3 Start the JobHistory Server (JHS) on namenode01

mr-jobhistory-daemon.sh start historyserver
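At this point the web UIs should answer on the ports configured above (50070 for the NameNodes, 8088 for the ResourceManagers, 19888 for the JobHistory Server). A quick check from any node, using the hostnames from /etc/hosts (a sketch; the standby RM may answer with a redirect instead of 200):

curl -s -o /dev/null -w "%{http_code}\n" http://namenode01:50070
curl -s -o /dev/null -w "%{http_code}\n" http://namenode1-sit.qq.com:8088
curl -s -o /dev/null -w "%{http_code}\n" http://namenode1-sit.qq.com:19888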

With that, the build is finished. Strictly speaking, the directory test in 6.3.7 already shows whether the cluster is usable; YARN is just the resource-management and scheduling layer on top.

6.5 If anything goes wrong during the process, go straight to the logs under hadoop/logs/*; that way the troubleshooting has something concrete to go on.
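
The log files are named after the user, the daemon and the host, so when something refuses to start I usually just tail the relevant one (a sketch; the exact file names depend on your hostnames):

tail -f /home/bigdata/software/hadoop/logs/hadoop-bigdata-namenode-namenode01.log    # HDFS daemons
tail -f /home/yarn/software/hadoop/logs/yarn-yarn-resourcemanager-namenode01.log     # YARN daemons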