Hadoop 2.6.0 Cluster Setup

Date: 2023-01-22 14:30:21

First, a note on my installation environment: CentOS 6.7, Hadoop 2.6, JDK 1.7.


Compared with Hadoop 1.x, Hadoop 2 changes both the HDFS architecture and the MapReduce architecture considerably, with large gains in speed and availability.

Hadoop 2.6.0 uses the new MapReduce framework (YARN). Since the structure differs from the old one, installation and configuration differ as well.

In the old (1.x) structure, the cluster roles were:

master: NameNode, JobTracker
slave1: DataNode, TaskTracker
slave2: DataNode, TaskTracker

In the new structure:

ResourceManager host: ResourceManager and MR JobHistory Server
NameNode host: NameNode, SecondaryNameNode
datanode1: DataNode, NodeManager

datanode2: DataNode, NodeManager

Cluster layout used in this install:

192.168.199.241 ResourceManager
192.168.199.231 NameNode
192.168.199.232 DataNode1

192.168.199.242 DataNode2

NameNode web UI: http://192.168.199.231:50070/
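The YARN ResourceManager web UI listens on port 8088 by default, so for this layout it should be at http://192.168.199.241:8088/.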

Node preparation (shown here on the ResourceManager host; repeat steps 1-4 on every node, adjusting the hostname accordingly)

1. Create the hadoop user

useradd hadoop

2. Install JDK 1.7

Set the Java environment variables:

cd ~

vi .bashrc

export JAVA_HOME=/usr/java/jdk1.7.0_80

export JAVA_BIN=$JAVA_HOME/bin

export PATH=$PATH:$JAVA_BIN

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

source .bashrc
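A quick check that the JDK is picked up (for this setup the output should report version 1.7.0_80):

java -version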

 

3. Set up passwordless SSH login

ssh-keygen -t rsa

cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

ssh localhost    // test that login works without a password

Note: for passwordless startup across the cluster, the machines running the ResourceManager and the NameNode must be able to log in to each DataNode, so their public keys should be added to each DataNode's authorized_keys as well.
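A minimal sketch of distributing the keys with ssh-copy-id (part of OpenSSH; this assumes the daemons run as root, as the startup logs later in this post suggest, and uses the hostnames defined in /etc/hosts below):

ssh-copy-id root@DataNode1
ssh-copy-id root@DataNode2
ssh DataNode1    // should now log in without a password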

4. Set /etc/hosts and the hostname (on CentOS 6 the hostname is set in /etc/sysconfig/network)

vi /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=ResourceManager

vi /etc/hosts

127.0.0.1   localhost

192.168.199.241 ResourceManager

192.168.199.231 NameNode

192.168.199.232 DataNode1

192.168.199.242 DataNode2

Hadoop installation and configuration

Perform the following on every node.

I. Download Hadoop 2.6.0

wget  http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

Extract it to /usr/local/hadoop and give the hadoop user ownership of the directory:
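A sketch of the extraction step (assuming the tarball was downloaded to the current directory):

tar -xzf hadoop-2.6.0.tar.gz -C /usr/local
mv /usr/local/hadoop-2.6.0 /usr/local/hadoop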

useradd hadoop

cd /usr/local/hadoop/

chown -R hadoop:hadoop /usr/local/hadoop

Add the environment variables (again in ~/.bashrc):

export JAVA_HOME=/usr/java/jdk1.7.0_80

export JAVA_BIN=$JAVA_HOME/bin

export PATH=$PATH:$JAVA_BIN

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export HADOOP_HOME=/usr/local/hadoop

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin

 

II. Configure Hadoop

1. Edit the environment scripts

Edit etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh; at a minimum, JAVA_HOME must be set:

export JAVA_HOME=/usr/java/jdk1.7.0_80

Point the JVM at the native library directory (in the 2.6.0 tarball the native libraries live under lib/native):

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"

If PID files cannot be found later, also set HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR.

Other settings you may want to change:

Log directories: HADOOP_LOG_DIR / YARN_LOG_DIR

Maximum heap size in MB (default 1000): HADOOP_HEAPSIZE / YARN_HEAPSIZE
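A sketch of those optional lines in hadoop-env.sh (the values below are illustrative, not from the original setup):

export HADOOP_LOG_DIR=/usr/local/hadoop/logs    // example path
export HADOOP_HEAPSIZE=2000    // in MB; the default is 1000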

2. Edit the XML configuration files

core-site.xml

<configuration>
        <property>
               <name>fs.defaultFS</name>
               <!-- NameNode is the hostname of the NameNode host (see /etc/hosts) -->
               <value>hdfs://NameNode:9000</value>
        </property>
        <property>
               <name>io.file.buffer.size</name>
               <value>131072</value>
        </property>
</configuration>

hdfs-site.xml

<configuration>
        <property>
               <name>dfs.namenode.name.dir</name>
               <!-- NameNode metadata directory; create it in advance with write
                    permission, otherwise the setting is ignored -->
               <value>file:///usr/local/hadoop_data/dfs/name</value>
        </property>
        <property>
               <name>dfs.datanode.data.dir</name>
               <!-- DataNode block directory; create it in advance with write
                    permission, otherwise the setting is ignored -->
               <value>file:///usr/local/hadoop_data/dfs/data</value>
        </property>
</configuration>
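The directories have to exist and be writable before the daemons start; a sketch of creating them (name/ is only needed on the NameNode and data/ on the DataNodes, but creating both everywhere is harmless):

mkdir -p /usr/local/hadoop_data/dfs/name /usr/local/hadoop_data/dfs/data
chown -R hadoop:hadoop /usr/local/hadoop_data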

vi yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
        <property>
               <name>yarn.nodemanager.aux-services</name>
               <value>mapreduce_shuffle</value>
        </property>
</configuration>
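With the ResourceManager on its own host, the NodeManagers also need to be told where to find it, which the listing above omits. A minimal sketch of the extra property, using the hostname from /etc/hosts:

        <property>
               <name>yarn.resourcemanager.hostname</name>
               <value>ResourceManager</value>
        </property>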

vi mapred-site.xml

<configuration>
        <property>
               <name>mapreduce.framework.name</name>
               <!-- run the MapReduce framework on YARN -->
               <value>yarn</value>
        </property>
</configuration>
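Note that the 2.6.0 tarball ships only mapred-site.xml.template, so the file usually has to be created first:

cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml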

By default the configuration files are read from $HADOOP_HOME/etc/hadoop; to use a different configuration directory, set HADOOP_CONF_DIR in hadoop-env.sh:

vi hadoop-env.sh

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/conf"}

Edit the slaves file:

vi slaves

DataNode1

DataNode2

To check that Hadoop is installed and configured correctly: hadoop version

 

Starting Hadoop

1. Format HDFS on the NameNode

hdfs namenode -format NameNode    // NameNode is the cluster name

Note: if this is not the first time the filesystem has been formatted,

first empty the dfs.namenode.name.dir directory on the NameNode and the dfs.datanode.data.dir directories on each DataNode; otherwise the format command prompts before overwriting.
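A sketch of that clean-up before a re-format (paths taken from hdfs-site.xml above):

rm -rf /usr/local/hadoop_data/dfs/name/*    // on the NameNode
rm -rf /usr/local/hadoop_data/dfs/data/*    // on each DataNode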

2. Start YARN and HDFS

Log in to the ResourceManager and run start-yarn.sh to start the YARN resource management system:

[root@ResourceManager sbin]# start-yarn.sh

starting yarn daemons

resourcemanager running as process 16431. Stop it first.

DataNode2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-DataNode2.out

DataNode1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-DataNode1.out

Log in to the NameNode and run start-dfs.sh to start the HDFS filesystem:

[root@NameNode hadoop]# start-dfs.sh

15/02/06 15:19:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [NameNode]

NameNode: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-NameNode.out

DataNode2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-DataNode2.out

DataNode1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-DataNode1.out

Starting secondary namenodes [0.0.0.0]

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.

RSA key fingerprint is 10:cd:13:8b:04:c0:51:c2:54:cc:3d:3e:17:5d:0c:17.

Are you sure you want to continue connecting (yes/no)? yes

0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-NameNode.out

15/02/06 15:19:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
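To confirm the daemons came up, jps (shipped with the JDK) can be run on each node; with the layout above you would expect roughly:

jps

// on the NameNode host: NameNode, SecondaryNameNode
// on the ResourceManager host: ResourceManager
// on each DataNode: DataNode, NodeManager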

Basic Hadoop commands:

hadoop fs -ls /    // list files in the HDFS root directory

hadoop fs -ls /testdir    // list files under the HDFS directory /testdir

hadoop fs -cat /testdir/test1.txt    // print the contents of test1.txt

hadoop fs -put test2.txt /testdir    // upload the local file test2.txt to /testdir

 

mapred job -list all    // list all jobs; in versions before 2.6 this was hadoop job -list all
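As a final smoke test, the examples jar bundled with the 2.6.0 tarball can run a small job end to end:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10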