First, my installation environment: CentOS 6.7, Hadoop 2.6, JDK 1.7.
Compared with Hadoop 1.x, Hadoop 2 makes substantial changes to both the HDFS and MapReduce architectures, and improves speed and availability considerably.
Hadoop 2.6.0 adopts the new MapReduce framework, YARN, whose structure differs from the old one, so installation and configuration change as well.
Old structure: the cluster nodes were
master    NameNode, JobTracker
slave1    DataNode, TaskTracker
slave2    DataNode, TaskTracker
New structure:
ResourceManager and MR JobHistory Server
NameNode
SecondaryNameNode
datanode1    NodeManager
datanode2    NodeManager
Hadoop cluster layout used in this install:
192.168.199.241 ResourceManager
192.168.199.231 NameNode
192.168.199.232 DataNode1
192.168.199.242 DataNode2
NameNode web UI: http://192.168.199.231:50070/
On the ResourceManager:
1. Create the user
useradd hadoop
2. Install JDK 1.7
Set the JAVA environment variables:
cd ~
vi .bashrc
export JAVA_HOME=/usr/java/jdk1.7.0_80
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_BIN
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source .bashrc
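A quick way to confirm the setting before copying it to the other nodes (a sketch; the helper function `check_java_home` is mine, not part of the JDK):

```shell
# Return "OK" if a candidate JAVA_HOME directory actually contains an
# executable bin/java, otherwise "MISSING".
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "OK"
  else
    echo "MISSING"
  fi
}

# Check the path used in this guide.
check_java_home /usr/java/jdk1.7.0_80
```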
3. Set up passwordless SSH login
ssh-keygen -t rsa
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh localhost    # test that login works without a password
Note: for passwordless login across the cluster, authorize the ResourceManager's and NameNode's keys on the DataNode machines, so the start scripts can reach the DataNodes.
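The key distribution above can be sketched as a loop; `ssh-copy-id` ships with openssh-clients, the host names are the ones mapped in /etc/hosts in this guide, and the loop only prints the commands (a dry run) so it is safe to try:

```shell
# Hosts that should accept the local public key; names follow the
# /etc/hosts mapping used in this guide.
cluster_hosts="NameNode DataNode1 DataNode2"

for host in $cluster_hosts; do
  # Dry run: print the command; drop the echo to actually copy the key.
  echo ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$host"
done
```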
4. Configure /etc/hosts and the hostname
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ResourceManager
vi /etc/hosts
127.0.0.1 localhost
192.168.199.241 ResourceManager
192.168.199.231 NameNode
192.168.199.232 DataNode1
192.168.199.242 DataNode2
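To avoid editing /etc/hosts by hand on every node, the same entries can be staged into a file once and appended everywhere; a minimal sketch (the staging file is a temp file of my choosing, the IPs are the ones from this guide):

```shell
# Stage the cluster host entries in a scratch file.
staging=$(mktemp)
cat > "$staging" <<'EOF'
192.168.199.241 ResourceManager
192.168.199.231 NameNode
192.168.199.232 DataNode1
192.168.199.242 DataNode2
EOF

# On each node, as root:  cat "$staging" >> /etc/hosts
```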
Hadoop installation and configuration
Run the following on every node.
I. Download Hadoop 2.6.0
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Extract it to /usr/local/hadoop, then give the hadoop user ownership of the directory:
useradd hadoop
cd /usr/local/hadoop/
chown -R hadoop:hadoop /usr/local/hadoop
Add the environment variables (e.g. to ~/.bashrc):
export JAVA_HOME=/usr/java/jdk1.7.0_80
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_BIN
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
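A sketch of appending the two Hadoop variables non-interactively; it writes to a temp file here so it is safe to run, while on a real node `rcfile` would be ~/.bashrc:

```shell
# Target file; on a real node this would be ~/.bashrc.
rcfile=$(mktemp)

# Append the Hadoop environment variables (quoted heredoc keeps $PATH literal).
cat >> "$rcfile" <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
```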
II. Configure Hadoop
1. Edit the environment scripts
Edit etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh.
At minimum, set JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.7.0_80
Point the native library path at the lib directory:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/"
If PID files cannot be found later, also set HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR.
Other settings you may want to change:
Log directories: HADOOP_LOG_DIR / YARN_LOG_DIR
Maximum heap size in MB (default 1000): HADOOP_HEAPSIZE / YARN_HEAPSIZE
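Collected as hadoop-env.sh lines, those optional settings might look like the following (the directories and the heap size are illustrative values of my choosing, not defaults):

```shell
# Hypothetical hadoop-env.sh overrides; adjust paths and sizes to your nodes.
export HADOOP_LOG_DIR=/var/log/hadoop     # where daemon logs are written
export HADOOP_PID_DIR=/var/run/hadoop     # where daemon PID files are kept
export HADOOP_HEAPSIZE=2000               # raise heap from the 1000 MB default
```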
2. Edit the Hadoop configuration files
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://NameNode:9000</value> <!-- NameNode here is the NameNode host name from /etc/hosts -->
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop_data/dfs/name</value> <!-- NameNode metadata directory; create it beforehand and make it writable, otherwise this setting is ignored -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop_data/dfs/data</value> <!-- DataNode block directory; create it beforehand and make it writable, otherwise this setting is ignored -->
</property>
</configuration>
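As the comments note, both directories must exist and be writable before HDFS starts. A sketch of creating them, using a temp directory here so the snippet is harmless; on a real node set BASE=/usr/local/hadoop_data and run the chown as root:

```shell
# Base directory for HDFS data; on a real node: BASE=/usr/local/hadoop_data
BASE=$(mktemp -d)

# Create the name (NameNode) and data (DataNode) directories.
mkdir -p "$BASE/dfs/name" "$BASE/dfs/data"

# On a real node, as root:  chown -R hadoop:hadoop "$BASE"
```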
vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value> <!-- run MapReduce on the YARN framework -->
</property>
</configuration>
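Note that the 2.6.0 tarball ships only mapred-site.xml.template, so the file usually has to be created first. A sketch, simulated in a temp directory so it runs anywhere; on a real node CONF is $HADOOP_HOME/etc/hadoop and the touch is unnecessary:

```shell
# Stand-in for $HADOOP_HOME/etc/hadoop.
CONF=$(mktemp -d)

# Simulate the template the tarball ships.
touch "$CONF/mapred-site.xml.template"

# Create the real config file from the template, then edit it as above.
cp "$CONF/mapred-site.xml.template" "$CONF/mapred-site.xml"
```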
The configuration files live under $HADOOP_HOME/etc/hadoop by default. To use a different configuration directory,
change HADOOP_CONF_DIR in hadoop-env.sh:
vi hadoop-env.sh
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/conf"}
Edit the slaves file (the list of DataNode hosts):
vi slaves
DataNode1
DataNode2
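After editing, every node needs an identical etc/hadoop directory. A dry-run sketch of pushing it out (the helper `push_conf_cmds` is mine and only prints the scp commands; pipe its output to sh to execute them):

```shell
# Print one scp command per node that should receive the config directory.
push_conf_cmds() {
  for host in NameNode DataNode1 DataNode2; do
    echo scp -r /usr/local/hadoop/etc/hadoop hadoop@"$host":/usr/local/hadoop/etc/
  done
}

push_conf_cmds   # dry run; run `push_conf_cmds | sh` on a live cluster
```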
Check that Hadoop is installed and configured: hadoop version
Starting Hadoop
1. Format HDFS on the NameNode
hdfs namenode -format NameNode    # NameNode is the cluster name
Note: if this is not the first time the HDFS filesystem is formatted,
first empty the dfs.namenode.name.dir directory on the NameNode and the dfs.datanode.data.dir directory on every DataNode before formatting; otherwise the format prompts whether to re-format.
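The clean-out described above can be sketched as follows; it runs against a temp directory here so it is safe to execute, while on real nodes DFS would be /usr/local/hadoop_data/dfs:

```shell
# Stand-in for /usr/local/hadoop_data/dfs on each node.
DFS=$(mktemp -d)
mkdir -p "$DFS/name" "$DFS/data"
touch "$DFS/name/fsimage_old" "$DFS/data/blk_old"   # stand-ins for stale files

# Empty the directories but keep them (they must still exist for HDFS).
rm -rf "$DFS/name/"* "$DFS/data/"*
```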
2. Start YARN and HDFS
Log in to the ResourceManager and run start-yarn.sh to start the cluster resource manager, YARN:
[root@ResourceManager sbin]# start-yarn.sh
starting yarn daemons
resourcemanager running as process 16431. Stop it first.
DataNode2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-DataNode2.out
DataNode1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-DataNode1.out
Log in to the NameNode and run start-dfs.sh to start the cluster HDFS filesystem:
[root@NameNode hadoop]# start-dfs.sh
15/02/06 15:19:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [NameNode]
NameNode: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-NameNode.out
DataNode2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-DataNode2.out
DataNode1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-DataNode1.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 10:cd:13:8b:04:c0:51:c2:54:cc:3d:3e:17:5d:0c:17.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-NameNode.out
15/02/06 15:19:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
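Once both start scripts finish, `jps` on each node should list that node's daemons. A small helper of my own, useful when checking by hand with `ssh <host> jps`, recording which daemons belong where under this guide's layout:

```shell
# Expected jps output per node after start-yarn.sh and start-dfs.sh.
expected_daemons() {
  case "$1" in
    ResourceManager)     echo "ResourceManager" ;;
    NameNode)            echo "NameNode SecondaryNameNode" ;;
    DataNode1|DataNode2) echo "DataNode NodeManager" ;;
  esac
}

expected_daemons NameNode   # compare against `ssh NameNode jps`
```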
Basic Hadoop commands:
hadoop fs -ls /                      # list files in the HDFS root directory
hadoop fs -ls /testdir               # list files in the HDFS directory /testdir
hadoop fs -cat /testdir/test1.txt    # print the contents of test1.txt
hadoop fs -put test2.txt /testdir    # upload the local file test2.txt into /testdir
mapred job -list all                 # list all jobs; before 2.6 this was `hadoop job -list all`