Apache Hadoop 2.2.0 - How to Install a Three-Node Cluster
http://tonylixu.blogspot.ca/2014/02/apache-hadoop-how-to-install-three.html
CentOS 6.5 Hadoop 2.2.0 fully distributed installation
http://xjliao.me/2014/03/21/hadoop-2.2.0-cluster-setup.html
==============================
Cluster: n0, n1, n2
n0: NameNode, ResourceManager
n1, n2: DataNode, NodeManager
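For these hostnames to resolve, every node needs matching entries in /etc/hosts; the addresses below are placeholders, substitute your own:
# /etc/hosts on every node (example addresses)
192.168.1.100  n0
192.168.1.101  n1
192.168.1.102  n2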
1. Prerequisites
1.1 Add the user hm (on all nodes)
#useradd hm
#passwd hm
1.2 JDK 1.6/1.7
First remove the preinstalled OpenJDK:
yum -y remove *jdk*
yum -y remove *java*
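Then install an Oracle JDK and export JAVA_HOME. The rpm filename and install path below are assumptions; adjust them to the package you actually download:
# rpm -ivh jdk-7u51-linux-x64.rpm    (example package name)
Append to /home/hm/.bashrc on every node:
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH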
1.3 Passwordless SSH login
1. On all machines, log in as user hm:
$ cd /home/hm
$ mkdir .ssh
2. On the NameNode (n0), generate a key pair:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2.1 The .ssh directory must have mode 700 (it needs the execute bit).
2.2 authorized_keys must have mode 600, otherwise SSH rejects the key.
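To set those permissions explicitly on each node (a minimal sketch of the two chmod calls implied above):
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys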
2.3 Include the username on the ssh command line if it differs from your local user; for example (a password is still required at this point):
$ ssh n1
$ ssh n2
3. Copy the public key to the worker nodes (password required):
$ cd .ssh
$ scp authorized_keys n1:/home/hm/.ssh
$ scp authorized_keys n2:/home/hm/.ssh
4. Test (this time no password should be required!):
$ ssh n1
$ ssh n2
2. Hadoop common configuration
2.1 hadoop-env.sh
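The one setting that usually must be changed in etc/hadoop/hadoop-env.sh is JAVA_HOME; the path below assumes the Oracle JDK install from step 1.2:
# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/default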
2.2 slaves (worker node list)
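etc/hadoop/slaves lists one worker hostname per line; for this cluster it contains:
n1
n2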
3. Configuring the four Hadoop component files
3.1 core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://n0:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hm/temp</value>
</property>
<property>
<name>hadoop.proxyuser.hm.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hm.groups</name>
<value>*</value>
</property>
</configuration>
3.2 hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>n0:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hm/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hm/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
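The local directories referenced in core-site.xml and hdfs-site.xml must exist before HDFS is formatted; a minimal sketch, run as hm on each node:
$ mkdir -p /home/hm/temp /home/hm/dfs/name /home/hm/dfs/data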
3.3 yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>n0:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>n0:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>n0:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>n0:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>n0:8088</value>
</property>
</configuration>
3.4 mapred-site.xml (create it from etc/hadoop/mapred-site.xml.template if it does not exist yet)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>n0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>n0:19888</value>
</property>
</configuration>
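The same configuration files must be present on every node. One simple approach is to copy the whole config directory from n0 to the workers; the install path /home/hm/hadoop-2.2.0 is an assumption, use your actual HADOOP_HOME:
$ scp -r /home/hm/hadoop-2.2.0/etc/hadoop n1:/home/hm/hadoop-2.2.0/etc/
$ scp -r /home/hm/hadoop-2.2.0/etc/hadoop n2:/home/hm/hadoop-2.2.0/etc/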
4. Start and stop
4.1 Start (run on n0, from HADOOP_HOME)
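Before the very first start, HDFS must be formatted on the NameNode (a one-time step):
$ bin/hdfs namenode -format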
sbin/start-dfs.sh
sbin/start-yarn.sh
4.2 Stop
sbin/stop-dfs.sh
sbin/stop-yarn.sh
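To check that all daemons came up, run jps on each node; the PIDs below are just examples:
$ jps    # on n0
2101 NameNode
2289 SecondaryNameNode
2410 ResourceManager
$ jps    # on n1 and n2
1987 DataNode
2055 NodeManager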
5. Test
Run the wordcount example:
$ mkdir input
$ cat > input/file
This is word count example
using hadoop 2.2.0
Copy the input directory into HDFS:
$ bin/hdfs dfs -copyFromLocal input /input
From HADOOP_HOME, run the wordcount example:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
Check the output:
$ bin/hdfs dfs -cat /output/*
===================