Reference: http://www.cnblogs.com/bovenson/p/5760856.html
Prepare three virtual machines, with IP addresses and hostnames configured as follows:
- 192.168.241.100 mini1
- 192.168.241.101 mini2
- 192.168.241.102 mini3
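For the hostnames above to resolve on every node, the same mappings need to be present in /etc/hosts. A minimal sketch, run as root on each machine, using the addresses listed above:

```shell
# Map cluster hostnames to their IP addresses on every node
cat >> /etc/hosts <<'EOF'
192.168.241.100 mini1
192.168.241.101 mini2
192.168.241.102 mini3
EOF
```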
1. Install the JDK
Download the JDK, extract it to /opt, and configure the environment variables.
JDK download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html
```shell
tar -zxvf jdk-8u161-linux-x64.tar.gz
mv jdk1.8.0_161 /opt
vim /etc/profile
# Append the following to the end of /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_161
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
# Reload the configuration
source /etc/profile
```
2. Install Hadoop
Download Hadoop, extract it, and configure the related environment variables.
Hadoop download page: http://hadoop.apache.org/#Download+Hadoop
```shell
tar -zxvf hadoop-2.9.0.tar.gz
vim /etc/profile
# Append the following to the end of /etc/profile
export HADOOP_HOME=/root/hadoop-2.9.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# Reload the configuration
source /etc/profile
```
3. Configure Hadoop cluster mode
The Hadoop configuration files are located in the hadoop-2.9.0/etc/hadoop directory. The following files need to be edited:
- core-site.xml: Hadoop core configuration
- hdfs-site.xml: HDFS configuration
- mapred-site.xml: MapReduce configuration
- yarn-site.xml: YARN configuration
3.1 In core-site.xml, set HDFS as the default file system
Here /home/hadoop/hdfsdata is the directory where HDFS data is stored; you can choose a different path.
Edit the file with `vim core-site.xml` and add the following configuration:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mini1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hdfsdata</value>
  </property>
</configuration>
```
3.2 In hdfs-site.xml, set the number of replicas kept for stored files
Edit the file with `vim hdfs-site.xml` and add the following configuration:

```xml
<!-- Number of file replicas to keep (default is 3; adjust as needed) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```
3.3 Copy mapred-site.xml.template to mapred-site.xml, then add the following MapReduce configuration to set YARN as the MapReduce resource manager
```shell
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
```

```xml
<!-- Run MapReduce on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
3.4 Specify the YARN settings in yarn-site.xml
Edit the file with `vim yarn-site.xml` and add the following configuration:

```xml
<!-- Set the YARN master node and the MapReduce auxiliary service -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>mini1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
4. Start the Hadoop cluster
4.1 First, format the HDFS file system
```shell
hadoop namenode -format
```
After a successful format, the HDFS data directory contains:
```
[root@mini1 hdfsdata]# tree .
.
└── dfs
    └── name
        └── current
            ├── fsimage_0000000000000000000
            ├── fsimage_0000000000000000000.md5
            ├── seen_txid
            └── VERSION
```
4.2 Start the namenode service
```shell
hadoop-daemon.sh start namenode
```
Once the namenode has started, the HDFS web UI is available on port 50070.
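As a quick liveness check (assuming the default Hadoop 2.x NameNode HTTP port 50070 and the hostnames used in this guide), a sketch:

```shell
# Check that the NameNode web UI answers on the default 2.x port
curl -sI http://mini1:50070/ | head -n 1
# Confirm the NameNode JVM is running on this node
jps | grep NameNode
```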
4.3 Start the datanode services one by one
After the namenode is up, start the datanode service on each node in turn; datanodes connect automatically to the configured namenode.
```shell
hadoop-daemon.sh start datanode
```
Once a datanode has started, its information appears in the web UI.
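Registration can also be verified from the command line with the standard `hdfs dfsadmin -report` command, run on any node where the environment variables were configured:

```shell
# Print cluster capacity and per-node status as seen by the NameNode;
# every started datanode should appear under "Live datanodes"
hdfs dfsadmin -report
```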
5 Automated cluster startup
5.1 Automated HDFS cluster startup
The machines that make up the cluster are listed in the hadoop-2.9.0/etc/hadoop/slaves file (`vim etc/hadoop/slaves`):
```
mini1
mini2
mini3
```
Note: passwordless SSH login from mini1 to every machine must be configured.
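A minimal sketch of that passwordless-SSH setup, run as the cluster user on mini1 (hostnames as defined earlier; adjust key type and paths to taste):

```shell
# Generate a key pair without a passphrase (skip if one already exists)
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# Install the public key on every node, including mini1 itself
for host in mini1 mini2 mini3; do
    ssh-copy-id "$host"
done
```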
The HDFS cluster can now be started and stopped with the start-dfs.sh and stop-dfs.sh commands:
```
[hadoop@mini1 hadoop]$ start-dfs.sh
Starting namenodes on [mini1]
mini1: starting namenode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-namenode-mini1.out
mini3: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini3.out
mini1: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini1.out
mini2: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-secondarynamenode-mini1.out
[hadoop@mini1 hadoop]$ stop-dfs.sh
Stopping namenodes on [mini1]
mini1: stopping namenode
mini1: stopping datanode
mini2: stopping datanode
mini3: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
[hadoop@mini1 hadoop]$ jps
4449 DataNode
4737 Jps
4310 NameNode
4620 SecondaryNameNode
```
5.2 Starting the YARN cluster
The YARN cluster can be started and stopped with start-yarn.sh and stop-yarn.sh:
```
[hadoop@mini1 hadoop]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-resourcemanager-mini1.out
mini2: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini2.out
mini3: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini3.out
mini1: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini1.out
[hadoop@mini1 hadoop]$ jps
4449 DataNode
4931 NodeManager
4310 NameNode
4811 ResourceManager
4620 SecondaryNameNode
5213 Jps
[hadoop@mini1 hadoop]$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
mini1: stopping nodemanager
mini3: stopping nodemanager
mini2: stopping nodemanager
mini1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
mini3: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
mini2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
[hadoop@mini1 hadoop]$
```
Logging into the other machines shows that the DataNode and NodeManager processes are running:
```
[hadoop@mini2 ~]$ jps
2963 Jps
2501 DataNode
2828 NodeManager

[hadoop@mini3 ~]$ jps
2481 DataNode
2804 NodeManager
2939 Jps
```
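As a final smoke test, the examples jar that ships with Hadoop 2.9.0 can be submitted; a successful run confirms that HDFS and YARN are working together (the jar path assumes the HADOOP_HOME set earlier):

```shell
# Estimate pi with 2 map tasks x 10 samples each; deliberately tiny
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar pi 2 10
```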