Hadoop 2.9.0 Cluster Installation

Date: 2022-01-16 06:28:45

Reference: http://www.cnblogs.com/bovenson/p/5760856.html

Prepare three virtual machines with the following IP addresses and hostnames:

  • 192.168.241.100 mini1
  • 192.168.241.101 mini2
  • 192.168.241.102 mini3

1. Install the JDK

Download the JDK, extract it to the /opt directory, and configure the environment variables.

JDK download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html

tar -zxvf jdk-8u161-linux-x64.tar.gz
mv jdk1.8.0_161 /opt
vim /etc/profile

# Append the following lines to the end of /etc/profile

export JAVA_HOME=/opt/jdk1.8.0_161
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH

# Apply the changes
source /etc/profile
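The JDK setup can be verified before moving on (assuming the paths above):

```shell
# Confirm the JDK is on the PATH and JAVA_HOME is set
java -version
echo $JAVA_HOME   # expected: /opt/jdk1.8.0_161
```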

2. Install Hadoop

Download Hadoop, extract it, and configure the related environment variables.

Hadoop download page: http://hadoop.apache.org/#Download+Hadoop

tar -zxvf hadoop-2.9.0.tar.gz
vim /etc/profile

# Append the following lines to the end of /etc/profile:
export HADOOP_HOME=/root/hadoop-2.9.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# Apply the changes
source /etc/profile
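A quick check that the Hadoop binaries are now reachable:

```shell
# Confirm the bin/ and sbin/ directories are on the PATH
hadoop version
which start-dfs.sh
```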

3. Configure Hadoop in Cluster Mode

The Hadoop configuration files live in the hadoop-2.9.0/etc/hadoop directory. The following files need to be edited:

  • core-site.xml      Hadoop core configuration
  • hdfs-site.xml      HDFS-related configuration
  • mapred-site.xml    MapReduce-related configuration
  • yarn-site.xml      YARN-related configuration

3.1 Set the default file system to HDFS in core-site.xml

Here /home/hadoop/hdfsdata is the directory where HDFS stores its data; any other location can be used.

vim core-site.xml

# Add the following configuration
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mini1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hdfsdata</value>
    </property>
</configuration>

3.2 Set the file replication factor in hdfs-site.xml

vim hdfs-site.xml

# Set the number of replicas per file block (the default is 3; other values are allowed)
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

3.3 Copy mapred-site.xml.template to mapred-site.xml, then add the following MapReduce configuration to use YARN as the MapReduce resource manager

cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

# Set the MapReduce resource management framework
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

3.4 Configure the YARN settings in yarn-site.xml

vim yarn-site.xml

# Set the YARN master node and the MapReduce shuffle auxiliary service
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>mini1</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
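The steps above edit the configuration on mini1 only, but mini2 and mini3 need the same JDK, Hadoop directory, and configuration files. One way to do this (an assumption, not shown in the original) is to copy everything over with scp:

```shell
# Copy the configured Hadoop tree and the profile to the other nodes
# (assumes the same account and directory layout on every machine)
for host in mini2 mini3; do
    scp -r /root/hadoop-2.9.0 $host:/root/
    scp /etc/profile $host:/etc/profile
done
```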

4. Start the Hadoop Cluster

4.1 First, format the HDFS file system

hdfs namenode -format

(In Hadoop 2.x this is the preferred form; the older hadoop namenode -format still works but is deprecated.)

After a successful format, the HDFS data directory looks like this:

[root@mini1 hdfsdata]# tree .
.
└── dfs
    └── name
        └── current
            ├── fsimage_0000000000000000000
            ├── fsimage_0000000000000000000.md5
            ├── seen_txid
            └── VERSION

4.2 Start the NameNode service

hadoop-daemon.sh start namenode

Once it has started, the HDFS web UI is reachable on port 50070.
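The web UI can also be checked from the command line (assuming curl is available):

```shell
# Fetch the NameNode web UI landing page (should return HTML)
curl -s http://mini1:50070/ | head -n 5

# The JMX endpoint reports NameNode status as JSON
curl -s "http://mini1:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
```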


4.3 Start the DataNode services

After the NameNode is running, start the DataNode service on each node in turn; every DataNode connects automatically to the configured NameNode.

hadoop-daemon.sh start datanode

After the DataNodes have started, their information appears in the web UI.
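The same information is available from the command line:

```shell
# Report cluster capacity and list live DataNodes
hdfs dfsadmin -report
```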


5. Automated Cluster Startup Scripts

5.1 HDFS cluster startup scripts

The hadoop-2.9.0/etc/hadoop/slaves file lists the machines on which cluster daemons are started. Edit it with vim etc/hadoop/slaves:

mini1
mini2
mini3

Note: passwordless SSH login from mini1 to every machine must be configured first.
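The note above can be satisfied as follows (a sketch; assumes the same account exists on every machine):

```shell
# On mini1: generate an SSH key pair with an empty passphrase
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to every node, including mini1 itself,
# so that start-dfs.sh can log in without a password
for host in mini1 mini2 mini3; do
    ssh-copy-id $host
done
```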

The HDFS cluster can now be started and stopped with the start-dfs.sh and stop-dfs.sh commands:

[hadoop@mini1 hadoop]$ start-dfs.sh 
Starting namenodes on [mini1]
mini1: starting namenode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-namenode-mini1.out
mini3: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini3.out
mini1: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini1.out
mini2: starting datanode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-datanode-mini2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.9.0/logs/hadoop-hadoop-secondarynamenode-mini1.out

[hadoop@mini1 hadoop]$ stop-dfs.sh 
Stopping namenodes on [mini1]
mini1: stopping namenode
mini1: stopping datanode
mini2: stopping datanode
mini3: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode

[hadoop@mini1 hadoop]$ jps
4449 DataNode
4737 Jps
4310 NameNode
4620 SecondaryNameNode

5.2 YARN cluster startup

The YARN cluster can be started and stopped with start-yarn.sh and stop-yarn.sh:

[hadoop@mini1 hadoop]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-resourcemanager-mini1.out
mini2: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini2.out
mini3: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini3.out
mini1: starting nodemanager, logging to /home/hadoop/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-mini1.out

[hadoop@mini1 hadoop]$ jps
4449 DataNode
4931 NodeManager
4310 NameNode
4811 ResourceManager
4620 SecondaryNameNode
5213 Jps

[hadoop@mini1 hadoop]$ stop-yarn.sh 
stopping yarn daemons
stopping resourcemanager
mini1: stopping nodemanager
mini3: stopping nodemanager
mini2: stopping nodemanager
mini1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
mini3: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
mini2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop

[hadoop@mini1 hadoop]$ 

Logging in to the other machines confirms that the DataNode and NodeManager services are running:

[hadoop@mini2 ~]$ jps
2963 Jps
2501 DataNode
2828 NodeManager

[hadoop@mini3 ~]$ jps
2481 DataNode
2804 NodeManager
2939 Jps
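To verify the whole cluster end to end, the examples jar shipped with Hadoop can be used to run a small MapReduce job (the path assumes the layout above):

```shell
# Run the bundled pi estimator: 2 map tasks, 10 samples each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar pi 2 10
```

If the job completes and prints an estimate of pi, HDFS and YARN are both working.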