Installation
This installation is based on CentOS 7, installed not as a minimal system but with some of the Server service groups and the Development Tools group selected. The root user is used throughout, because OS permissions and security checks at service startup behave differently when another user is used.
Step 1: Download Hadoop from hadoop.apache.org
Choose one of the recommended download mirrors:
https://hadoop.apache.org/releases.html
Step 2: Download the JDK
http://www.oracle.com/technetwork/pt/java/javase/downloads/jdk8-downloads-2133151.html
Java 9 is not yet compatible with Hadoop 3 (and possibly not with any Hadoop version), so download JDK 8.
Step 4: Extract the downloaded files
Extract the JDK archive:
# tar -zxvf /root/Download/jdk-8u192-linux-x64.tar.gz -C /opt
Extract the Hadoop archive:
# tar -zxvf /root/Download/hadoop-3.1.1.tar.gz -C /opt
Step 5: Install JSVC
# rpm -ivh apache-commons-daemon-jsvc-1.0.13-7.el7.x86_64.rpm
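To confirm where jsvc ended up (hadoop-env.sh below points JSVC_HOME at /usr/bin; adjust that value if your package installs it elsewhere):
# which jsvc    # expected: /usr/bin/jsvc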
Step 6: Set the hostnames
# vi /etc/hosts
Add aliases for all the servers involved:
192.168.154.116 master
192.168.154.117 slave1
192.168.154.118 slave2
Set this machine's hostname:
# vi /etc/hostname
master
## The first line of this file must be the hostname; anything from the second line on is meaningless to the OS and to Hadoop.
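On CentOS 7 the same change can also be made with hostnamectl, which updates /etc/hostname and applies it immediately (run the matching command on each node):
# hostnamectl set-hostname master    # on the master; use slave1 / slave2 on the other nodes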
Step 7: SSH mutual trust (passwordless login)
Note that I am configuring the root user here, so the home directory below is /root.
If you are configuring some other user xxxx, the home directory would be /home/xxxx/ instead.
# Run the following commands on the master node:
# ssh-keygen -t rsa -P '' # keep pressing Enter until the key pair has been generated
scp /root/.ssh/id_rsa.pub root@slave1:/root/.ssh/id_rsa.pub.master # copy id_rsa.pub from the master node to the slave host, renaming it id_rsa.pub.master
scp /root/.ssh/id_rsa.pub root@slave2:/root/.ssh/id_rsa.pub.master # same as above; from here on slaveN stands for slave1 and slave2
scp /etc/hosts root@slaveN:/etc/hosts # make the hosts file identical everywhere so the machines can address each other by hostname
# Then run the following command on the corresponding hosts:
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys # on the master host
cat /root/.ssh/id_rsa.pub.master >> /root/.ssh/authorized_keys # on the slaveN hosts
With this in place the master can log in to the other hosts without a password, so running the startup scripts on the master and using scp no longer prompts for one.
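As a quick sanity check, authorized_keys should not be group- or world-writable (otherwise sshd may still prompt for a password), and each slave should answer with its own hostname:
# chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys
# ssh slave1 hostname    # should print slave1 without asking for a password
# ssh slave2 hostname    # should print slave2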
Step 8: Add environment variables
# vi ~/.bash_profile
PATH=/usr/local/webserver/mysql/bin:/usr/python/bin:/opt/hadoop-3.1.1/etc/hadoop:/opt/jdk1.8.0_192/bin:/opt/hadoop-3.1.1/bin:/opt/hadoop-3.1.1/sbin:$PATH:$HOME/bin:/opt/spark/bin:/opt/spark/sbin:/opt/hive/bin:/opt/flume/bin:/opt/kafka/bin
export PATH
JAVA_HOME=/opt/jdk1.8.0_192
export JAVA_HOME
export HADOOP_HOME=/opt/hadoop-3.1.1
export LD_LIBRARY_PATH=/usr/local/lib:/usr/python/lib:/usr/local/webserver/mysql/lib
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
export HIVE_HOME=/opt/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin
export YARN_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export SQOOP_HOME=/opt/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export FLUME_HOME=/opt/flume
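Reload the profile and verify (the mysql/python/spark/hive/flume/kafka/sqoop paths above are only needed if those components are installed; they are not required for Hadoop itself):
# source ~/.bash_profile
# java -version      # should report the JDK 8 build extracted above
# hadoop version     # should report Hadoop 3.1.1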
Step 9: Edit /opt/hadoop-3.1.1/etc/hadoop/hadoop-env.sh
Add:
export JAVA_HOME=/opt/jdk1.8.0_192
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export JSVC_HOME=/usr/bin
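If the DataNode will not actually be started in secure mode through jsvc, the Hadoop 3 start scripts expect the non-secure user variable instead when running as root; in that case add this line as well:
export HDFS_DATANODE_USER=root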
Step 10: Edit /opt/hadoop-3.1.1/etc/hadoop/core-site.xml
<configuration>
<!-- RPC address of HDFS (the NameNode) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.154.116:9000</value>
</property>
<!-- Directory where Hadoop stores files it generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-3.1.1/tmp</value>
</property>
</configuration>
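If you prefer to create the directory referenced by hadoop.tmp.dir up front:
# mkdir -p /opt/hadoop-3.1.1/tmp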
Step 11: Edit /opt/hadoop-3.1.1/etc/hadoop/hdfs-site.xml
<configuration>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- HTTP address of the NameNode -->
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<!-- HTTP address of the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:50090</value>
</property>
<!-- Directory where the NameNode stores its data -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop-3.1.1/name</value>
</property>
<!-- Directory where the DataNode stores its data -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop-3.1.1/data</value>
</property>
</configuration>
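The NameNode and DataNode directories configured above can likewise be created ahead of time:
# mkdir -p /opt/hadoop-3.1.1/name /opt/hadoop-3.1.1/data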
Step 12: Edit /opt/hadoop-3.1.1/etc/hadoop/mapred-site.xml
If this file does not exist, copy it from the template: cp /opt/hadoop-3.1.1/etc/hadoop/mapred-site.xml.template /opt/hadoop-3.1.1/etc/hadoop/mapred-site.xml (in Hadoop 3.x the file is normally shipped directly, so the copy is usually unnecessary).
<configuration>
<!-- Tell the framework to run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
</property>
</configuration>
Step 13: Edit /opt/hadoop-3.1.1/etc/hadoop/yarn-site.xml
<configuration>
<!-- Reducers fetch map output via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log aggregation directory -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/opt/hadoop-3.1.1/logs</value>
</property>
<property>
<!-- The node on which the ResourceManager runs -->
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
Step 14: Edit /opt/hadoop-3.1.1/etc/hadoop/masters
192.168.154.116
Step 15: Edit /opt/hadoop-3.1.1/etc/hadoop/slaves
192.168.154.116
Step 16: Edit /opt/hadoop-3.1.1/etc/hadoop/workers
192.168.154.116
## This build uses a single VM configured cluster-style, so it behaves like a (pseudo-)distributed deployment. With several machines, the same configuration plus mutual SSH trust between them gives a truly distributed cluster.
Step 17: Edit /opt/hadoop-3.1.1/etc/hadoop/yarn-env.sh
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
Step 18: Reboot the system
reboot (or init 6)
Step 19: Format HDFS
# cd /opt/hadoop-3.1.1/etc/hadoop/
# hdfs namenode -format
Format only once. Formatting repeatedly can leave the DataNodes unable to join the cluster (their stored cluster ID no longer matches the NameNode's). If you really need to re-format, delete the existing data first.
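A sketch of the cleanup before a deliberate re-format, assuming the directories configured above (stop the daemons first):
# stop-dfs.sh && stop-yarn.sh
# rm -rf /opt/hadoop-3.1.1/tmp/* /opt/hadoop-3.1.1/name/* /opt/hadoop-3.1.1/data/*
# hdfs namenode -format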
Step 20: Start HDFS and YARN
# cd /opt/hadoop-3.1.1
# sbin/start-dfs.sh
# sbin/start-yarn.sh
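To confirm that the daemons registered with each other, you can also check from the command line (the NameNode web UI is at the dfs.namenode.http-address set above, http://master:50070, and the ResourceManager UI listens on port 8088 by default):
# hdfs dfsadmin -report    # should list the DataNode(s)
# yarn node -list          # should list the NodeManager(s)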
Step 21: Check that the installation succeeded
# jps
12006 NodeManager
11017 NameNode
11658 ResourceManager
13068 Jps
11197 DataNode
11389 SecondaryNameNode
Step 22: Test by uploading a file
# cd ~
# vi helloworld.txt
# hdfs dfs -put helloworld.txt helloworld.txt
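To verify the upload, and optionally run the bundled WordCount example on it (the jar path assumes the standard Hadoop 3.1.1 layout):
# hdfs dfs -ls
# hdfs dfs -cat helloworld.txt
# hadoop jar /opt/hadoop-3.1.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount helloworld.txt wc-out
# hdfs dfs -cat wc-out/part-r-00000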