Spark Installation, Part 2: Hadoop Cluster Deployment

Date: 2022-08-26 13:07:44

I. Download Hadoop

Version 2.7.6 is used because that is the version running in the company's production environment.

cd /opt
wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
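
The later steps refer to $HADOOP_HOME, so the tarball has to be unpacked and the variable set at some point. A minimal sketch, assuming the archive stays in /opt and is extracted next to it; the HADOOP_VERSION and INSTALL_DIR variables and the resulting /opt/hadoop-2.7.6 location are assumptions, not something this walkthrough prescribes:

```shell
# Assumption: the tarball was downloaded to /opt and Hadoop should live in /opt/hadoop-2.7.6.
HADOOP_VERSION=2.7.6
INSTALL_DIR=/opt

# Unpack only if the tarball is present and has not been extracted yet.
if [ -f "$INSTALL_DIR/hadoop-$HADOOP_VERSION.tar.gz" ] && [ ! -d "$INSTALL_DIR/hadoop-$HADOOP_VERSION" ]; then
  tar -xzf "$INSTALL_DIR/hadoop-$HADOOP_VERSION.tar.gz" -C "$INSTALL_DIR"
fi

# Make the hdfs command and the start scripts reachable from the shell.
export HADOOP_HOME="$INSTALL_DIR/hadoop-$HADOOP_VERSION"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Putting the two export lines into a shell profile would make them persist across logins.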

II. Configuration files

Reference documentation: https://hadoop.apache.org/docs/r2.7.6

Seven files need to be configured in the $HADOOP_HOME/etc/hadoop directory.

1.core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://pangu10:9000</value>
<description>NameNode URI; the host and port on which HDFS serves clients</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hdfs/tmp</value>
<description>When HDFS is re-formatted (for example after adding a DataNode), this temporary directory must be deleted</description>
</property>
</configuration>

2.hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hdfs/name</value>
<description>Where the NameNode stores the HDFS namespace metadata</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hdfs/data</value>
<description>Physical storage location of data blocks on each DataNode</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of DFS replicas; defaults to 3 if not set</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>pangu11:50090</value>
<description>HTTP address and port of the SecondaryNameNode</description>
</property>
</configuration>
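
The three storage paths configured so far (/opt/hdfs/tmp, /opt/hdfs/name, /opt/hdfs/data) share one parent directory. Hadoop typically creates them itself when it has write permission, so the following is only an optional sketch for creating them ahead of time, for instance to set ownership first. The helper name make_hdfs_dirs is hypothetical:

```shell
# Create the name, data, and tmp directories under a base directory.
# The base path is an argument so the function can be tried on a scratch directory first.
make_hdfs_dirs() {
  base=$1
  mkdir -p "$base/name" "$base/data" "$base/tmp"
}

# Usage on a cluster node (not run here):
#   make_hdfs_dirs /opt/hdfs
```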

3.yarn-site.xml

<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>pangu10</value>
<description>Hostname of the node that runs the ResourceManager</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Auxiliary service run on the NodeManager; must be set to mapreduce_shuffle for MapReduce programs to run</description>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>

4.mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce on the YARN framework</description>
</property>
</configuration>

5.slaves

pangu10
pangu11
pangu12

6.yarn-env.sh

Find line 23:

# export JAVA_HOME=/home/y/libexec/jdk1.6.0/

Replace it with:

export JAVA_HOME=/opt/jdk1..0_181/

7.hadoop-env.sh

Find line 25:

export JAVA_HOME=${JAVA_HOME}

Replace it with:

export JAVA_HOME=/opt/jdk1..0_181/

III. Copy to the slaves
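
This step amounts to copying the configured Hadoop directory to every other node. A minimal sketch, assuming Hadoop sits at the same path on every node, passwordless SSH is already set up from the current host to the others, and scp is available. The function names remote_hosts and copy_hadoop_to_slaves are hypothetical helpers, not part of Hadoop:

```shell
# Print every host listed in a slaves file except the given local host.
remote_hosts() {
  slaves_file=$1
  local_host=$2
  # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
  while read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    [ "$host" = "$local_host" ] && continue
    printf '%s\n' "$host"
  done < "$slaves_file"
}

# Copy the Hadoop directory into the same parent directory on each remote host.
copy_hadoop_to_slaves() {
  hadoop_home=$1
  for host in $(remote_hosts "$hadoop_home/etc/hadoop/slaves" "$(hostname)"); do
    scp -r "$hadoop_home" "$host:$(dirname "$hadoop_home")/"
  done
}

# Usage on pangu10 (not run here; the path is an assumption):
#   copy_hadoop_to_slaves /opt/hadoop-2.7.6
```

Reading the host list from the slaves file keeps the copy step in sync with the cluster definition, so adding a node later only requires editing that one file. The JDK referenced by JAVA_HOME would need to exist on each node as well.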

IV. Format HDFS

Run the following command in a shell:

hdfs namenode -format

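Formatting succeeded if the output contains the line reporting that the storage directory /opt/hdfs/name "has been successfully formatted" (highlighted in red in the original log):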

// :: INFO util.GSet: capacity = ^ =  entries
// :: INFO namenode.FSImage: Allocated new BlockPoolId: BP--192.168.56.10-
18/10/12 12:38:33 INFO common.Storage: Storage directory /opt/hdfs/name has been successfully formatted.
// :: INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
// :: INFO namenode.FSImageFormatProtobuf: Image file /opt/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size bytes saved in seconds.
// :: INFO namenode.NNStorageRetentionManager: Going to retain images with txid >=
// :: INFO util.ExitUtil: Exiting with status
// :: INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at pangu10/192.168.56.10
************************************************************/
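
Besides reading the log, the result can be checked on disk. The log above shows the image being saved under /opt/hdfs/name/current, and a formatted NameNode directory also carries a VERSION file there. A small sketch of such a check; the function name namenode_is_formatted is a hypothetical helper:

```shell
# Succeed if the given NameNode directory looks formatted,
# i.e. it contains the file current/VERSION.
namenode_is_formatted() {
  name_dir=$1
  [ -f "$name_dir/current/VERSION" ]
}

# Usage on the NameNode host (not run here):
#   namenode_is_formatted /opt/hdfs/name && echo "formatted"
```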

V. Start Hadoop

cd $HADOOP_HOME/sbin

./start-all.sh

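Note: run the script directly as shown, not as sh start-all.sh. The script is written for bash, and on some systems sh is a different shell.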

Once the daemons are up, the YARN ResourceManager web UI should be reachable at:

http://pangu10:8088/cluster
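
A quick way to see whether the daemons came up is to run jps on each node and compare against what the configuration above implies: NameNode, DataNode, ResourceManager and NodeManager on pangu10; SecondaryNameNode, DataNode and NodeManager on pangu11; DataNode and NodeManager on pangu12. A minimal sketch of that comparison; the function name missing_daemons is a hypothetical helper that reads jps-style output ("pid name" per line) from stdin:

```shell
# Print each expected daemon name that does not appear in the jps-style input on stdin.
missing_daemons() {
  jps_output=$(cat)
  for daemon in "$@"; do
    # Match the process name column exactly, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$jps_output" |
      awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
      printf '%s\n' "$daemon"
  done
}

# Usage on pangu10 (not run here):
#   jps | missing_daemons NameNode DataNode ResourceManager NodeManager
```

Empty output means every expected daemon was found; any name printed is a daemon whose log under $HADOOP_HOME/logs is worth checking.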