Running Hadoop 2.2.0 on 64-bit Ubuntu [Recompiling Hadoop]

Date: 2022-03-05 09:01:47

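I have recently been learning how to set up Hadoop, and downloaded the latest release, Hadoop 2.2, directly from the official Apache website. The official download currently provides executables built for 32-bit Linux, and at runtime it printed the warning "libhadoop.so.1.0.0 which might have disabled stack guard". A Google search showed the cause: the libhadoop.so library shipped with Hadoop 2.2.0 is 32-bit, while our machines are 64-bit. The fix is to recompile Hadoop on a 64-bit machine.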
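Before rebuilding, it can help to confirm the word size of the native library that was shipped. The following is a minimal sketch and not part of the original setup: elf_class is a hypothetical helper name, and the path in the usage comment assumes Hadoop was unpacked under ~/hadoop/hadoop-2.2.0. It reads the ELF header directly, where byte 5 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.

```shell
# Report whether a file is a 32-bit or 64-bit ELF object.
elf_class() {
  # The first four bytes of an ELF file are 7f 45 4c 46 ("\177ELF").
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "unknown"
    return 0
  fi
  # Read the single byte at offset 4 (EI_CLASS) as an unsigned decimal number.
  class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
  case "$class" in
    1) echo "32-bit" ;;
    2) echo "64-bit" ;;
    *) echo "unknown" ;;
  esac
}

# Example (the path is an assumption; adjust it to your own installation):
# elf_class ~/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0
```

Where the file utility is installed, running file on the same library reports the word size in its description of the ELF object.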

Build environment

OS: Ubuntu 12.04 64-bit

hadoop version: 2.2.0

Java: Jdk1.7.0_45

***********************************************

First. Java environment setup

See this article: "Ubuntu下安装jdk" (Installing the JDK on Ubuntu)

************************************************

Second. Set up a small cluster. I created the virtual machines with VMware Workstation: create one first, then clone it and adjust each copy slightly.

  cloud001:   NameNode  ResourceManager  (master)

  cloud002:   DataNode   NodeManager

  cloud003:   DataNode   NodeManager

  cloud004:   DataNode   NodeManager

Assume the virtual machines have the following IP addresses; they are used later on.

  cloud001:  192.168.60.128  (master)

  cloud002:  192.168.60.130  (slave)

  cloud003:  192.168.60.131  (slave)

  cloud004:  192.168.60.132  (slave)

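Create the same user on every machine (this is a basic requirement of Hadoop).

On each host, set up /etc/hosts and /etc/hostname.

The hosts file defines the mapping between host names and IP addresses.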

  127.0.0.1            localhost
  192.168.60.128      cloud001

  192.168.60.130      cloud002

  192.168.60.131      cloud003

  192.168.60.132      cloud004
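Since every node needs the same mappings, a quick lookup against the file can catch a typo early. This is only a sketch: lookup_ip is a hypothetical helper that parses a hosts-format file with awk, and it does not go through the system resolver.

```shell
# Print the IP address that a hosts-format file maps to the given host name.
lookup_ip() {
  # $1 = hosts file, $2 = host name
  awk -v name="$2" '
    /^[[:space:]]*#/ { next }            # skip comment lines
    { for (i = 2; i <= NF; i++)          # field 1 is the IP, the rest are names
        if ($i == name) { print $1; exit } }
  ' "$1"
}

# Example: list what /etc/hosts says for each node planned above.
# for h in cloud001 cloud002 cloud003 cloud004; do
#   printf '%s -> %s\n' "$h" "$(lookup_ip /etc/hosts "$h")"
# done
```

On Ubuntu, getent hosts cloud002 performs the same lookup through the system resolver, which is the path Hadoop itself relies on.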

The /etc/hostname file defines the Ubuntu host name, for example cloud001 on the master (or the corresponding name on each slave).

************************************************

Third. Install SSH

3.1 The ssh command is usually installed by default. If it is missing, or the installed version is old, install it again:

sudo apt-get install ssh

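3.2 Set up passwordless login on the local machine

After installation, a hidden directory named .ssh appears in ~ (the current user's home directory, here /home/loull); ls -a shows hidden files. If the directory does not exist, create it yourself (mkdir .ssh).

Note: perform the steps above on every machine.

Next, generate a key pair on the master (cloud001) and configure passwordless SSH login.

The steps are: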

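  1. Change into the .ssh directory

  2. Run ssh-keygen -t rsa and press Enter at every prompt (this generates the key pair)

  3. Append id_rsa.pub to the authorized keys (cat id_rsa.pub >> authorized_keys)

  4. Restart the SSH service so that the change takes effect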
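Step 3 above appends the key every time it is run, so repeating it leaves duplicate entries. A minimal sketch of an idempotent variant follows; authorize_key is a hypothetical helper name, and the chmod reflects the common sshd StrictModes behaviour rather than anything stated in the steps above.

```shell
# Append a public key to an authorized_keys file only if it is not already there.
authorize_key() {
  # $1 = public key file, $2 = authorized_keys file
  mkdir -p "$(dirname "$2")"
  touch "$2"
  # -x whole-line match, -F fixed string, -f read the pattern from the key file
  if ! grep -qxFf "$1" "$2"; then
    cat "$1" >> "$2"
  fi
  # sshd with StrictModes ignores key files that group or others can write.
  chmod 600 "$2"
}

# Example, matching the file names used in the steps above:
# authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```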

3.3 Copy the generated authorized_keys file to the same directory on each slave host:
  scp authorized_keys cloud002:~/.ssh/
  scp authorized_keys cloud003:~/.ssh/
  scp authorized_keys cloud004:~/.ssh/

3.4 Passwordless SSH login should now work. Check that you can log in from the master to each slave without a password:

  $:ssh cloud002

  $:ssh cloud003

  $:ssh cloud004

*******************************************************

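Fourth. With the steps above completed correctly, Hadoop can be installed

Perform the following steps logged in as loull.

Because the configuration is essentially the same on every machine in the Hadoop cluster, we first configure and deploy on the NameNode and then copy the result to the other nodes. The effect is the same as running the installation on every machine. Be careful, however, about mixing 64-bit and 32-bit systems in the cluster.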

4.1 Download and unpack the hadoop-2.2.0.tar.gz file

Copy the hadoop-2.2.0 build compiled on the 64-bit machine to /home/loull/hadoop.

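4.2 HDFS installation and configuration
1) Configure /home/loull/hadoop/etc/hadoop/hadoop-env.sh

Replace export JAVA_HOME=${JAVA_HOME} with the following:

export JAVA_HOME=/opt/Java/jdk/jdk1.7  (use the path of your own JDK)

Likewise, add the following to yarn-env.sh:

export JAVA_HOME=/opt/Java/jdk/jdk1.7  (use the path of your own JDK)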
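The same edit has to be made in two files, so it can be scripted. The sketch below is an illustration under assumptions: set_java_home is a hypothetical helper name, and it expects the existing line to start with export JAVA_HOME= exactly as quoted above. If no such line exists, it appends one.

```shell
# Set JAVA_HOME in a Hadoop env script, replacing the existing export line if present.
set_java_home() {
  # $1 = env script (hadoop-env.sh or yarn-env.sh), $2 = JDK directory
  tmp=$(mktemp) || return 1
  if grep -q '^export JAVA_HOME=' "$1"; then
    # Rewrite the existing line; "|" is the sed delimiter, so the path must not contain it.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$2|" "$1" > "$tmp"
  else
    # No uncommented export line yet: append one at the end.
    cat "$1" > "$tmp"
    printf 'export JAVA_HOME=%s\n' "$2" >> "$tmp"
  fi
  cat "$tmp" > "$1"   # overwrite in place so the file keeps its permissions
  rm -f "$tmp"
}

# Example, using the JDK path from the text (replace it with your own):
# set_java_home /home/loull/hadoop/etc/hadoop/hadoop-env.sh /opt/Java/jdk/jdk1.7
# set_java_home /home/loull/hadoop/etc/hadoop/yarn-env.sh   /opt/Java/jdk/jdk1.7
```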

2) Contents of etc/hadoop/core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cloud001:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/loull/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
3) Contents of etc/hadoop/hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>cloud001:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/loull/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/loull/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
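After editing these files by hand, it is worth reading a value back to check that the file still says what was intended. The following is a rough sketch only: get_prop is a hypothetical helper that scans for name and value tags with awk. It is not a real XML parser and assumes each name appears before its value, as in the files above.

```shell
# Print the value of one property from a Hadoop *-site.xml file.
get_prop() {
  # $1 = config file, $2 = property name
  awk -v key="$2" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == key) { print v; exit } }
  ' "$1"
}

# Example (paths are relative to the Hadoop directory):
# get_prop etc/hadoop/core-site.xml fs.defaultFS
# get_prop etc/hadoop/hdfs-site.xml dfs.replication
```

Once Hadoop itself runs, bin/hdfs getconf -confKey fs.defaultFS reports the value that Hadoop actually resolved, which is the more authoritative check.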

4.3 YARN installation and configuration

Contents of etc/hadoop/yarn-site.xml:

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>cloud001:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>cloud001:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>cloud001:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>cloud001:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>cloud001:8088</value>
  </property>
</configuration>

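Note: the value of yarn.nodemanager.aux-services in this configuration (mapreduce_shuffle) may contain only the characters A-Za-z0-9_ and must not start with a digit; otherwise the NodeManager may fail to start.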

4.4 Configure the slaves file (this file lists all slave nodes)

Write the following into it:

cloud002

cloud003

cloud004

4.5 Once configuration is complete, copy the directory to the other nodes (the slaves), for example:

scp -r hadoop-2.2.0 cloud002:/home/loull/hadoop
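The command above covers one node; the other slaves need the same copy. One way to avoid retyping it is to generate the commands from the slaves file. This is a sketch under assumptions: print_copy_cmds is a hypothetical helper, it only prints the commands (a dry run), and it assumes host names and paths contain no spaces.

```shell
# Print one scp command per host listed in a slaves file (nothing is copied).
print_copy_cmds() {
  # $1 = slaves file, $2 = local directory, $3 = destination directory on each node
  # read with the default IFS trims surrounding whitespace from each line.
  while read -r host || [ -n "$host" ]; do
    case "$host" in ''|'#'*) continue ;; esac   # skip blank lines and comments
    printf 'scp -r %s %s:%s\n' "$2" "$host" "$3"
  done < "$1"
}

# Example: review the commands first, then pipe them to sh to run them.
# print_copy_cmds hadoop-2.2.0/etc/hadoop/slaves hadoop-2.2.0 /home/loull/hadoop
# print_copy_cmds hadoop-2.2.0/etc/hadoop/slaves hadoop-2.2.0 /home/loull/hadoop | sh
```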

Note: copy the directory exactly as it is; the XML configuration files above do not need any changes. Keep this in mind.

*******************************************************

Fifth. Start the cluster
1) Start the HDFS cluster
First, HDFS must be formatted. Run the following command:
  loull@cloud001:~/hadoop/hadoop-2.2.0$ bin/hdfs namenode -format
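If formatting succeeds, no exceptions appear in the log, and you can go on to start the cluster services.
Start the HDFS cluster with the following command: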
  loull@cloud001:~/hadoop/hadoop-2.2.0$ sbin/start-dfs.sh

The following processes should then be visible on the master node:

  loull@cloud001:~/hadoop/hadoop-2.2.0$ jps
  6638 Jps
  6015 NameNode
  6525 SecondaryNameNode
On the cloud002 (slave) node, the following processes should be visible:
  loull@cloud002:~/hadoop/hadoop-2.2.0/etc/hadoop$ jps
  4264 Jps
  4208 DataNode
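Comparing jps listings by eye gets tedious across four nodes. A minimal sketch of an automated check follows; missing_daemons is a hypothetical helper that reads jps output on standard input and prints every expected daemon name that is absent.

```shell
# Read `jps` output on stdin and print each expected daemon that is not listed.
missing_daemons() {
  # Arguments: the daemon names expected on this node.
  listing=$(cat)
  for d in "$@"; do
    # jps prints "<pid> <name>"; compare the second field exactly.
    printf '%s\n' "$listing" \
      | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' \
      || echo "$d"
  done
}

# Example: empty output means every expected daemon is running.
# jps | missing_daemons NameNode SecondaryNameNode     # on cloud001
# jps | missing_daemons DataNode                       # on a slave
```

Running jps on a slave through ssh from the master may also work, but that assumes jps is on the PATH of a non-interactive SSH session.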
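2) Start the YARN cluster
Once configuration is complete, starting the YARN cluster is straightforward: it only takes a script.
Start the YARN cluster with the following command (run it on the master only):
  loull@cloud001:~/hadoop/hadoop-2.2.0$ sbin/start-yarn.sh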
3) Verify the cluster
Finally, verify that the cluster can run jobs by executing one of the examples bundled with Hadoop:

  $: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter out

hdfs: http://cloud001:50070/dfshealth.jsp
yarn: http://192.168.60.128:8088/cluster