Notes on Installing and Configuring Hadoop on CentOS 7.6 (Standalone and Pseudo-Distributed)

Date: 2024-03-21 07:19:44

Test environment:

1. CentOS 7.6, minimal install
2. JDK 1.8.0_201
3. rsync (yum install rsync -y)
4. ssh: already included in the minimal install, so there is nothing to install

1 Install rsync first

[root@localhost ~]# yum install rsync -y

2 Stop the firewall

[root@localhost /]# systemctl stop firewalld.service
The following command keeps the firewall from starting at boot:
[root@localhost /]# systemctl disable firewalld.service
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

3 Create a directory named software under the root directory

[root@localhost /]# mkdir software

4 Upload the JDK and Hadoop packages

Use WinSCP to copy the JDK (jdk-8u201-linux-x64.tar.gz) and Hadoop (hadoop-3.2.0.tar.gz) archives into /software:
[root@localhost software]# ll -h
total 513M
-rw-r--r--. 1 root root 330M Jan 24 17:54 hadoop-3.2.0.tar.gz
-rw-r--r--. 1 root root 183M Jan 19 15:22 jdk-8u201-linux-x64.tar.gz

5 Install the JDK

5.1 Extract the JDK archive

[root@localhost software]# tar -zxvf jdk-8u201-linux-x64.tar.gz

[root@localhost software]# ls
hadoop-3.2.0.tar.gz jdk1.8.0_201 jdk-8u201-linux-x64.tar.gz

5.2 Set the environment variables: edit /etc/profile and append the following two lines at the end

export JAVA_HOME=/software/jdk1.8.0_201
export PATH=$JAVA_HOME/bin:$PATH
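
The two export lines above can also be appended from a script instead of an editor. The following is a minimal sketch, assuming a helper named append_once (a hypothetical name, not part of any tool) that adds a line only when the file does not already contain it, so running it twice does not duplicate the entries.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# append_once is a hypothetical helper name used for illustration only.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage (run as root); single quotes keep $JAVA_HOME and $PATH unexpanded:
# append_once /etc/profile 'export JAVA_HOME=/software/jdk1.8.0_201'
# append_once /etc/profile 'export PATH=$JAVA_HOME/bin:$PATH'
```

The single quotes matter: the variables should be expanded when /etc/profile is read at login, not when the line is written.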

5.3 Apply the changes in /etc/profile

[root@localhost software]# source /etc/profile

5.4 Test the Java environment

[root@localhost software]# java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
This output shows that Java is installed correctly.

6 Configure Hadoop

6.1 Extract the Hadoop archive

[root@localhost software]# tar -zxvf hadoop-3.2.0.tar.gz
[root@localhost software]# ls
hadoop-3.2.0 hadoop-3.2.0.tar.gz jdk1.8.0_201 jdk-8u201-linux-x64.tar.gz

6.2 Set the Java environment in Hadoop's configuration file etc/hadoop/hadoop-env.sh by appending the following line at the end of the file

export JAVA_HOME=/software/jdk1.8.0_201
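
A wrong JAVA_HOME here only surfaces later, when a Hadoop command fails to find java. As a quick sanity check, the sketch below tests whether a directory contains an executable bin/java. The helper name check_java_home is hypothetical and chosen for illustration.

```shell
#!/bin/sh
# check_java_home DIR: succeed only if DIR looks like a Java home,
# i.e. it contains an executable bin/java.
# check_java_home is a hypothetical helper name used for illustration.
check_java_home() {
    dir=$1
    if [ -x "$dir/bin/java" ]; then
        echo "OK: $dir/bin/java is executable"
    else
        echo "ERROR: $dir/bin/java not found or not executable" >&2
        return 1
    fi
}

# Usage:
# check_java_home /software/jdk1.8.0_201
```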

6.3 Test the Hadoop environment

[root@localhost software]# cd hadoop-3.2.0
[root@localhost hadoop-3.2.0]# bin/hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /software/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
This output shows that Hadoop is ready to run.

7 Hadoop run mode 1 (standalone)

7.1 Create the input directory

[root@localhost hadoop-3.2.0]# mkdir input

7.2 Copy all of Hadoop's XML configuration files into the input directory to serve as test data

[root@localhost hadoop-3.2.0]# cp etc/hadoop/*.xml input
[root@localhost hadoop-3.2.0]# ls input
capacity-scheduler.xml hadoop-policy.xml httpfs-site.xml kms-site.xml yarn-site.xml
core-site.xml hdfs-site.xml kms-acls.xml mapred-site.xml

7.3 Run Hadoop: search every file in the input directory for matches of the regular expression 'dfs[a-z.]+' and write the results to the output directory

[root@localhost hadoop-3.2.0]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar grep input output 'dfs[a-z.]+'

7.4 View the output

[root@localhost hadoop-3.2.0]# ll output
total 4
-rw-r--r--. 1 root root 11 Jan 25 14:55 part-r-00000
-rw-r--r--. 1 root root 0 Jan 25 14:55 _SUCCESS

[root@localhost hadoop-3.2.0]# cat output/part-r-00000
1 dfsadmin
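
To see what the job computes, the same count can be approximated locally with standard tools. The sketch below is not the Hadoop job itself: it assumes GNU grep (for the -o and -h options) and a hypothetical helper name, count_matches. It extracts every match of the pattern, counts identical matches, and lists the most frequent first.

```shell
#!/bin/sh
# count_matches DIR: print every distinct match of 'dfs[a-z.]+' found in
# DIR/*.xml together with its count, most frequent first.
# This is a local approximation with standard tools, not the Hadoop job.
# count_matches is a hypothetical helper name used for illustration.
count_matches() {
    dir=$1
    # -o: print only the matched text, -h: omit file names, -E: extended regex
    grep -ohE 'dfs[a-z.]+' "$dir"/*.xml | sort | uniq -c | sort -rn
}

# Usage, before the input directory is deleted:
# count_matches input
```

The pattern matches the literal text dfs followed by one or more lowercase letters or dots, which is why a word such as dfsadmin qualifies.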

7.5 Delete the input and output directories

[root@localhost hadoop-3.2.0]# rm -vfr input output

8 Hadoop run mode 2 (pseudo-distributed)

8.1 Edit the configuration file etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

8.2 Edit the configuration file etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
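
After editing both files, it helps to confirm that the values were saved as intended. The sketch below reads one property from a Hadoop-style XML file. It is line-oriented rather than a real XML parser, it assumes that name and value sit on separate lines as in the snippets above, and the helper name get_prop is hypothetical.

```shell
#!/bin/sh
# get_prop FILE NAME: print the <value> that follows <name>NAME</name> in a
# Hadoop-style *-site.xml file.
# Assumes <name> and <value> are on separate lines; this is a line-oriented
# sketch, not a real XML parser.
# get_prop is a hypothetical helper name used for illustration.
get_prop() {
    file=$1
    name=$2
    awk -v want="$name" '
        # Remember when the wanted <name> element has just been seen.
        index($0, "<name>" want "</name>") { found = 1; next }
        # The first <value> line after it holds the setting.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$file"
}

# Usage, from the hadoop-3.2.0 directory:
# get_prop etc/hadoop/core-site.xml fs.defaultFS
# get_prop etc/hadoop/hdfs-site.xml dfs.replication
```

Hadoop can also report the value it actually resolves, for example with bin/hdfs getconf -confKey fs.defaultFS.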

8.3 Check whether ssh can log in without a password

[root@localhost hadoop-3.2.0]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:MJxZUIDNbbnlfxCU+l2usvsIsbc6/NTJ06j/TO4g8G0.
ECDSA key fingerprint is MD5:d1:8f:94:dd:80:e2:cf:6b:a7:45:74:e3:6b:2f:f2:0a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
Last login: Fri Jan 25 14:30:29 2019 from 192.168.114.1

Note: the password prompt above shows that passwordless login is not set up yet.

8.4 Set up passwordless ssh login

[root@localhost ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@localhost ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost ~]# chmod 0600 ~/.ssh/authorized_keys
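
Running the cat command above more than once appends the same key again. A minimal sketch of a repeatable variant follows, assuming a hypothetical helper named authorize_key that adds the public key only if it is not already listed.

```shell
#!/bin/sh
# authorize_key PUBKEY AUTHFILE: add the public key in PUBKEY to AUTHFILE
# unless it is already listed, then restrict AUTHFILE to its owner.
# authorize_key is a hypothetical helper name used for illustration.
authorize_key() {
    pubkey=$1
    authfile=$2
    key=$(cat "$pubkey") || return 1
    if ! grep -qxF -- "$key" "$authfile" 2>/dev/null; then
        printf '%s\n' "$key" >> "$authfile"
    fi
    chmod 0600 "$authfile"
}

# Usage, generating the key pair only when none exists yet:
# [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
# ssh -o BatchMode=yes localhost true && echo "passwordless login works"
```

BatchMode=yes makes ssh fail instead of prompting, so the last line works as a non-interactive check once localhost is in the list of known hosts.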

8.5 Check passwordless login again

[root@localhost hadoop-3.2.0]# ssh localhost
Last login: Fri Jan 25 15:04:51 2019 from localhost
Note: no password prompt appears, so passwordless login now works.

8.6 Format the file system

[root@localhost hadoop-3.2.0]# bin/hdfs namenode -format
2019-01-25 15:09:23,273 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.0

/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
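
Formatting initializes the NameNode's storage directory, and formatting again discards the existing HDFS metadata, so it is worth guarding against an accidental second run. The sketch below assumes the default storage location for user root (/tmp/hadoop-root/dfs/name, which applies only when hadoop.tmp.dir and dfs.namenode.name.dir are left unset) and uses a hypothetical helper name, needs_format.

```shell
#!/bin/sh
# needs_format NAME_DIR: succeed if NAME_DIR does not yet look formatted,
# i.e. it has no current/VERSION file.
# needs_format is a hypothetical helper name used for illustration.
needs_format() {
    name_dir=$1
    [ ! -f "$name_dir/current/VERSION" ]
}

# Usage, assuming the default storage location for user root
# (hadoop.tmp.dir and dfs.namenode.name.dir left unset):
# if needs_format /tmp/hadoop-root/dfs/name; then
#     bin/hdfs namenode -format
# else
#     echo "NameNode already formatted, skipping"
# fi
```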

8.7 Edit sbin/start-dfs.sh and sbin/stop-dfs.sh, adding the following lines at the top of each file

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
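
Since the same four lines go into two scripts, the edit can be scripted. The following sketch inserts them right after the first line of a file (the shebang) and skips files that already contain them. The helper name add_user_vars is hypothetical, and the variable lines are copied unchanged from this step.

```shell
#!/bin/sh
# add_user_vars SCRIPT: insert the four user variables from step 8.7 right
# after the first line (the shebang) of SCRIPT, unless already present.
# add_user_vars is a hypothetical helper name used for illustration.
add_user_vars() {
    script=$1
    # Skip files that were already edited.
    if grep -q '^HDFS_NAMENODE_USER=' "$script"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    awk '
        { print }
        NR == 1 {
            print "HDFS_DATANODE_USER=root"
            print "HADOOP_SECURE_DN_USER=root"
            print "HDFS_NAMENODE_USER=root"
            print "HDFS_SECONDARYNAMENODE_USER=root"
        }
    ' "$script" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back with cat so the script keeps its permissions.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Usage, from the hadoop-3.2.0 directory:
# add_user_vars sbin/start-dfs.sh
# add_user_vars sbin/stop-dfs.sh
```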

8.8 Start the Hadoop services

[root@localhost hadoop-3.2.0]# sbin/start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [localhost]
Last login: Fri Jan 25 15:07:02 CST 2019 from localhost on pts/1
Starting datanodes
Last login: Fri Jan 25 15:14:50 CST 2019 on pts/0
Starting secondary namenodes [localhost.localdomain]
Last login: Fri Jan 25 15:14:53 CST 2019 on pts/0
localhost.localdomain: Warning: Permanently added 'localhost.localdomain' (ECDSA) to the list of known hosts.
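
The startup messages do not confirm that each daemon stayed up. One way to check is the JDK's jps tool, which prints one line per running JVM with its process ID and main class name. The sketch below reads that output and reports any of the three HDFS daemons that is missing. The helper name missing_daemons is hypothetical.

```shell
#!/bin/sh
# missing_daemons: read `jps` output on stdin and print the name of every
# expected HDFS daemon that is absent; the exit status is non-zero if any
# daemon is missing.
# missing_daemons is a hypothetical helper name used for illustration.
missing_daemons() {
    awk '
        # jps prints "<pid> <main class name>" for each JVM.
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode", want, " ")
            missing = 0
            for (i = 1; i <= n; i++) {
                if (!(want[i] in seen)) { print want[i]; missing = 1 }
            }
            exit missing
        }
    '
}

# Usage:
# jps | missing_daemons && echo "all HDFS daemons are running"
```

Comparing the whole second field avoids counting SecondaryNameNode as a match for NameNode.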

8.9 Open the NameNode web interface in a browser

http://192.168.114.134:9870
The page shows information about the running Hadoop instance.

8.10 Test the Hadoop search job

8.10.1 First create the user directory

[root@localhost hadoop-3.2.0]# bin/hdfs dfs -mkdir /user
[root@localhost hadoop-3.2.0]# bin/hdfs dfs -mkdir /user/root

8.10.2 List the current user's files

[root@localhost hadoop-3.2.0]# bin/hdfs dfs -ls
Nothing is listed yet.
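
The directory /user/root matters because HDFS resolves relative paths against the user's home directory, which by default is /user/ followed by the user name. Running -ls without a path therefore lists /user/root. The sketch below only mirrors that rule for illustration and does not contact HDFS. The helper name hdfs_abs_path is hypothetical, and the default /user prefix is assumed.

```shell
#!/bin/sh
# hdfs_abs_path USER PATH: print the absolute HDFS path that PATH refers to
# for USER, assuming the default home directory /user/USER.
# hdfs_abs_path is a hypothetical helper that mirrors the rule; it does not
# contact HDFS.
hdfs_abs_path() {
    user=$1
    path=$2
    case $path in
        /*) printf '%s\n' "$path" ;;               # already absolute
        '') printf '/user/%s\n' "$user" ;;         # no path: the home directory
        *)  printf '/user/%s/%s\n' "$user" "$path" ;;
    esac
}

# Usage:
# hdfs_abs_path root input1    # prints /user/root/input1
```

Both directory levels can also be created in one command with bin/hdfs dfs -mkdir -p /user/root.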

8.10.3 Prepare the test data (copy all XML files under etc/hadoop into the input directory)

[root@localhost hadoop-3.2.0]# mkdir input
[root@localhost hadoop-3.2.0]# cp etc/hadoop/*.xml input
[root@localhost hadoop-3.2.0]# ls input
capacity-scheduler.xml hadoop-policy.xml httpfs-site.xml kms-site.xml yarn-site.xml
core-site.xml hdfs-site.xml kms-acls.xml mapred-site.xml

8.10.4 Upload the input directory to HDFS under the name input1

[root@localhost hadoop-3.2.0]# bin/hdfs dfs -put input input1

8.10.5 List the test files now stored on HDFS

[root@localhost hadoop-3.2.0]# bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2019-01-25 15:24 input1

[root@localhost hadoop-3.2.0]# bin/hdfs dfs -ls input1
Found 9 items
-rw-r--r-- 1 root supergroup 8260 2019-01-25 15:24 input1/capacity-scheduler.xml
-rw-r--r-- 1 root supergroup 884 2019-01-25 15:24 input1/core-site.xml
-rw-r--r-- 1 root supergroup 11392 2019-01-25 15:24 input1/hadoop-policy.xml
-rw-r--r-- 1 root supergroup 868 2019-01-25 15:24 input1/hdfs-site.xml
-rw-r--r-- 1 root supergroup 620 2019-01-25 15:24 input1/httpfs-site.xml
-rw-r--r-- 1 root supergroup 3518 2019-01-25 15:24 input1/kms-acls.xml
-rw-r--r-- 1 root supergroup 682 2019-01-25 15:24 input1/kms-site.xml
-rw-r--r-- 1 root supergroup 758 2019-01-25 15:24 input1/mapred-site.xml
-rw-r--r-- 1 root supergroup 690 2019-01-25 15:24 input1/yarn-site.xml

8.10.6 Run Hadoop: search every file in the input1 directory for matches of the regular expression 'dfs[a-z.]+' and write the results to the output directory

[root@localhost hadoop-3.2.0]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar grep input1 output 'dfs[a-z.]+'

8.10.7 Inspect the output directory created on HDFS

[root@localhost hadoop-3.2.0]# bin/hdfs dfs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2019-01-25 15:24 input1
drwxr-xr-x - root supergroup 0 2019-01-25 15:26 output

[root@localhost hadoop-3.2.0]# bin/hdfs dfs -ls output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2019-01-25 15:26 output/_SUCCESS
-rw-r--r-- 1 root supergroup 29 2019-01-25 15:26 output/part-r-00000
[root@localhost hadoop-3.2.0]# bin/hdfs dfs -cat output/part-r-00000
1 dfsadmin
1 dfs.replication
These are the search results. Unlike the standalone run, dfs.replication now appears because that property was added to hdfs-site.xml in step 8.2.
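
Running the job a second time fails if the output directory already exists on HDFS, because the job refuses to overwrite it. The sketch below wraps the three commands from this section so the job can be rerun. The wrapper name run_grep_job and the HDFS and HADOOP override variables are hypothetical; the commands themselves are the ones used above, plus -rm -r -f to clear an earlier result.

```shell
#!/bin/sh
# run_grep_job INPUT OUTPUT: rerun the grep example on HDFS and print its
# result. HDFS and HADOOP default to the relative paths used in this
# walkthrough and can be overridden through the environment.
# run_grep_job is a hypothetical wrapper name used for illustration.
run_grep_job() {
    input=$1
    output=$2
    hdfs=${HDFS:-bin/hdfs}
    hadoop=${HADOOP:-bin/hadoop}
    jar=share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar

    # The job refuses to start if the output directory already exists,
    # so remove any previous result first (-f: no error if it is absent).
    "$hdfs" dfs -rm -r -f "$output" || return 1
    "$hadoop" jar "$jar" grep "$input" "$output" 'dfs[a-z.]+' || return 1
    "$hdfs" dfs -cat "$output/part-r-00000"
}

# Usage, from the hadoop-3.2.0 directory:
# run_grep_job input1 output
```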

8.10.8 When the test is finished, stop the pseudo-distributed Hadoop services

[root@localhost hadoop-3.2.0]# sbin/stop-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Stopping namenodes on [localhost]
Last login: Fri Jan 25 15:14:57 CST 2019 on pts/0
Stopping datanodes
Last login: Fri Jan 25 15:30:53 CST 2019 on pts/0
Stopping secondary namenodes [localhost.localdomain]
Last login: Fri Jan 25 15:30:54 CST 2019 on pts/0