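I. Environment
MySQL version: mysql-installer-community-5.5.27.1 (32-bit)
MySQL on Windows 7 (32-bit): I installed MySQL on my own Windows 7 laptop. This avoids the overhead and disk space of running it inside the master/slave virtual machines, and it puts one more machine's resources to use. If your virtual machines are short on resources, you can deploy it the same way.
Linux ISO: CentOS-6.0-i386-bin-DVD.iso (32-bit)
JDK version: "1.6.0_25-ea"
Hadoop software version: hadoop-0.20.205.0.tar.gz
HBase version: hbase-0.90.5
Hive version: hive-0.8.1.tar.gz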
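II. MySQL notes
1. The MySQL installation steps are omitted here. Set the virtual machine network to host-only mode and turn off the firewall on both systems. It is also best to close the 360 security software on Windows, since it may interfere with communication.
Database users/passwords: root/root and hive/hive
Database port: 3306
MySQL service name: MySQL55
The MySQL server can be installed as a Windows service. As a service, it can start automatically when the system boots.
It is currently configured not to start automatically with the operating system.
Commands to start and stop the MySQL service (the service must be running before you can use the database; the name passed to net must match the service name above):
net start MySQL55
net stop MySQL55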
2. Check the MySQL version with mysql -V (note the capital V)
3. Log in to MySQL from Windows
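4. Create the hive user in MySQL and grant it sufficient privileges
I created the user with MySQL Workbench, the client software bundled with MySQL on Windows.
Add Account: adds a user
Name: hive, password: hive
Limit Connectivity to Hosts Matching: % (the percent sign is a wildcard, meaning the login host is not restricted and the user can log in from any client machine. If you enter an IP address instead, the user can only log in to MySQL from the machine with that IP address.)
Check all the boxes to grant all privileges.
Command-line equivalent (the password must be quoted):
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%';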
Hadoop deployment environment
Windows    192.168.2.110    MySQL    (MySQL deployed on Windows 7)
Test from the master whether the remote MySQL server can be reached
[grid@h1 grid]$ ping 192.168.2.110
PING 192.168.2.110 (192.168.2.110) 56(84) bytes of data.
64 bytes from 192.168.2.110: icmp_seq=1 ttl=64 time=0.603 ms    <-- the host is reachable
64 bytes from 192.168.2.110: icmp_seq=2 ttl=64 time=0.283 ms
64 bytes from 192.168.2.110: icmp_seq=3 ttl=64 time=0.301 ms
--- 192.168.2.110 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2010ms
rtt min/avg/max/mdev = 0.283/0.395/0.603/0.148 ms
[grid@h1 grid]$ mysql -h192.168.2.110 -uhive -phive
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.5.27 MySQL Community Server (GPL)
Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
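Ping only shows that the host is up, and the mysql client may not be installed on every node. As a minimal sketch, assuming bash and the coreutils timeout command are available, the helper below checks whether a TCP port accepts connections through bash's /dev/tcp redirection. The name port_open is hypothetical and is not part of MySQL or Hive.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT succeeds, non-zero otherwise.
# Hypothetical helper: relies on bash's /dev/tcp redirection and coreutils timeout.
port_open() {
    local host="$1" port="$2" wait="${3:-3}"
    timeout "$wait" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example with the address and port used in this article:
# port_open 192.168.2.110 3306 && echo "MySQL port reachable"
```

If this fails while ping succeeds, a firewall or the host restriction on the MySQL account is the likely place to look.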
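5. Create the hive database in MySQL to store the Hive metadata (that is, the data dictionary)
List the databases that exist on the MySQL server; there are 6 at this point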
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sakila |
| test |
| world |
+--------------------+
6 rows in set (0.01 sec)
Switch to the database named mysql; a database must be selected before you can work with it
mysql> use mysql;
Database changed    <-- seeing "Database changed" after pressing Enter means the command succeeded
List the tables in the current database; there are 24
mysql> show tables;
+---------------------------+
| Tables_in_mysql |
+---------------------------+
| columns_priv |
| db |
| event |
| func |
| general_log |
| help_category |
| help_keyword |
| help_relation |
| help_topic |
| host |
| ndb_binlog_index |
| plugin |
| proc |
| procs_priv |
| proxies_priv |
| servers |
| slow_log |
| tables_priv |
| time_zone |
| time_zone_leap_second |
| time_zone_name |
| time_zone_transition |
| time_zone_transition_type |
| user |
+---------------------------+
24 rows in set (0.01 sec)
Create a database named hive
mysql> create database hive;
Query OK, 1 row affected (0.00 sec)
Verify that it was created (list the databases on the MySQL server again)
mysql> show database;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'database' at line 1
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |    <-- the hive database was created successfully
| mysql |
| performance_schema |
| sakila |
| test |
| world |
+--------------------+
7 rows in set (0.00 sec)
Switch to the database named hive
mysql> use hive;
Database changed
mysql> show tables;    <-- list the tables in the hive database; none have been created yet, so it is empty
Empty set (0.00 sec)
III. Installing Hive in remote mode
1. Upload hive-0.8.1.tar.gz to /home/grid on the master, then unpack it
[grid@h1 grid]$ tar -zxvf hive-0.8.1.tar.gz
2. Copy mysql-connector-java-5.1.22-bin.jar into Hive's lib directory
[grid@h1 grid]$ cp mysql-connector-java-5.1.22-bin.jar hive-0.8.1/lib/
3. Configure the environment variables
[grid@h1 grid]$ vim .bashrc
export HIVE_HOME=/home/grid/hive-0.8.1
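Editing .bashrc by hand works, but re-running a setup script can leave duplicate export lines. A small sketch of an idempotent alternative follows; append_once is a hypothetical helper name, and the PATH line in the usage comment is an optional assumption that is not part of the original setup.

```shell
#!/usr/bin/env bash
# append_once FILE LINE
# Adds LINE to FILE only if an identical line is not already present.
append_once() {
    local file="$1" line="$2"
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
# append_once ~/.bashrc 'export HIVE_HOME=/home/grid/hive-0.8.1'
# append_once ~/.bashrc 'export PATH=$HIVE_HOME/bin:$PATH'   # optional, an assumption
# source ~/.bashrc
```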
4. Edit /home/grid/hive-0.8.1/bin/hive-config.sh
[grid@h1 bin]$ pwd
/home/grid/hive-0.8.1/bin
[grid@h1 bin]$ vim hive-config.sh
# hive-config.sh
export JAVA_HOME=/usr/java/jdk1.6.0_25
export HIVE_HOME=/home/grid/hive-0.8.1
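export HADOOP_HOME=/home/grid/hadoop-0.20.2    # Hive is tied to the Hadoop cluster, so it must know where the Hadoop configuration lives; this path must match your actual Hadoop installation directory
5. Copy hive-default.xml to hive-site.xml, the core configuration file, and edit its contents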
cp hive-default.xml hive-site.xml
[grid@h1 conf]$ vim hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.2.110:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
This specifies how to connect to MySQL (the JDBC connection string).
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
The MySQL JDBC driver class.
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
The MySQL login user name.
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
The MySQL login password.
<property>
<name>hive.metastore.local</name>
<value>true</value> <!-- true: local metastore -->
<!-- <value>false</value> false: remote metastore -->
<description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
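About the remote metastore: false means the Hive service and the metastore service do not run in the same process. They are independent of each other and can be placed on different machines. true means the two services are combined in a single process. In either case the MySQL database is a separate process, which can run on the local machine or on a remote one.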
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
The directory where Hive table data is stored, one directory per table. This is the built-in default value.
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive-${user.name}</value>
<description>Scratch space for Hive jobs</description>
</property>
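The directory for Hive temporary files. This can also be left at its default value.
The following property is one I added myself; it is not in the original file.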
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
</property>
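Thrift: the communication protocol Hive uses
127.0.0.1:9083 specifies the machine and port where the metastore is started. In remote mode, Hive can only be used normally once the metastore is running.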
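With this many properties it is easy to lose track of which value is actually set. The sketch below reads a property back out of hive-site.xml with awk. It assumes the layout used in this article, where <name> and <value> each sit on their own line; hive_conf_get is a hypothetical helper name, not a Hive command.

```shell
#!/usr/bin/env bash
# hive_conf_get FILE NAME
# Prints the <value> that follows <name>NAME</name> in a Hadoop-style XML config.
# Assumes <name> and <value> are on separate lines, as in the snippets above.
hive_conf_get() {
    local file="$1" name="$2"
    awk -v n="$name" '
        index($0, "<name>" n "</name>") { found = 1; next }
        found && match($0, /<value>.*<\/value>/) {
            # skip the 7 characters of "<value>" and drop the 8 of "</value>"
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$file"
}

# Usage:
# hive_conf_get /home/grid/hive-0.8.1/conf/hive-site.xml hive.metastore.uris
```

This is a line-oriented shortcut rather than a real XML parser, so it will not cope with values that span several lines.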
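6. Configure hive-log4j.properties
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter    <-- modify this line
Note: after I modified it, Hive failed to start with an error, and reverting the change fixed the problem, so try it and see what works in your environment.
7. Check the Hadoop cluster status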
[grid@h1 conf]$ hadoop dfsadmin -report
Configured Capacity: 19865944064 (18.5 GB)
Present Capacity: 8831234048 (8.22 GB)
DFS Remaining: 8816201728 (8.21 GB)
DFS Used: 15032320 (14.34 MB)
DFS Used%: 0.17%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)    <-- no problems; both slaves are running normally
Name: 192.168.2.103:50010
Decommission Status : Normal
Configured Capacity: 9932972032 (9.25 GB)
DFS Used: 7516160 (7.17 MB)
Non DFS Used: 5402759168 (5.03 GB)
DFS Remaining: 4522696704(4.21 GB)
DFS Used%: 0.08%
DFS Remaining%: 45.53%
Last contact: Mon Nov 19 17:51:50 CST 2012
Name: 192.168.2.105:50010
Decommission Status : Normal
Configured Capacity: 9932972032 (9.25 GB)
DFS Used: 7516160 (7.17 MB)
Non DFS Used: 5631950848 (5.25 GB)
DFS Remaining: 4293505024(4 GB)
DFS Used%: 0.08%
DFS Remaining%: 43.22%
Last contact: Mon Nov 19 17:51:48 CST 2012
8. Change the character set of the hive database in MySQL
mysql> use hive;
Database changed
mysql> alter database hive character set latin1;
Query OK, 1 row affected (0.11 sec)
9. Start Hive
[grid@h1 bin]$ pwd
/home/grid/hive-0.8.1/bin
[grid@h1 bin]$ ./hive
Logging initialized using configuration in file:/home/grid/hive-0.8.1/conf/hive-log4j.properties
Hive history file=/tmp/grid/hive_job_log_grid_201211191942_411407039.txt
hive> show tables;    <-- list tables; none have been created yet, so nothing is shown
OK
Time taken: 3.2 seconds
hive> create table leo1 (x int,y int);    <-- create a table
OK
Time taken: 1.191 seconds
hive> show tables;    <-- the new table now shows up
OK
leo1
Time taken: 0.148 seconds
We are now in the Hive shell and can create tables.
Notes:
When local mode is configured:
<property>
<name>hive.metastore.local</name>
<value>true</value>
<description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
[grid@h1 bin]$ ./hive
Logging initialized using configuration in file:/home/grid/hive-0.8.1/conf/hive-log4j.properties
Hive history file=/tmp/grid/hive_job_log_grid_201211231243_664195168.txt
hive> show tables;
OK
leo1
Time taken: 21.218 seconds
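This drops you straight into the Hive command-line shell. The metastore service and the Hive service are started automatically in the background, both within the same process, and nothing has to be started by hand. This is where it differs from remote mode.
When remote metastore mode is configured:
First, on the host that runs the metastore (a different host from the one running the hive shell and hiveserver), start the metastore service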
grid@slavenode1:~$ hive/bin/hive --service metastore
Starting Hive Metastore Server
The command normally appears to hang at this point. The service is running in the foreground, which is expected.
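Because the metastore holds the terminal, one option is to launch it detached instead. The sketch below is a generic wrapper around nohup; start_bg and the log path /tmp/metastore.log are hypothetical names chosen for illustration, while the hive --service metastore command is the one shown above.

```shell
#!/usr/bin/env bash
# start_bg LOGFILE COMMAND [ARGS...]
# Runs COMMAND detached from the terminal, writes its stdout and stderr
# to LOGFILE, and prints the PID of the background process.
start_bg() {
    local log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo $!
}

# Usage on the metastore host:
# pid=$(start_bg /tmp/metastore.log hive/bin/hive --service metastore)
# tail -f /tmp/metastore.log    # watch the startup messages
# kill "$pid"                   # stop the metastore later
```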
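Next, start the shell on the host where you intend to run the hive shell (a different host from the one running the metastore). Starting the shell also starts the hiveserver process automatically, after which Hive can be used normally
grid@masternode:~$ hive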
Logging initialized using configuration in file:/home/grid/hive/conf/hive-log4j.properties
Hive history file=/tmp/grid/hive_job_log_grid_201211240852_290794227.txt
hive> show tables;
OK
Time taken: 0.548 seconds
hive>
(
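A few additional points about remote mode:
1. hive.metastore.uris points to the host running the metastore service, not to the host running hiveserver; that host does not need to run hiveserver at all.
2. Starting the shell directly with the hive command already brings up hiveserver along with it, so in remote mode you only need to start the metastore separately, and then you can enter the shell and use it normally.
3. The hiveserver and metastore processes both show up under the name RunJar.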
)
[grid@h1 bin]$ ./hive
Logging initialized using configuration in file:/home/grid/hive-0.8.1/conf/hive-log4j.properties
Hive history file=/tmp/grid/hive_job_log_grid_201211231324_199334603.txt
hive> create table leo2 (x int);
OK
Time taken: 0.317 seconds
hive> show tables;    <-- two tables now exist; both live on HDFS, and MySQL stores only the metadata of the tables created through Hive
OK
leo1
leo2
Time taken: 0.538 seconds
Go into the MySQL database to look at the metadata for the tables Hive created
mysql> use hive;
Database changed
mysql> show tables;
+--------------------+
| Tables_in_hive |
+--------------------+
| bucketing_cols |
| cds |
| columns_v2 |
| database_params |
| dbs |
| idxs |
| index_params |
| part_col_privs |
| part_privs |
| partition_key_vals |
| partition_keys |
| partition_params |
| partitions |
| sd_params |
| sds |
| sequence_table |
| serde_params |
| serdes |
| sort_cols |
| table_params |
| tbl_col_privs |
| tbl_privs |
| tbls |
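+--------------------+
23 rows in set (0.00 sec)
The tables listed above are Hive's data dictionary tables. They are created on the first connection and persist from then on; only the metadata inside them changes as Hive operations are performed.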
mysql> select * from tbls;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| 6 | 1353597034 | 1 | 0 | grid | 0 | 6 | leo1 | MANAGED_TABLE | NULL | NULL |
| 11 | 1353648090 | 1 | 0 | grid | 0 | 11 | leo2 | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
2 rows in set (0.00 sec)
hive> drop table leo2;    <-- I dropped table leo2
OK
Time taken: 1.796 seconds
mysql> select * from tbls;    <-- note the prompt: this runs in MySQL, not in Hive
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| 6 | 1353597034 | 1 | 0 | grid | 0 | 6 | leo1 | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)
The metadata for table leo2 is gone as well.
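The CREATE_TIME column in tbls is hard to read as a raw number. Assuming it holds seconds since the Unix epoch, and assuming GNU date (as shipped with CentOS), a small sketch for converting it looks like this; epoch_to_utc is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# epoch_to_utc SECONDS
# Formats a Unix timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
# Uses the GNU date syntax -d @SECONDS.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# CREATE_TIME of leo1 from the listing above
epoch_to_utc 1353597034
```

For leo1 this gives 2012-11-22 15:10:34 UTC, which is 23:10:34 on the same day in the cluster's CST time zone (UTC+8).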
10. Access Hive from a web browser
[grid@h1 bin]$ ./hive --service hwi    <-- start the HWI service
12/11/23 13:50:45 INFO hwi.HWIServer: HWI is starting up
12/11/23 13:50:45 WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /home/grid/hive-0.8.1/conf/hive-default.xml
12/11/23 13:50:45 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
12/11/23 13:50:45 INFO mortbay.log: jetty-6.1.14
12/11/23 13:50:46 INFO mortbay.log: Extract jar:file:/home/grid/hive-0.8.1/lib/hive-hwi-0.8.1.war!/ to /tmp/Jetty_192_168_2_102_9999_hive.hwi.0.8.1.war__hwi__.4m0x2o/webapp
12/11/23 13:50:47 INFO mortbay.log: Started SocketConnector@192.168.2.102:9999
12/11/23 13:51:21 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
12/11/23 13:51:21 INFO hive.metastore: Connected to metastore.
12/11/23 13:51:21 ERROR hive.metastore: Unable to shutdown local metastore client
12/11/23 13:51:21 ERROR hive.metastore: [Ljava.lang.StackTraceElement;@16ef71
Hive history file=/tmp/grid/hive_job_log_grid_201211231351_1262762954.txt
This service also runs in the foreground and appears to hang here, which can be ignored.
Open a browser and enter this address: http://192.168.2.102:9999/hwi