Hive Worked Example

Remote connection from IDEA/DataGrip

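The detailed Hive installation is not covered here; look it up yourself.
Start ZooKeeper and Hadoop, in that order.
Then run the commands below on the ZooKeeper nodes (my cluster has 1 primary node and 2 standby nodes).
They start Hive's metastore (the service that stores and manages metadata) and hiveserver2 (the service that accepts remote connections).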

nohup hive --service metastore >  /root/training/apache-hive-3.1.3-bin/logs/metastore.log 2>&1 &
nohup hive --service hiveserver2 > /root/training/apache-hive-3.1.3-bin/logs/hiveserver2.log 2>&1 &
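
Once both commands are running in the background, it can help to confirm that the services are actually listening before trying to connect. The following is a minimal sketch, assuming the default ports (9083 for the metastore, 10000 for hiveserver2), that the check runs on the same node, and that `bash` and `timeout` are available. `port_open` is a helper name made up for this example.

```shell
# port_open HOST PORT: returns 0 if a TCP connection to HOST:PORT succeeds.
# Hypothetical helper; relies on bash's /dev/tcp and coreutils' timeout.
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Assumed default ports; adjust them if hive-site.xml overrides the defaults.
for entry in metastore:9083 hiveserver2:10000; do
    name=${entry%%:*}
    port=${entry##*:}
    if port_open 127.0.0.1 "$port"; then
        echo "$name is listening on port $port"
    else
        echo "$name is not reachable on port $port yet (check its log file)"
    fi
done
```

hiveserver2 often needs some time after launch before its port opens, so a "not reachable" result right after starting it is not necessarily a failure.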

For the remote connection I use IDEA as the example (DataGrip's database tooling is integrated into IDEA).
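Pay attention to the port (hiveserver2 listens on 10000 by default) and the user; nothing else in the dialog needs special care.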
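
The same port and user can be tried from the command line before configuring the IDE. This is a rough sketch, assuming hiveserver2 on its default port 10000, the `default` database, and a `root` user. The host name `node1` is a placeholder and `hive_jdbc_url` is a helper invented for this example.

```shell
# hive_jdbc_url HOST [PORT] [DATABASE]: prints a HiveServer2 JDBC URL.
# Hypothetical helper; port and database fall back to assumed defaults.
hive_jdbc_url() {
    echo "jdbc:hive2://$1:${2:-10000}/${3:-default}"
}

url=$(hive_jdbc_url node1)   # node1 is a placeholder host name
echo "URL for the IDE's data source dialog: $url"

# Optional smoke test with beeline, only if it is installed on this machine.
if command -v beeline >/dev/null 2>&1; then
    beeline -u "$url" -n root -e "show databases;" || echo "connection failed"
fi
```

If the user field is left empty, hiveserver2 will typically treat the session as `anonymous`, which matters for HDFS permissions.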

Error

[08S01][1] Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE, inode="/user/hive/warehouse":root:supergroup:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:506) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:346) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:3 …

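If a statement fails with this error after connecting, the current user lacks permission on the HDFS directory it is operating on (Hive's directory in HDFS).
The message shows that the directory is owned by root with mode drwxr-xr-x, while I am connected as the anonymous user, which has no write permission, so the permissions need to be changed.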

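Fix
On the virtual machine, change the permissions of the warehouse directory in HDFS, which is also Hive's directory.
Replace the path with your own; the error message shows the exact path.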

hdfs dfs -chmod 777 /user/hive/warehouse
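
To see what the command changes, the directory's mode string can be inspected before and after running it. This is a small sketch. `others_can_write` is a helper invented here, and the `hdfs` call only runs if the client is installed.

```shell
# others_can_write MODE: succeeds if an ls-style mode string such as
# drwxr-xr-x grants write access to "other" users (9th character is w).
others_can_write() {
    case "$1" in
        ????????w*) return 0 ;;
        *) return 1 ;;
    esac
}

warehouse=/user/hive/warehouse   # take the path from your own error message
if command -v hdfs >/dev/null 2>&1; then
    # -d lists the directory entry itself instead of its contents.
    mode=$(hdfs dfs -ls -d "$warehouse" | awk '/^[d-]/ {print $1; exit}')
    if others_can_write "$mode"; then
        echo "$warehouse ($mode): writable by any user"
    else
        echo "$warehouse ($mode): not writable by other users"
    fi
fi
```

Mode 777 lets every user write to the warehouse. That is convenient on a private test VM, but on a shared cluster it is probably safer to connect as the user that owns the directory.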

Importing data

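Create a CSV file yourself (data description).
Because of the size limit of my virtual machine, I made a new one of my own.
You can write the CSV file however you like.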

Creating the tables

Create the tables user_log and user_info.

create table user_info(
	id int comment "unique identifier",
	age_range int comment "age range",
	gender  int comment "gender: 0 female, 1 male, 2 undisclosed"
)
row format delimited
fields terminated by ","
lines terminated by "\n";


create table user_log(
	user_id int comment "buyer id",
	item_id int comment "item id",
	cat_id int comment "category id",
	seller_id int comment "seller id",
	brand_id int comment "brand id",
	time_stamp bigint comment "timestamp",
	action_type int
)
row format delimited
fields terminated by ","
lines terminated by "\n";

Loading the data

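Note that the first line of the CSV file (the column names) should be removed. Otherwise the header is loaded as a data row, and because its values do not match the column types, that first row shows up as NULLs.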
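
One way to drop the header without editing the file by hand is `tail -n +2`, which prints a file starting from its second line. This is a minimal sketch on a throwaway file. `strip_header` is a helper name chosen for this example and the two data rows are invented.

```shell
# strip_header SRC DST: copies SRC to DST without its first line.
strip_header() {
    tail -n +2 "$1" > "$2"
}

# Demo on a temporary file; the data rows are made up for illustration.
demo_dir=$(mktemp -d)
printf 'id,age_range,gender\n1,3,0\n2,5,1\n' > "$demo_dir/user_info.csv"
strip_header "$demo_dir/user_info.csv" "$demo_dir/user_info_noheader.csv"
cat "$demo_dir/user_info_noheader.csv"
```

Hive also has a table property, `skip.header.line.count`, that skips header lines at read time. It may be an alternative if the file should stay untouched.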
Load the data into Hive (using the Hive client):

load data local inpath '/root/tools/user_log.csv' into table user_log;
load data local inpath '/root/tools/user_info.csv' into table user_info;
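
After loading, a quick sanity check is to compare each file's line count with the table's row count. This is a sketch only, assuming the tables were empty before the load and that the `hive` CLI is on the PATH. `count_rows` and `check_load` are hypothetical helpers.

```shell
# count_rows FILE: number of lines in FILE, including a last line that has
# no trailing newline.
count_rows() {
    grep -c '' "$1"
}

# check_load TABLE FILE: compares the file's line count with the table's
# row count. Assumes the table held no rows before the load.
check_load() {
    expected=$(count_rows "$2")
    actual=$(hive -S -e "select count(*) from $1;" 2>/dev/null)
    if [ "$expected" = "$actual" ]; then
        echo "$1: $actual rows, matches $2"
    else
        echo "$1: table has $actual rows, file has $expected lines"
    fi
}

# Only runs where hive is installed and the files from the load step exist.
for t in user_log user_info; do
    if command -v hive >/dev/null 2>&1 && [ -f "/root/tools/$t.csv" ]; then
        check_load "$t" "/root/tools/$t.csv"
    fi
done
```

A mismatch of exactly one line is usually a header that was left in the file.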
