Example code: exporting Hive table schemas with Python

This article shows how to export the schema of Hive tables with Python and shares the implementation. The details follow.

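To spare ourselves an endless stream of query requests from the operations team, we decided to import the data worth querying from MySQL into Hive and let them run their own queries through HUE, an open-source tool. Since they are unlikely to know the table structures well, they also need a schema reference. So I wrote a script that queries the fields and types of every table in the Hive database. The code is as follows: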

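    # coding=utf-8
    # Export the columns and types of every table in a Hive database
    # to an Excel file. Requires the pyhs2 and xlwt packages.
    import pyhs2
    from xlwt import Workbook

    hiveconn = pyhs2.connect(host='10.46.77.120',
                             port=10000,
                             authMechanism='PLAIN',
                             user='hadoop',
                             database='hibiscus_data')


    def create_excel():
        """List every table, collect its columns, and write the workbook."""
        with hiveconn.cursor() as cursor:
            cursor.execute('show tables')
            tables = [row[0] for row in cursor.fetch()]
        tableinfo = [get_column_info(table) for table in tables]
        create_excel_ex(tableinfo)


    def create_excel_ex(tableinfo):
        """Write all tables into one sheet, one block of rows per table."""
        w = Workbook()
        sheet = w.add_sheet(u'Schema')
        row = 0
        for info in tableinfo:
            row = write_table_info(info, sheet, row)
        w.save('hive_schema.xls')


    def write_table_info(tableinfo, sheet, row):
        """Write one table starting at `row`; return the next free row."""
        print(row)  # progress: first row used by this table
        # Table name in a cell merged across the three columns.
        sheet.write_merge(row, row, 0, 2, tableinfo['table'])
        row += 1
        # Header row. The description column is left empty to fill in by hand.
        sheet.write(row, 0, u'Name')
        sheet.write(row, 1, u'Type')
        sheet.write(row, 2, u'Description')
        row += 1
        for field in tableinfo['fields']:
            sheet.write(row, 0, field['name'])
            sheet.write(row, 1, field['type'])
            row += 1
        # Leave one blank row between tables.
        return row + 1


    def get_column_info(table):
        """Run `desc <table>` and return {'table': ..., 'fields': [...]}."""
        sql = 'desc {table}'.format(table=table)
        info = {'table': table, 'fields': []}
        with hiveconn.cursor() as cursor:
            cursor.execute(sql)
            for item in cursor.fetch():
                # An empty name marks the end of the regular columns
                # (partition information follows it).
                if item[0] == '':
                    break
                info['fields'].append({'name': item[0], 'type': item[1]})
        return info


    if __name__ == '__main__':
        create_excel()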
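The loop over the `desc` result stops at the first empty column name. Below is a minimal sketch of that parsing step as a standalone function, so it can be checked without a Hive connection. It assumes each row has the form (col_name, data_type, comment), which is what Hive's DESCRIBE normally returns, and that a blank or `#`-prefixed name marks the start of the partition section. The name `parse_describe_rows` and the sample rows are hypothetical, not part of the original script.

```python
def parse_describe_rows(rows):
    """Turn DESCRIBE output rows into a list of field dicts.

    Assumption: each row is (col_name, data_type, comment) and the
    regular columns end at the first blank or '#'-prefixed name.
    """
    fields = []
    for row in rows:
        name = (row[0] or '').strip()
        if not name or name.startswith('#'):
            break
        fields.append({
            'name': name,
            'type': (row[1] or '').strip(),
            # The comment may be missing or None; normalise it to ''.
            'comment': (row[2] or '').strip() if len(row) > 2 else '',
        })
    return fields


# Hypothetical sample shaped like the output for a partitioned table.
sample = [
    ('id', 'bigint', 'primary key'),
    ('name', 'string', None),
    ('', None, None),
    ('# Partition Information', None, None),
    ('# col_name', 'data_type', 'comment'),
    ('dt', 'string', ''),
]
fields = parse_describe_rows(sample)
```

Keeping the comment means the third spreadsheet column could be filled from Hive's own column comments wherever they exist, instead of being left empty.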

In fact, our Hive installation stores all of its metadata in MySQL, so the table structures can also be obtained by analyzing that metadata directly.
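A rough sketch of the metastore approach follows. It assumes the usual metastore layout with the tables `DBS`, `TBLS`, `SDS` and `COLUMNS_V2`. Names and columns can differ between Hive versions, so check them against your own metastore first. The function `load_schema` is hypothetical, and it expects a DB-API cursor that uses the `%s` parameter style (as MySQL drivers such as pymysql do). Partition columns are stored separately, in `PARTITION_KEYS`, and are not covered here.

```python
from collections import OrderedDict

# Assumed metastore layout: DBS -> TBLS -> SDS -> COLUMNS_V2.
METASTORE_SQL = (
    "SELECT t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME, c.COMMENT "
    "FROM DBS d "
    "JOIN TBLS t ON t.DB_ID = d.DB_ID "
    "JOIN SDS s ON s.SD_ID = t.SD_ID "
    "JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID "
    "WHERE d.NAME = %s "
    "ORDER BY t.TBL_NAME, c.INTEGER_IDX"
)


def load_schema(cursor, database):
    """Return [{'table': ..., 'fields': [...]}] for one Hive database.

    `cursor` is a DB-API cursor opened on the metastore database.
    """
    cursor.execute(METASTORE_SQL, (database,))
    tables = OrderedDict()  # keeps tables in query order
    for table, column, type_name, comment in cursor.fetchall():
        info = tables.setdefault(table, {'table': table, 'fields': []})
        info['fields'].append({
            'name': column,
            'type': type_name,
            'comment': comment or '',
        })
    return list(tables.values())
```

The result has the same shape as the output of `get_column_info`, so the list could be passed to `create_excel_ex` unchanged. A single query also replaces one `desc` round trip per table.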

Original article: http://blog.csdn.net/kwsy2008/article/details/52041811
