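Preface:
A useful pattern for data conversion in Python is to think about three things: where variables are defined, dictionary operations, and list operations. Together these three cover most data-handling needs.
1. Let's start with this script: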
# Converting a dict to JSON
from distutils.log import warn as printf
from json import dumps
from pprint import pprint

# Books keyed by ISBN; each value is a dict of attributes
BOOKs = {
    '0132269937': {
        'title': 'Core Python Programming',
        'edition': 2,
        'year': 2007,
    },
    '0132356139': {
        'title': 'Python Web Development with Django',
        'authors': ['Jeff Forcier', 'Paul Bissex', 'Wesley Chun'],
        'year': 2009,
    },
    '0137143419': {
        'title': 'Python Fundamentals',
        'year': 2009,
    },
}

printf('*** RAW DICT ***')
printf(BOOKs)
printf('\n*** PRETTY_PRINTED DICT ***')
pprint(BOOKs)
printf('\n*** RAW JSON ***')
printf(dumps(BOOKs))
printf('\n*** PRETTY_PRINTED JSON ***')
printf(dumps(BOOKs, indent=4))
Output:
"E:\Anaconda3 4.2.0\python.exe" E:/Pycharm/Python-code/dict2json.py
*** RAW DICT ***
{'0137143419': {'year': 2009, 'title': 'Python Fundamentals'}, '0132356139': {'year': 2009, 'authors': ['Jeff Forcier', 'Paul Bissex', 'Wesley Chun'], 'title': 'Python Web Development with Django'}, '0132269937': {'year': 2007, 'edition': 2, 'title': 'Core Python Programming'}}
*** PRETTY_PRINTED DICT ***
{'0132269937': {'edition': 2, 'title': 'Core Python Programming', 'year': 2007},
 '0132356139': {'authors': ['Jeff Forcier', 'Paul Bissex', 'Wesley Chun'],
                'title': 'Python Web Development with Django',
                'year': 2009},
 '0137143419': {'title': 'Python Fundamentals', 'year': 2009}}
*** RAW JSON ***
{"0137143419": {"year": 2009, "title": "Python Fundamentals"}, "0132356139": {"year": 2009, "authors": ["Jeff Forcier", "Paul Bissex", "Wesley Chun"], "title": "Python Web Development with Django"}, "0132269937": {"year": 2007, "edition": 2, "title": "Core Python Programming"}}
*** PRETTY_PRINTED JSON ***
{
    "0137143419": {
        "year": 2009,
        "title": "Python Fundamentals"
    },
    "0132356139": {
        "year": 2009,
        "authors": [
            "Jeff Forcier",
            "Paul Bissex",
            "Wesley Chun"
        ],
        "title": "Python Web Development with Django"
    },
    "0132269937": {
        "year": 2007,
        "edition": 2,
        "title": "Core Python Programming"
    }
}
Process finished with exit code 0
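First, import the three functions we need: 1) distutils.log.warn(), used to paper over the difference between the print statement in Python 2 and the print() function in Python 3; 2) json.dumps(), which returns a JSON string representing a Python object; 3) pprint.pprint(), which prints Python objects in a readable layout. Note that distutils was removed in Python 3.12, so this import only works on older interpreters; if you only target Python 3, the built-in print() is enough.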
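BOOKs is a Python dictionary. A flat structure such as a list is not used here because a dictionary can model hierarchical, structured attributes (BOOKs maps each ISBN to a book that carries additional information: title, authors, publication year). Note that the equivalent JSON representation drops all the trailing commas.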
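The point about trailing commas can be checked with a small sketch. The `book` dict and the variable names below are illustrative only, not part of the original script. A Python dict literal accepts a trailing comma, the encoded JSON has none, and the JSON decoder rejects input that contains one.

```python
import json

# Python dict literals tolerate a trailing comma after the last item.
book = {
    'title': 'Python Fundamentals',
    'year': 2009,
}

# The encoded JSON string contains no trailing comma.
encoded = json.dumps(book)
print(encoded)

# JSON itself does not allow trailing commas, so the decoder rejects them.
try:
    json.loads('{"title": "Python Fundamentals", "year": 2009,}')
    trailing_comma_accepted = True
except json.JSONDecodeError as exc:
    trailing_comma_accepted = False
    print('rejected:', exc.msg)
```

This matters when JSON is written by hand or by string concatenation: text that is a valid Python literal is not necessarily valid JSON.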
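Serialization and deserialization in Python's json module are called encoding and decoding. Encoding converts a Python object into a JSON string; decoding converts a JSON-formatted string back into a Python object. To use the json module you must first import json.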
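As a minimal sketch of that round trip, the example below encodes a small, made-up `record` with json.dumps() and decodes it again with json.loads(). It also shows one thing worth knowing: the round trip is not always lossless, because JSON has no tuple type, so tuples come back as lists.

```python
import json

# Illustrative record; the 'tags' tuple is here to show a lossy conversion.
record = {'title': 'Core Python Programming', 'edition': 2, 'tags': ('python', 'book')}

encoded = json.dumps(record)   # encoding: Python object -> JSON string
decoded = json.loads(encoded)  # decoding: JSON string -> Python object

print(type(encoded).__name__)
print(decoded)
```

Similarly, non-string dictionary keys such as integers are written out as strings, so they do not come back with their original type either.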
Importing and exporting JSON
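json.dump() serializes a Python object to JSON and writes it to a file-like object (anything with a write() method), so the result ends up in a file; json.dumps() returns the JSON as a string, so the result stays in memory. Both are serialization.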
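The difference between the in-memory and on-disk variants might look like the sketch below. The file name `books.json` and the temporary directory are assumptions made for the example; json.load() is used to read the file back and confirm nothing was lost.

```python
import json
import os
import tempfile

books = {'0137143419': {'title': 'Python Fundamentals', 'year': 2009}}

# In memory: dumps() returns the JSON text as a str.
as_text = json.dumps(books, indent=4)

# On disk: dump() writes the JSON text to an open file object.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'books.json')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(books, f, indent=4)
    # load() is the inverse: it reads JSON from a file object.
    with open(path, 'r', encoding='utf-8') as f:
        from_file = json.load(f)

print(from_file == books)
```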
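2. Converting row-oriented data to column-oriented data
1. Situation: the JSON that Spark currently generates holds one object per row, in a format like the following: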
[
    {
        "cardno": 100000026235,
        "trdate": "2015-12-25",
        "otime": "16:13:33",
        "dtime": "16:21:10",
        "osite": 16,
        "dsite": 15,
        "tfc": 1
    }
]
2. Requirement: convert this array of row objects ([{}, {}]) into JSON column arrays, where each key maps to a list of values, as follows:
{'cardno': [100006734923], 'trdate': ['2015-12-25'], 'dtime': ['17:56:45'], 'dsite': [40], 'osite': [41], 'otime': ['17:50:11'], 'tfc': [1]}
3. Python implementation:
import json

# Load the row-oriented records (a JSON array of objects)
with open(r'D:/data.json', 'r') as f:
    data = json.load(f)

# Each record in data looks like this:
# test = {
#     "cardno": 100006734923,
#     "trdate": "2015-12-25",
#     "otime": "17:50:11",
#     "dtime": "17:56:45",
#     "osite": 41,
#     "dsite": 40,
#     "tfc": 1
# }

# One empty list per column
result = {"cardno": [], "trdate": [], "otime": [], "dtime": [], "osite": [], "dsite": [], "tfc": []}

# Append every field of every record to its column
for test in data:
    for a in test.keys():
        result[a].append(test[a])

print(result)
Change the file path to point at your local file before running the conversion.
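The script above hard-codes the seven column names, so a record with an unexpected key raises a KeyError. One possible generalization, sketched here under the assumption that the input may also arrive as one JSON object per line, builds the columns on the fly with collections.defaultdict. The helper names `rows_to_columns` and `parse_json_lines` and the shortened sample records are made up for this example.

```python
import json
from collections import defaultdict


def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict mapping each key to a list of values."""
    columns = defaultdict(list)
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return dict(columns)


def parse_json_lines(text):
    """Parse text that holds one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two shortened sample records, one JSON object per line.
sample = (
    '{"cardno": 100000026235, "trdate": "2015-12-25", "osite": 16, "dsite": 15, "tfc": 1}\n'
    '{"cardno": 100006734923, "trdate": "2015-12-25", "osite": 41, "dsite": 40, "tfc": 1}\n'
)

columns = rows_to_columns(parse_json_lines(sample))
print(json.dumps(columns))
```

One caveat of this approach: if some records lack a key, the column lists end up with different lengths, so the rows can no longer be reconstructed by position. Whether that is acceptable depends on the data.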
Once I have made something of my studies, shall Rui and I be wed? @夏瑾墨 by Jooey