I'm having trouble parsing JSON with Python, and now I'm stuck.
The problem is that the entries in my JSON do not always have the same keys. The JSON looks something like this:
    {
        "entries": [
            {
                "summary": "here is the summary",
                "extensions": {
                    "coordinates": "coords",
                    "address": "address",
                    "name": "name",
                    "telephone": "123123",
                    "url": "www.blablablah"
                }
            }
        ]
    }
I can walk through the JSON, for example:
    for entrie in entries:
        name = entrie['extensions']['name']
        tel = entrie['extensions']['telephone']
The problem is that the JSON sometimes does not have all the "fields". For example, the telephone field is sometimes missing, so the script fails with a KeyError because the key telephone is not in that entry.
So, my question: how can I run this script and leave a blank where telephone is missing? I've tried:
    if entrie['extensions']['telephone']:
        tel = entrie['extensions']['telephone']
but I don't think that is right.
4 Answers
#1
11
Use dict.get instead of []:
    entrie['extensions'].get('telephone', '')
Or, simply:
    entrie['extensions'].get('telephone')
get returns its second argument (which defaults to None) instead of raising a KeyError when the key is not found.
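
To make the difference between the two calls concrete, here is a minimal sketch. The entrie dictionary below is made up for illustration and simply mirrors the shape of the question's data with the telephone key left out.

```python
# Hypothetical entry with no 'telephone' key, mirroring the question's data.
entrie = {
    "summary": "here is the summary",
    "extensions": {"name": "name", "url": "www.blablablah"},
}

extensions = entrie["extensions"]

tel_blank = extensions.get("telephone", "")  # explicit default: empty string
tel_none = extensions.get("telephone")       # no default given: None
name = extensions.get("name", "")            # key present: stored value is returned

print(repr(tel_blank), repr(tel_none), repr(name))
```

If the value is going to be written out as text, the explicit '' default is probably the more convenient of the two, since None would otherwise need a separate check.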
#2
8
If the data is missing in only one place, dict.get can be used to fill in the missing value:
    tel = d['entries'][0]['extensions'].get('telephone', '')
If the problem is more widespread, you can have the JSON parser use a defaultdict or a custom dictionary instead of a regular dictionary. For example, given the JSON string:
    json_txt = '''{
        "entries": [
            {
                "extensions": {
                    "telephone": "123123",
                    "url": "www.blablablah",
                    "name": "name",
                    "coordinates": "coords",
                    "address": "address"
                },
                "summary": "here is the summary"
            }
        ]
    }'''
Parse it with:
    >>> import json
    >>> class BlankDict(dict):
    ...     def __missing__(self, key):
    ...         return ''
    ...
    >>> d = json.loads(json_txt, object_hook=BlankDict)
    >>> d['entries'][0]['summary']
    'here is the summary'
    >>> d['entries'][0]['extensions']['color']
    ''
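
The defaultdict variant mentioned above could look roughly like the sketch below. The helper name blank_defaultdict and the trimmed-down JSON are assumptions made for the example, not part of the original answer.

```python
import json
from collections import defaultdict

# Trimmed-down copy of the JSON above, with "telephone" left out on purpose.
json_txt = '''{
    "entries": [
        {
            "extensions": {"name": "name", "url": "www.blablablah"},
            "summary": "here is the summary"
        }
    ]
}'''


def blank_defaultdict(obj):
    """Wrap each decoded JSON object so that missing keys read as ''."""
    return defaultdict(str, obj)


d = json.loads(json_txt, object_hook=blank_defaultdict)
extensions = d["entries"][0]["extensions"]

print(repr(extensions["name"]))       # present key: stored value
print(repr(extensions["telephone"]))  # missing key: falls back to str(), i.e. ''
print("telephone" in extensions)      # the lookup above inserted the key
```

One difference worth knowing: a defaultdict stores the default under the missing key the first time it is looked up, whereas a __missing__ method that only returns '' leaves the dictionary unchanged.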
As a side note, if you want to clean up your datasets and enforce consistency, there is a fine tool called Kwalify that does schema validation on JSON (and on YAML).
#3
0
There are several useful dictionary features that you can use here.
First off, you can use in to test whether a key exists in a dictionary:
    if 'telephone' in entrie['extensions']:
        tel = entrie['extensions']['telephone']
get might also be useful; it lets you specify a default value to return if the key is missing:
    tel = entrie['extensions'].get('telephone', '')
Beyond that, you could look into the standard library's collections.defaultdict, but that might be overkill.
#4
0
Two ways.
One is to make sure that your dictionaries are standard, so that when you read them in they have all fields. The other is to be careful when accessing the dictionaries.
Here is an example of making sure your dictionaries are standard:
    import json
    __reference_extensions = {
        # fill in with all standard keys
        # use some default value to go with each key
        "coordinates": '',
        "address": '',
        "name": '',
        "telephone": '',
        "url": ''
    }
    entrie = json.loads(input_string)  # input_string holds the JSON text of one entry
    d = entrie["extensions"]
    # .items() is needed to iterate over (key, value) pairs
    for key, value in __reference_extensions.items():
        if key not in d:
            d[key] = value
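
The same idea can be applied to every entry at once by merging each extensions dictionary over the defaults. This is only a sketch: the sample JSON and the name REFERENCE_EXTENSIONS are made up for illustration, and the {**a, **b} merge syntax requires Python 3.5 or later.

```python
import json

# Assumed defaults, copied from the reference dictionary above.
REFERENCE_EXTENSIONS = {
    "coordinates": "",
    "address": "",
    "name": "",
    "telephone": "",
    "url": "",
}

# Hypothetical input: the second entry has no telephone or url.
json_txt = '''{
    "entries": [
        {"summary": "first", "extensions": {"name": "a", "telephone": "123123", "url": "www.blablablah"}},
        {"summary": "second", "extensions": {"name": "b"}}
    ]
}'''

data = json.loads(json_txt)

for entrie in data["entries"]:
    # Later keys win, so values present in the entry override the defaults.
    entrie["extensions"] = {**REFERENCE_EXTENSIONS, **entrie["extensions"]}

telephones = [entrie["extensions"]["telephone"] for entrie in data["entries"]]
print(telephones)
```

After the loop every extensions dictionary has all five keys, so plain [] access no longer raises a KeyError for them.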
Here is an example of being careful when accessing the dictionaries:
    for entrie in entries:
        name = entrie['extensions'].get('name', '')
        tel = entrie['extensions'].get('telephone', '')
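
If the extensions object itself might be absent (the question does not say so, so treat this as an assumption), the same careful access can be chained. The sample entries below are invented for the sketch.

```python
# Hypothetical entries: one complete, one without 'telephone',
# and one without 'extensions' at all.
entries = [
    {"extensions": {"name": "first", "telephone": "123123"}},
    {"extensions": {"name": "second"}},
    {"summary": "no extensions here"},
]

rows = []
for entrie in entries:
    # Fall back to an empty dict so the following .get() calls always have a dict to query.
    extensions = entrie.get("extensions", {})
    name = extensions.get("name", "")
    tel = extensions.get("telephone", "")
    rows.append((name, tel))

print(rows)
```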