I am trying to read a JSON string returned by the OpenStreetMap Overpass API. The JSON itself is valid.
I am using the following code:
import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232
# Top right
maxLat = 51.1390
maxLon = 13.89873
osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)
osmdata = osm.json()
osmdataframe = pd.read_json(osmdata)
which throws the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-66-304b7fbfb645> in <module>()
----> 1 osmdataframe = pd.read_json(osmdata)
/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
196 obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
197 keep_default_dates, numpy, precise_float,
--> 198 date_unit).parse()
199
200 if typ == 'series' or obj is None:
/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
264
265 else:
--> 266 self._parse_no_numpy()
267
268 if self.obj is None:
/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
481 if orient == "columns":
482 self.obj = DataFrame(
--> 483 loads(json, precise_float=self.precise_float), dtype=None)
484 elif orient == "split":
485 decoded = dict((str(k), v)
TypeError: Expected String or Unicode
How can I modify the request or the Pandas read_json call to avoid this error? And what is the actual problem?
Answer:
If you write the JSON string to a file,
content = osm.content  # raw bytes of the response body
with open('/tmp/out', 'wb') as f:
    f.write(content)
you'll see something like this:
{
"version": 0.6,
"generator": "Overpass API",
"osm3s": {
"timestamp_osm_base": "2014-07-20T07:52:02Z",
"copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
},
"elements": [
{
"type": "node",
"id": 536694,
"lat": 50.9849256,
"lon": 13.6821776,
"tags": {
"highway": "bus_stop",
"name": "Niederhäslich Bergmannsweg"
}
},
...]}
If the JSON string were converted to a Python object, it would be a dict whose elements key holds a list of dicts. The vast majority of the data is inside this list of dicts.
This JSON string is not directly convertible to a useful Pandas object. What would be the index, and what would be the columns? Surely you don't want [u'elements', u'version', u'osm3s', u'generator'] to be the columns, since almost all the information is in the elements list-of-dicts.
As for the error itself: osm.json() has already decoded the response into a Python dict, but read_json expects a JSON string, a file path or a file-like object. Passing it a dict is what raises TypeError: Expected String or Unicode.
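To make the structure concrete, here is a minimal offline sketch. It assumes Python 3 and uses a trimmed copy of the response shown above (first element only, copyright string omitted) instead of a live request. Decoding the text with json.loads gives the same kind of dict that osm.json() returns: its top-level keys are the would-be columns, and the useful records sit in a list under elements.

```python
import json

# Trimmed copy of the Overpass response shown above (first element only).
sample_text = '''
{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {"timestamp_osm_base": "2014-07-20T07:52:02Z"},
  "elements": [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}}
  ]
}
'''

# json.loads does what osm.json() does: it turns JSON text into Python objects.
data = json.loads(sample_text)

# The top-level keys are what Pandas would naively treat as columns.
print(sorted(data))  # ['elements', 'generator', 'osm3s', 'version']

# The actual records are a list of dicts under 'elements'.
print(type(data['elements']).__name__, len(data['elements']))  # list 1
```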
If you want the DataFrame to contain only the data in the elements list-of-dicts, you have to say so explicitly, since Pandas can't make that assumption for you.
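Saying so explicitly can be as simple as passing only the elements list to the DataFrame constructor. A minimal sketch, assuming Python 3 and a recent Pandas, with a trimmed stand-in for osm.json() instead of the live response. Note that the nested tags come through as a single column of dicts:

```python
import pandas as pd

# Trimmed stand-in for osm.json(): only the parts needed here.
data = {
    "version": 0.6,
    "elements": [
        {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
         "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
    ],
}

# Build the frame from the list of records only; each dict becomes one row.
df = pd.DataFrame(data["elements"])

print(sorted(df.columns))  # ['id', 'lat', 'lon', 'tags', 'type']
print(df.loc[0, "tags"])   # the 'tags' cell still holds a nested dict
```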
Further complicating things, each dict in elements is itself nested. Consider the first dict in elements:
{
"type": "node",
"id": 536694,
"lat": 50.9849256,
"lon": 13.6821776,
"tags": {
"highway": "bus_stop",
"name": "Niederhäslich Bergmannsweg"
}
}
Should ['lat', 'lon', 'type', 'id', 'tags'] be the columns? That seems plausible, except that the tags column would end up being a column of dicts, which is usually not very useful. It would perhaps be nicer if the keys inside the tags dict were made into columns. We can do that, but again we have to code it ourselves, since Pandas has no way of knowing that's what we want.
import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232
# Top right
maxLat = 51.1390
maxLon = 13.89873
# Overpass QL query: all bus stops inside the bounding box (south, west, north, east)
osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)
osmdata = osm.json()
# Keep only the list of element records
osmdata = osmdata['elements']
# Lift each key of the nested 'tags' dict into the element itself;
# pop() removes 'tags' and tolerates elements that have none
for dct in osmdata:
    for key, val in dct.pop('tags', {}).items():
        dct[key] = val
osmdataframe = pd.DataFrame(osmdata)
print(osmdataframe[['lat', 'lon', 'name']].head())
yields
lat lon name
0 50.984926 13.682178 Niederhäslich Bergmannsweg
1 51.123623 13.782789 Sagarder Weg
2 51.065752 13.895734 Weißig, Einkaufszentrum
3 51.007140 13.698498 Stuttgarter Straße
4 51.010199 13.701411 Heilbronner Straße
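Assuming a newer Pandas (1.0 or later, where it is exposed as pd.json_normalize), the tag-flattening loop can also be replaced by a library call. json_normalize turns nested keys into dotted column names such as tags.name; the rename below strips that prefix so the columns match the loop's output. A minimal offline sketch, with the first element from above standing in for the live response:

```python
import pandas as pd

# First element from the response above, standing in for osmdata['elements'].
elements = [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
]

# Flatten nested dicts: {"tags": {"name": ...}} becomes a "tags.name" column.
df = pd.json_normalize(elements)

# Strip the "tags." prefix so the columns match the manual loop's result.
df = df.rename(columns=lambda c: c[len("tags."):] if c.startswith("tags.") else c)

print(df[["lat", "lon", "name"]])
```

Unlike the loop, this leaves the original list of dicts unmodified, at the cost of depending on a newer Pandas.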