I am scraping data with a complex hierarchical structure and need to export the result to JSON.
I defined the items as follows:
from scrapy.item import Item, Field


class FamilyItem(Item):
    name = Field()
    sons = Field()  # list of SonsItem


class SonsItem(Item):
    name = Field()
    grandsons = Field()  # list of GrandsonsItem


class GrandsonsItem(Item):
    name = Field()
    age = Field()
    weight = Field()
    sex = Field()
When the spider finishes, the printed item looks like this:
{'name': 'Jenny',
 'sons': [
     {'name': u'S1',
      'grandsons': [
          {'name': u'GS1',
           'age': 18,
           'weight': 50},
          {'name': u'GS2',
           'age': 19,
           'weight': 51}]}]}
But when I run scrapy crawl myscaper -o a.json, it always says the result "is not JSON serializable". If I copy and paste the item output into an IPython console and call json.dumps() on it, it works fine. So where is the problem? This is driving me nuts...
2 Answers
#1
22
When saving nested items, wrap each one in a call to dict() before assigning it to the parent's field, e.g.:
gs1 = GrandsonsItem()
gs1['name'] = 'GS1'
gs1['age'] = 18
gs1['weight'] = 50
gs2 = GrandsonsItem()
gs2['name'] = 'GS2'
gs2['age'] = 19
gs2['weight'] = 51
s1 = SonsItem()
s1['name'] = 'S1'
s1['grandsons'] = [dict(gs1), dict(gs2)]
jenny = FamilyItem()
jenny['name'] = 'Jenny'
jenny['sons'] = [dict(s1)]
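With deeper hierarchies, calling dict() by hand at every level gets tedious. The following is a minimal sketch of a recursive alternative, not part of the original answer. FakeItem is a hypothetical stand-in for a Scrapy item so that the example runs without Scrapy installed; it assumes an item behaves like a mapping that the json module does not recognise as a dict. The helper name to_plain is likewise made up for illustration.

```python
import json
from collections.abc import MutableMapping


class FakeItem(MutableMapping):
    """Hypothetical stand-in for a Scrapy item: a mapping that is not a dict."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def to_plain(value):
    """Recursively turn mappings into dicts and lists/tuples into lists."""
    if isinstance(value, MutableMapping):
        return {key: to_plain(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(val) for val in value]
    return value


gs1 = FakeItem(name="GS1", age=18, weight=50)
s1 = FakeItem(name="S1", grandsons=[gs1])
jenny = FakeItem(name="Jenny", sons=[s1])

# Converting only the outer level leaves item objects inside the lists,
# which is the situation that makes json.dumps raise TypeError.
try:
    json.dumps(dict(jenny))
    nested_failed = False
except TypeError:
    nested_failed = True

# Converting every level yields plain dicts and lists that serialize cleanly.
plain = to_plain(jenny)
encoded = json.dumps(plain)
```

In a real spider the same helper could be applied to the top-level item just before it is yielded, on the assumption that the item class passes the MutableMapping check used here.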
#2
2
I'm not sure whether Scrapy supports nesting item classes directly, but plain dicts and lists work fine. You could do something like this:
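# Build the nested levels from plain dicts and lists.
grandson = {}
grandson['name'] = 'Grandson'
grandson['age'] = 2
grandsons = [grandson]
son = {}
son['name'] = 'Son'
son['grandsons'] = grandsons
sons = [son]
# Only the top level is an item; fields are set with [] rather than attributes.
item = FamilyItem()
item['name'] = 'Name'
item['sons'] = sons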