Currently I use "yield item" after every item I scrape, but that gives me all the items in one single JSON file.
1 solution
#1
You can use a Scrapy item pipeline and, from there, write each item to a separate file.
I set a counter in my spider that increments each time an item is yielded, and I add that value to the item. I then use the counter value to build the file names.
Test_spider.py
from scrapy import Spider

class TestSpider(Spider):
    # spider name and all
    file_counter = 0

    def parse(self, response):
        # your code here
        pass

    def parse_item(self, response):
        # your code here
        self.file_counter += 1
        item = TestItem(  # your own Item class; it must declare a 'counter' field
            # other fields,
            counter=self.file_counter)
        yield item
Enable the pipeline in settings.py:
ITEM_PIPELINES = {'test1.pipelines.TestPipeline': 100}
pipelines.py
class TestPipeline(object):
    def process_item(self, item, spider):
        with open('test_data_%s' % item.get('counter'), 'w') as wr:
            item.pop('counter')  # remove the counter data, you don't need this in your item
            wr.write(str(item))
        return item
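As a variation on the pipeline above, here is a minimal sketch that keeps the counter inside the pipeline itself, so the spider does not have to add a counter field to each item, and that writes real JSON instead of str(item). The class name PerItemJsonPipeline, the out_dir argument, and the "items" directory are assumptions made for illustration; they are not part of the original answer.

```python
import json
import os


class PerItemJsonPipeline:
    """Write every item to its own JSON file (hypothetical example class)."""

    def __init__(self, out_dir="items"):
        self.out_dir = out_dir  # assumed output directory, not from the original answer
        self.count = 0          # the counter lives here instead of in the spider

    def open_spider(self, spider):
        # Scrapy calls this hook once when the spider starts.
        os.makedirs(self.out_dir, exist_ok=True)

    def process_item(self, item, spider):
        self.count += 1
        path = os.path.join(self.out_dir, "test_data_%d.json" % self.count)
        with open(path, "w", encoding="utf-8") as fh:
            # dict(item) works for plain dicts and for scrapy.Item objects alike.
            json.dump(dict(item), fh, ensure_ascii=False)
        return item
```

To try it, register the class in ITEM_PIPELINES the same way as TestPipeline. Values that are not JSON-serializable (for example datetime objects) would need converting before json.dump is called.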