When I use phantomjs in Scrapy middlewares, it sometimes raises:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/home/ttc/ruyi-scrapy/saibolan/saibolan/hz_webdriver_middleware.py", line 47, in process_request
    driver.quit()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 76, in quit
    self.service.stop()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 149, in stop
    self.send_remote_shutdown_command()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/service.py", line 67, in send_remote_shutdown_command
    os.close(self._cookie_temp_file_handle)
OSError: [Errno 9] Bad file descriptor
It doesn't happen every time: I crawled 80 pages and the error appeared 30 times. This is the PhantomJS middleware:
import time

from scrapy.exceptions import IgnoreRequest
from scrapy.http import HtmlResponse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait


class HZPhantomjsMiddleware(object):
    def __init__(self, settings):
        self.phantomjs_driver_path = settings.get('PHANTOMJS_DRIVER_PATH')
        self.cloud_mode = settings.get('CLOUD_MODE')

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

    def process_request(self, request, spider):
        # A display is needed in production; comment this out for local debugging
        # if self.cloud_mode:
        #     display = Display(visible=0, size=(800, 600))
        #     display.start()
        dcap = dict(DesiredCapabilities.PHANTOMJS)
        dcap["phantomjs.page.settings.userAgent"] = (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36")
        driver = webdriver.PhantomJS(
            self.phantomjs_driver_path, desired_capabilities=dcap)
        # chrome_options = webdriver.ChromeOptions()
        # prefs = {"profile.managed_default_content_settings.images": 2}
        # chrome_options.add_experimental_option("prefs", prefs)
        # driver = webdriver.Chrome(self.chrome_driver_path, chrome_options=chrome_options)
        driver.get(request.url)
        try:
            # Wait up to 15 seconds for any of the expected content containers
            element = WebDriverWait(driver, 15).until(
                ec.presence_of_element_located(
                    (By.XPATH, '//div[@class="txt-box"]|//h4[@class="weui_media_title"]|//div[@class="rich_media_content "]'))
            )
            body = driver.page_source
            time.sleep(1)
            driver.quit()
            return HtmlResponse(request.url, body=body, encoding='utf-8', request=request)
        except:
            driver.quit()
            spider.logger.error('Ignore request, url: {}'.format(request.url))
            raise IgnoreRequest()
I don't know what might lead to this error.
2 Answers
#1
3
As of July 2016, driver.close() and driver.quit() weren't sufficient for me. That killed the node process but not the phantomjs child process it spawned.
Following the discussion on this GitHub issue, the only solution that worked for me was to run:
import signal
driver.service.process.send_signal(signal.SIGTERM) # kill the specific phantomjs child proc
driver.quit() # quit the node proc
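If you need this in more than one place, the two calls above could be wrapped in a small helper. This is only a sketch: the name quit_phantomjs is made up for illustration, it assumes driver.service.process is a subprocess.Popen-like object (as the snippet above implies), and ignoring errno 9 during quit() is my own assumption about what is safe here, not something the GitHub discussion prescribes.

```python
import errno
import signal


def quit_phantomjs(driver):
    """Terminate the PhantomJS child process, then quit the driver.

    Hypothetical helper: assumes driver.service.process behaves like
    subprocess.Popen (poll() and send_signal() are available).
    """
    process = getattr(driver.service, "process", None)
    # Only signal the child if it exists and has not exited yet
    if process is not None and process.poll() is None:
        process.send_signal(signal.SIGTERM)
    try:
        driver.quit()
    except OSError as exc:
        # Assumption: a "Bad file descriptor" error during shutdown is
        # harmless and can be ignored; any other OSError is re-raised.
        if exc.errno != errno.EBADF:
            raise
```

In the middleware from the question, both driver.quit() calls would then become quit_phantomjs(driver).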
#2
0
The problem is described here: https://github.com/SeleniumHQ/selenium/issues/3216. A suggested workaround (specifying the cookies file explicitly) worked for me:
driver = webdriver.PhantomJS(self.phantomjs_driver_path, desired_capabilities=dcap, service_args=['--cookies-file=/tmp/cookies.txt'])
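Since the middleware in the question starts a new driver for every request, a single shared /tmp/cookies.txt might be written by several PhantomJS processes. A possible variation, sketched below, is to create a separate cookies file per driver. The helper name phantomjs_service_args and the file name prefix are made up for illustration, and I have not verified that a per-driver file is needed for the workaround to hold.

```python
import os
import tempfile


def phantomjs_service_args(cookie_dir=None):
    """Return (service_args, cookie_path) using a dedicated cookies file.

    Hypothetical helper: creates an empty temp file and passes its path
    to PhantomJS through the --cookies-file command-line option.
    """
    fd, cookie_path = tempfile.mkstemp(
        prefix="phantomjs-cookies-", suffix=".txt", dir=cookie_dir)
    # Close our own handle right away; PhantomJS opens the file by path
    os.close(fd)
    return ["--cookies-file={}".format(cookie_path)], cookie_path


# Usage inside process_request (names taken from the question's middleware):
# service_args, cookie_path = phantomjs_service_args()
# driver = webdriver.PhantomJS(self.phantomjs_driver_path,
#                              desired_capabilities=dcap,
#                              service_args=service_args)
# ... after driver.quit(): os.remove(cookie_path)
```

The caller is responsible for deleting cookie_path once the driver has quit, otherwise the temp files accumulate.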