The situation
I have a simple Python script to get the HTML source for a given URL:
from selenium import webdriver
browser = webdriver.PhantomJS()
browser.get(url)
content = browser.page_source
Occasionally, the URL points to a page with slow-loading external resources (e.g. video files or really slow advertising content).
WebDriver waits until those resources have loaded before the .get(url) call returns.
Note: For unrelated reasons, I need to do this with PhantomJS rather than requests or urllib2.
The question
I'd like to set a timeout on PhantomJS resource loading so that if a resource takes too long to load, the browser gives up on it and carries on as if it didn't exist.
This would let me run the subsequent .page_source query against whatever the browser has loaded.
Documentation on webdriver.PhantomJS is very thin, and I haven't found a similar question on SO.
Thanks in advance!
2 answers
#1
11
PhantomJS provides a resourceTimeout setting, which might suit your needs. Quoting the documentation:
(in milli-secs) defines the timeout after which any resource requested will stop trying and proceed with other parts of the page. onResourceTimeout callback will be called on timeout.
So in Ruby, you can do something like:
require 'selenium-webdriver'
capabilities = Selenium::WebDriver::Remote::Capabilities.phantomjs("phantomjs.page.settings.resourceTimeout" => "5000")
driver = Selenium::WebDriver.for :phantomjs, :desired_capabilities => capabilities
In Python, I believe it would be something like the following (untested; it only shows the idea):
driver = webdriver.PhantomJS(desired_capabilities={'phantomjs.page.settings.resourceTimeout': '5000'})
#2
13
Long explanation below, so TL;DR:
The current version of Selenium's Ghostdriver (in PhantomJS 1.9.8) ignores the resourceTimeout option. Use WebDriver's implicitly_wait() and set_page_load_timeout() instead, and wrap the get() call in a try-except block.
#Python
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

browser = webdriver.PhantomJS()
browser.implicitly_wait(3)
browser.set_page_load_timeout(3)
try:
    browser.get("http://url_here")
except TimeoutException as e:
    # Handle your exception here
    print(e)
finally:
    browser.quit()
Explanation
To provide PhantomJS page settings to Selenium, one can use webdriver's DesiredCapabilities such as:
#Python
from selenium import webdriver
cap = webdriver.DesiredCapabilities.PHANTOMJS
cap["phantomjs.page.settings.resourceTimeout"] = 1000
cap["phantomjs.page.settings.loadImages"] = False
cap["phantomjs.page.settings.userAgent"] = "faking it"
browser = webdriver.PhantomJS(desired_capabilities=cap)
//Java
DesiredCapabilities capabilities = DesiredCapabilities.phantomjs();
capabilities.setCapability("phantomjs.page.settings.resourceTimeout", 1000);
capabilities.setCapability("phantomjs.page.settings.loadImages", false);
capabilities.setCapability("phantomjs.page.settings.userAgent", "faking it");
WebDriver webdriver = new PhantomJSDriver(capabilities);
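One thing to watch in the Python version above: webdriver.DesiredCapabilities.PHANTOMJS is a shared class-level dict, so assigning into it changes it for every later driver in the same process. A minimal sketch of a helper that works on a copy instead is below. The function name phantomjs_capabilities and the stand-in base dict are my own for illustration, not part of Selenium.

```python
def phantomjs_capabilities(base, **page_settings):
    """Return a copy of `base` with PhantomJS page settings added.

    `base` is expected to be a dict such as
    webdriver.DesiredCapabilities.PHANTOMJS. It is copied, so the
    shared class-level dict is left unmodified.
    """
    caps = dict(base)
    for name, value in page_settings.items():
        # Page settings are passed under this key prefix.
        caps["phantomjs.page.settings." + name] = value
    return caps


# Stand-in for webdriver.DesiredCapabilities.PHANTOMJS so the sketch
# runs without Selenium installed (assumption: the real dict also
# carries a browserName entry).
base = {"browserName": "phantomjs"}
caps = phantomjs_capabilities(base, resourceTimeout=1000, loadImages=False)
```

With Selenium installed you would pass webdriver.DesiredCapabilities.PHANTOMJS as base and then call webdriver.PhantomJS(desired_capabilities=caps).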
But here's the catch: as of today (2014/Dec/11), with PhantomJS 1.9.8 and its embedded Ghostdriver, resourceTimeout is not applied by Ghostdriver (see Ghostdriver issue #380 on GitHub).
As a workaround, simply use Selenium's timeout methods and wrap WebDriver's get method in a try-except/try-catch block, e.g.
#Python
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

browser = webdriver.PhantomJS()
browser.implicitly_wait(3)
browser.set_page_load_timeout(3)
try:
    browser.get("http://url_here")
except TimeoutException as e:
    # Handle your exception here
    print(e)
finally:
    browser.quit()
//Java
WebDriver webdriver = new PhantomJSDriver();
webdriver.manage().timeouts()
    .pageLoadTimeout(3, TimeUnit.SECONDS)
    .implicitlyWait(3, TimeUnit.SECONDS);
try {
    webdriver.get("http://url_here");
} catch (org.openqa.selenium.TimeoutException e) {
    //Handle your exception here
    System.out.println(e.getMessage());
} finally {
    webdriver.quit();
}
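Since the original goal was to read the page source from whatever has loaded, here is a rough sketch of how the workaround might be wrapped so that page_source is still read after a timeout. fetch_source and FakeBrowser are hypothetical names of mine; FakeBrowser only exists so the sketch can be exercised without PhantomJS. Whether page_source actually holds the partially loaded page after a timeout depends on the driver, so treat that part as an assumption to verify.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback so the sketch can be exercised without Selenium installed.
    class TimeoutException(Exception):
        pass


def fetch_source(browser, url, timeout_s=3):
    """Load `url` and return (page_source, timed_out).

    On a page-load timeout the exception is swallowed and page_source
    is read anyway (assumption: the driver still exposes what it has).
    """
    browser.set_page_load_timeout(timeout_s)
    timed_out = False
    try:
        browser.get(url)
    except TimeoutException:
        timed_out = True
    return browser.page_source, timed_out


class FakeBrowser:
    """Hypothetical stand-in for webdriver.PhantomJS() that always times out."""
    page_source = "<html>partial</html>"

    def set_page_load_timeout(self, seconds):
        self.timeout = seconds

    def get(self, url):
        raise TimeoutException("page load timed out")


source, timed_out = fetch_source(FakeBrowser(), "http://url_here")
```

With a real driver you would pass webdriver.PhantomJS() instead of FakeBrowser() and call browser.quit() yourself once you are done with it.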