How to use regex in Selenium locators

Date: 2022-09-12 19:28:41

I'm using Selenium RC and I would like, for example, to get all the link elements whose href attribute matches:

http://[^/]*\d+.com

I would like to use:

sel.get_attribute( '//a[regx:match(@href, "http://[^/]*\d+.com")]/@name' )

which would return a list of the name attributes of all the links that match the regex (or something like it).

thanks

4 Answers

#1 (score 10)

The answer above is probably the right way to find ALL of the links that match a regex, but I thought it'd also be helpful to answer the other part of the question: how to use regex in XPath locators. You need to use the regex matches() function, like this:

xpath=//div[matches(@id,'che.*boxes')]

(this, of course, would click the div with 'id=checkboxes', or 'id=cheANYTHINGHEREboxes')

Be aware, though, that the matches() function is not supported by all browsers' native XPath implementations (most conspicuously, using it in FF3 will throw an error: invalid xpath[2]).

If you have trouble with your particular browser (as I did with FF3), try using Selenium's allowNativeXpath("false") to switch over to the JavaScript XPath interpreter. It'll be slower, but it does seem to work with more XPath functions, including 'matches' and 'ends-with'. :)

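
As a sanity check on the pattern itself (outside any browser), note that XPath 2.0's matches() is unanchored, like Python's re.search: the pattern only needs to occur somewhere in the attribute value. A quick illustration:

```python
import re

# XPath 2.0 matches() is unanchored, like re.search: the pattern only
# has to occur somewhere inside the attribute value being tested.
pattern = re.compile(r"che.*boxes")

print(bool(pattern.search("checkboxes")))            # True  (the literal id)
print(bool(pattern.search("cheANYTHINGHEREboxes")))  # True  (anything between)
print(bool(pattern.search("radio-buttons")))         # False (no match)
```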
#2 (score 3)

You can use the Selenium command getAllLinks to get an array of the IDs of the links on the page, then loop through them and check each href with getAttribute, which takes a locator followed by an @ and the attribute name. For example, in Java this might be:

String[] allLinks = selenium.getAllLinks();
List<String> matchingLinks = new ArrayList<String>();

for (String linkId : allLinks) {
    String linkHref = selenium.getAttribute("id=" + linkId + "@href");
    // String.matches() anchors to the whole string, hence the trailing .*
    if (linkHref.matches("http://[^/]*\\d+\\.com.*")) {
        matchingLinks.add(linkId);
    }
}
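
Since the question uses the Python RC client, a rough sketch of the same loop in Python might look like the following. Only the regex filtering runs standalone here; the RC session calls (get_all_links, get_attribute) are sketched in a comment, and the dot before "com" is escaped so it isn't treated as a regex wildcard:

```python
import re

LINK_RE = re.compile(r"http://[^/]*\d+\.com")

def filter_hrefs(hrefs):
    """Return only the hrefs whose start matches the target pattern."""
    return [h for h in hrefs if LINK_RE.match(h)]

# In a live RC session you would feed this from Selenium, roughly:
#   ids = sel.get_all_links()
#   hrefs = [sel.get_attribute("id=%s@href" % i) for i in ids]
#   matching = filter_hrefs(hrefs)

print(filter_hrefs(["http://foo123.com/x", "http://example.org/"]))
# ['http://foo123.com/x']
```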

#3 (score 1)

A possible solution is to use sel.get_eval() and write a JS script that returns a list of the links, something like the following answer: selenium: Is it possible to use the regexp in selenium locators

#4 (score 0)

Here are some alternate methods for Selenium RC. These aren't pure Selenium solutions; they combine your programming language's data structures with Selenium.

You can also get the HTML page source, then run a regular expression over the source to return a matching set of links. Use regex grouping to separate out URLs, link text/IDs, etc., and you can then pass them back to Selenium to click on or navigate to.

Another method is to get the HTML page source, or the innerHTML (via DOM locators) of a parent/root element, then convert the HTML into an XML DOM object in your programming language. You can then traverse the DOM with the desired XPath (with regular expressions or not) and obtain a nodeset of only the links of interest. From there, parse out the link text/ID or URL and pass it back to Selenium to click on or navigate to.

Upon request, I'm providing examples below. They're in mixed languages since the post didn't appear to be language-specific anyway; I'm just using what I had available to hack together. They aren't fully tested, but I've worked with bits of this code before in other projects, so these are proof-of-concept examples of how you'd implement the solutions I just mentioned.

//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below, replace with better regex pattern as desired
preg_match_all("/<a.+href=\"(.+)\"/",$pgSrc,$matches,PREG_PATTERN_ORDER);
//$matches is a 2D array, $matches[0] is array of whole string matched, $matches[1] is array of what's in parenthesis
//you either get an array of all matched link URL values in parenthesis capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements
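
A rough Python analogue of the PHP snippet above, runnable against a canned fragment. In a live RC session the source would come from something like sel.get_html_source(); the regex is deliberately naive, as in the PHP version:

```python
import re

# Canned page source standing in for sel.get_html_source()
page_src = '<a href="http://foo1.com/a">one</a> <a href="http://bar.org/b">two</a>'

# Extract all href values (naive pattern; tighten as desired)
links = re.findall(r'<a[^>]+href="([^"]+)"', page_src)
print(links)   # ['http://foo1.com/a', 'http://bar.org/b']

# Keep only URLs matching the question's pattern (dot escaped)
wanted = [u for u in links if re.match(r"http://[^/]*\d+\.com", u)]
print(wanted)  # ['http://foo1.com/a']
```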

//Example of XML DOM parsing with Selenium RC (in Java)
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using JSoup XML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* once you have this document object, can then manipulate & traverse
it as an XML/HTML node tree. I'm not going to go into details on this
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:

http://jsoup.org/cookbook/extracting-data/dom-navigation

the example there seems to indicate first getting the element/node defined
by content tag within the "document" or source, then from there get all
hyperlink elements/nodes and then traverse that as a list/array, doing
whatever you want with an object oriented approach for each element in
the array. Each element is an XML node with properties. If you study it,
you'd find this approach gives you the power/access that WebDriver/Selenium 2
now gives you with WebElements but the example here is what you can do in
Selenium RC to get similar WebElement kind of capability
*/
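
For readers without jsoup, here is a minimal Python stdlib sketch of the same idea (the LinkCollector class is a hypothetical helper, not part of Selenium): parse the HTML fragment and keep only the anchors whose href matches the regex. In Selenium RC the fragment would come from getEval(...innerHTML), as in the Java example above.

```python
import re
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect hrefs of <a> tags whose href matches a regex (hypothetical helper)."""
    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if self.pattern.match(href):
                self.matches.append(href)

collector = LinkCollector(r"http://[^/]*\d+\.com")
# In RC, this fragment would be the innerHTML retrieved via getEval()
collector.feed('<div><a href="http://a99.com/x">hit</a><a href="http://a.com/">miss</a></div>')
print(collector.matches)  # ['http://a99.com/x']
```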