I am interested in using Selenium with R. The API is described in the WebDriver (Selenium 2) API documentation. Has any work been done on an R implementation? How would I go about approaching this? The documentation mentions running a Selenium server and querying the API using JavaScript. Any help would be much appreciated.
3 answers
#1
1
Selenium can be accessed using the JsonWireProtocol.
First, start a Selenium server from the command line:
java -jar selenium-server-standalone-2.25.0.jar
A new Firefox browser can then be opened as follows:
library(RCurl)
library(RJSONIO)
library(XML)
baseURL<-"http://localhost:4444/wd/hub/"
server<-list(desiredCapabilities=list(browserName='firefox',javascriptEnabled=TRUE))
getURL(paste0(baseURL,"session"),
customrequest="POST",
httpheader=c('Content-Type'='application/json;charset=UTF-8'),
postfields=toJSON(server))
serverDetails<-fromJSON(rawToChar(getURLContent('http://localhost:4444/wd/hub/sessions',binary=TRUE)))
serverId<-serverDetails$value[[1]]$id
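Every call in this answer repeats the same URL construction and POST boilerplate. A minimal sketch of how that could be wrapped, assuming the same hub address as above; `wdURL` and `wdPost` are hypothetical helper names and are not part of RCurl or RJSONIO:

```r
# Hypothetical helpers that wrap the boilerplate repeated in each call.
baseURL <- "http://localhost:4444/wd/hub/"

# Join path segments onto the hub URL, e.g. session/<id>/url
wdURL <- function(...) paste0(baseURL, paste(..., sep = "/"))

# POST an R list as JSON and return the raw reply text.
# RCurl and RJSONIO are only needed when this function is called.
wdPost <- function(url, body) {
  RCurl::getURL(url,
                customrequest = "POST",
                httpheader = c("Content-Type" = "application/json;charset=UTF-8"),
                postfields = RJSONIO::toJSON(body))
}
```

With these in place, navigating would read `wdPost(wdURL("session", serverId, "url"), list(url = "http://www.google.com"))`. The reply is returned as text so the caller can decide whether it carries JSON worth decoding.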
Navigate to Google:
getURL(paste0(baseURL,"session/",serverId,"/url"),
customrequest="POST",
httpheader=c('Content-Type'='application/json;charset=UTF-8'),
postfields=toJSON(list(url="http://www.google.com")))
Get the ID of the search box:
elementDetails<-fromJSON(rawToChar(getURLContent(paste0(baseURL,"session/",serverId,"/element"),
customrequest="POST",
httpheader=c('Content-Type'='application/json;charset=UTF-8'),
postfields=toJSON(list(using="xpath",value="//*[@id=\"gbqfq\"]")),binary=TRUE))
)
elementId<-elementDetails$value
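JsonWireProtocol replies carry a `status` field next to `value`, where 0 means success, so a failed lookup (for example an XPath that matches nothing) can be caught before the element ID is used. A minimal sketch, assuming the reply has already been decoded with `fromJSON`; `wdValue` is a hypothetical helper and the two replies below are made-up stand-ins, not real server output:

```r
# Hypothetical helper: return the `value` field of a decoded reply,
# or stop with the status code when the server reports a failure.
wdValue <- function(resp) {
  if (!identical(as.numeric(resp$status), 0)) {
    stop("WebDriver command failed with status ", resp$status)
  }
  resp$value
}

# Made-up decoded replies for illustration
ok  <- list(sessionId = "abc", status = 0, value = c(ELEMENT = "0"))
bad <- list(sessionId = "abc", status = 7, value = list(message = "no such element"))

wdValue(ok)                                       # the element reference
tryCatch(wdValue(bad), error = conditionMessage)  # the error message as text
```

In the code above this would replace the bare `elementDetails$value` with `wdValue(elementDetails)`.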
Search for a topic:
rawToChar(getURLContent(paste0(baseURL,"session/",serverId,"/element/",elementId,"/value"),
customrequest="POST",
httpheader=c('Content-Type'='application/json;charset=UTF-8'),
postfields=toJSON(list(value=list("\uE009","a","\uE009",'\b','Selenium api in R')))
,binary=TRUE))
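The "\uE009" entries in the call above are JsonWireProtocol key codes: U+E009 is the Control modifier, which stays pressed until it appears again in the sequence. A minimal sketch that names the codes so the payload is easier to read; `wdKeys` and `selectAllAndType` are hypothetical names, and the sketch uses the protocol's U+E003 code for Backspace where the answer sends '\b':

```r
# Hypothetical lookup table of a few JsonWireProtocol key codes
wdKeys <- list(backspace = "\uE003", enter = "\uE007", control = "\uE009")

# Build the body for POST /session/:sessionId/element/:id/value:
# Control+A selects the existing text, the second Control releases the
# modifier, Backspace clears the selection, then the new text is typed.
selectAllAndType <- function(text) {
  list(value = list(wdKeys$control, "a", wdKeys$control, wdKeys$backspace, text))
}

payload <- selectAllAndType("Selenium api in R")
```

The resulting `payload` can be passed to `toJSON` in place of the inline list.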
Return the HTML of the search results:
googData<-fromJSON(rawToChar(getURLContent(paste0(baseURL,"session/",serverId,"/source"),
customrequest="GET",
httpheader=c('Content-Type'='application/json;charset=UTF-8'),
binary=TRUE
))
)
Get the suggested links:
gxml<-htmlParse(googData$value)
urls<-unname(xpathSApply(gxml,"//*[@class='l']/@href"))
Close the session:
getURL(paste0(baseURL,"session/",serverId),
customrequest="DELETE",
httpheader=c('Content-Type'='application/json;charset=UTF-8')
)
#2
0
I would like to scrape the soccer match tables for every round but don't know how to do it. I would appreciate it if you could shed some light on this. I am using R to connect to Selenium-Server-Standalone:
elementDetails<-fromJSON(rawToChar(getURLContent(paste0(baseURL,"session/",serverId,"/element"),
customrequest="POST",
httpheader=c('Content-Type'='application/json;charset=UTF-8'),
postfields=toJSON(list(using="xpath",value="//*[@id=\"Match_Table\"]")),binary=TRUE))
)
elementId<-elementDetails$value
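Once the page source has been fetched through the `/source` endpoint as in answer #1, the table can also be parsed locally with the XML package instead of walking it element by element through Selenium. A minimal sketch, assuming the table has the id `Match_Table` and a header row; the HTML below is a made-up stand-in for the real page source, not actual match data:

```r
library(XML)

# Stand-in for the page source returned by the /source endpoint
pageSource <- paste0(
  "<html><body><table id='Match_Table'>",
  "<tr><th>Home</th><th>Away</th></tr>",
  "<tr><td>Team A</td><td>Team B</td></tr>",
  "<tr><td>Team C</td><td>Team D</td></tr>",
  "</table></body></html>")

doc  <- htmlParse(pageSource, asText = TRUE)
rows <- getNodeSet(doc, "//*[@id='Match_Table']//tr")

# One character vector of cell texts per row (header cells included)
cells <- lapply(rows, function(r) xpathSApply(r, "./th|./td", xmlValue))

# Use the first row as column names and bind the rest into a data frame
matches <- as.data.frame(do.call(rbind, cells[-1]), stringsAsFactors = FALSE)
names(matches) <- cells[[1]]
```

To cover every round, the same parsing step would be repeated after navigating to (or clicking through to) each round's page; how the rounds are reached depends on the site and is not shown here.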
#3
0
The relenium package (Selenium for R) was recently developed; it imports Selenium through the rJava package. It is mainly intended for web scraping. Disclaimer: I'm one of the developers.