How to get Google search results

Time: 2022-08-28 08:34:14

I used the following code:

library(XML)
library(RCurl)
getGoogleURL <- function(search.term, domain = '.co.uk', quotes = TRUE)
{
    search.term <- gsub(' ', '%20', search.term)
    if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
    getGoogleURL <- paste('http://www.google', domain, '/search?q=',
                          search.term, sep = '')
}

getGoogleLinks <- function(google.url)
{
    doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
    html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
    nodes <- getNodeSet(html, "//a[@href][@class='l']")
    return(sapply(nodes, function(x) x <- xmlAttrs(x)[[1]]))
}

search.term <- "cran"
quotes <- "FALSE"
search.url <- getGoogleURL(search.term=search.term, quotes=quotes)

links <- getGoogleLinks(search.url)

I would like to extract all the links returned by my search, but I get the following result:

> links
list()

How can I get the links? In addition, I would like to get the headline and summary of each Google result; how can I do that? Finally, is there a way to get the links that reside in the ChillingEffects.org results?

2 solutions

#1


8  

If you look at the html variable, you can see that the search result links are all nested in <h3 class="r"> tags.

Try changing your getGoogleLinks function to:

getGoogleLinks <- function(google.url) {
   doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
   # Result links sit inside <h3 class="r"> headings
   nodes <- getNodeSet(html, "//h3[@class='r']//a")
   # Look up the attribute by name rather than by position
   return(sapply(nodes, function(x) xmlAttrs(x)[["href"]]))
}
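To see what that XPath expression selects without depending on a live request, here is a minimal sketch that runs it against a small hand-written page. The markup in sample.html is hypothetical and only imitates the <h3 class="r"> structure described above; Google's real markup differs and changes over time, so the selector may need adjusting. The same nodes also give you the headline text through xmlValue.

```r
library(XML)

# Hypothetical markup imitating the structure described in the answer;
# it is not a copy of a real Google results page.
sample.html <- '
<html><body>
  <a class="nav" href="/preferences">Settings</a>
  <h3 class="r"><a href="https://cran.r-project.org/">The Comprehensive R Archive Network</a></h3>
  <h3 class="r"><a href="https://cran.r-project.org/mirrors.html">CRAN Mirrors</a></h3>
</body></html>'

# Parse the string as HTML and keep only anchors inside <h3 class="r">
html  <- htmlParse(sample.html, asText = TRUE)
nodes <- getNodeSet(html, "//h3[@class='r']//a")

links  <- sapply(nodes, xmlGetAttr, "href")  # target URL of each result
titles <- sapply(nodes, xmlValue)            # headline text of each result

print(links)
print(titles)
```

The navigation link is skipped because it is not inside an <h3 class="r"> element, which is the reason the narrower XPath is useful here.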

#2


4  

I created this function to read in a list of company names and then get the top website result for each. It should get you started; you can then adjust it as needed.

# Libraries. URLencode() lives in the utils package, which R attaches by default.
library(rvest)

# Load data
d <- read.csv("P:\\needWebsites.csv")
companies <- as.character(d$Company.Name)

# Function for getting the top website result for one name.
getWebsite <- function(name)
{
    # Encode only the name so characters such as '&' stay inside the query
    url <- paste0("https://www.google.com/search?q=",
                  URLencode(name, reserved = TRUE))

    page <- read_html(url)

    results <- page %>%
      html_nodes("cite") %>% # Get all nodes of type cite. You can change this to grab other node types.
      html_text()

    result <- results[1]

    return(as.character(result)) # Return results instead if you want to see them all.
}

# Apply the function to a list of company names.
websites <- data.frame(Website = sapply(companies, getWebsite))
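Company names often contain spaces or characters such as &, which have a special meaning in a query string. As a rough sketch of how the search URL could be built safely, the helper below encodes only the name and then appends it to the base URL. buildSearchURL and its base argument are hypothetical names introduced for illustration; only URLencode from the utils package is an existing function.

```r
# buildSearchURL is a hypothetical helper, not part of any package.
buildSearchURL <- function(name, base = "https://www.google.com/search?q=") {
  # reserved = TRUE also escapes characters such as '&' and '=',
  # so the whole name stays inside the q parameter.
  paste0(base, utils::URLencode(name, reserved = TRUE))
}

search.url <- buildSearchURL("Johnson & Johnson")
print(search.url)
# "https://www.google.com/search?q=Johnson%20%26%20Johnson"
```

Encoding the full URL with the default reserved = FALSE would leave the & untouched, so everything after it would be read as a separate parameter.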
