Excel VBA / HTML从下拉菜单中点击下一页

时间:2022-11-19 21:33:37

I'm writing my first data scraper using Excel and VBA. I'm stuck trying to go to the next page of a website. The source code looks as follows:

我正在使用Excel和VBA编写我的第一个数据抓取器。我正试图浏览一个网站的下一页。源代码如下:

<li><a href="#" onclick="changePage(2); return false;">Page 2 of 24</a></li>

This is the VBA code I have but does not seem to work:

这是我有的VBA代码,但似乎没有效果:

For Each l In ie.Document.getElementsByTagName("a")
    If l.href = "#" And l.onclick = "changePage(2); return false;" Then
        l.Item(2).Click
        Exit For
    End If
Next l

When I run the code I don't get any errors, but it doesn't seem to go to page 2. Keep in mind that there are more pages after page 2. My idea is replace "2" with a variable later and increase that variable by one. But I need to get it to work first.

当我运行代码时,我不会得到任何错误,但它似乎不会转到第2页。请记住,在第二页之后还有更多的页面。我的想法是稍后用一个变量替换“2”,并将这个变量增加1。但我得先把它修好。

Thanks to whoever can help.

感谢任何能帮忙的人。

3 个解决方案

#1


2  

[Edit: I now have a solution and the code has been replaced. -RDH]

我现在有一个解决方案,代码已经被替换。-RDH]

First I want to mention that if the data retrieved in this manner is used for commercial purposes or anything other than personal use then it violates 2 sections of the Kelley Blue Book (kbb.com) Terms of Service.

首先,我想指出的是,如果以这种方式检索的数据被用于商业目的或除个人用途以外的任何用途,那么它违反了Kelley Blue Book (kbb.com)服务条款的两个部分。

FYI: Sites that collect, update, and maintain data like BlueBook or the MLS take their data very seriously, and they don't like people scraping it. I was speaking to an old classmate of mine who has her degree in Computer Science and is now a real estate agent, and I mentioned to her about how cool it is to be able scrape housing data off of MLS and she nearly flipped out on me. Just saying: people were paid to create that data and people make their lives using that data. 'Nuff said. I was able to get the problem code running by creating a web page on my own server that had the same format you were looking for since I get a different version of the bluebook.com site since I am in Canada. I get redirected to kbb.com.

供大家参考:像BlueBook或MLS这样收集、更新和维护数据的网站会非常认真地对待它们的数据,而且它们不喜欢人们去抓取数据。我跟我的一个老同学谈过,她有计算机科学的学位,现在是一名房地产经纪人。我跟她说,能够从MLS上搜集住房数据是多么的酷,她差点就对我发火。我想说的是:人们为创造这些数据而付费,人们靠这些数据谋生。足够地说。我可以通过在我自己的服务器上创建一个web页面来运行问题代码,因为我在加拿大的时候得到了一个不同版本的bluebook.com网站。我被重定向到kbb.com。

+++ The real problem +++

真正的问题

The problem is that hrefs with an # symbol are actually the full URL with the # attached to the end, and when you check the onClick event it actually contains the full function declariation, so you have to only search for partial strings.

问题是,带有#符号的hrefs实际上是带有#附加到末尾的完整URL,当您检查onClick事件时,它实际上包含完整的函数声明,因此您只能搜索部分字符串。

' A good idea to declare the proper datatypes
' because IHTMLElement has the click event but IHTMLAnchorElements don't
Dim l As IHTMLElement
Dim htmlanchors As IHTMLElementCollection
' ...

Set htmlanchors = ie.Document.getElementsByTagName("a")

' Look through all the anchor tags on the page
    For Each l In htmlanchors
       ' Check to see the Href contains a # and the onclick event has specific code
        If InStr(l.href, "#") And InStr(l.onclick, "changePage(3); return false;") Then
            ' Click the current anchor link
            l.Click
            Exit For
        End If
Next l

#2


0  

Have you tried

你有试过

.FireEvent ("onclick")
Or
.FireEvent ("onmouseover")
.FireEvent ("onmousedown")
.FireEvent("onmouseup")

in place of .click? Sometimes the JavaScript actions don't respond to .click.

代替.click吗?有时JavaScript操作没有响应。

#3


0  

Rick – below is my entire code. I’m basically trying to scrape www.the bluebook.com.

下面是我的全部代码。我基本上是想把www。bluebook.com刮下来。

Sub ScrapeData()

Dim ie As InternetExplorer
Dim ele As Object
Dim RowCount As Long
Dim myWebsite As String, mySearch1 As String, mySearch2 As String, mySearch3 As String
Dim Document As HTMLDocument

myWebsite = Range("Website").Value
mySearch1 = Range("search1").Value
mySearch2 = Range("search2").Value
mySearch3 = Range("search3").Value

Set mySheet = Sheets("Sheet1")
Range("A6").Value = "Company"
Range("B6").Value = "Address"
Range("C6").Value = "Contact"

RowCount = 7
Set ie = New InternetExplorer
ie.Visible = True
With ie
.Visible = True
.navigate (myWebsite)

Do While .Busy Or .readyState <> 4
    DoEvents
Loop

ie.Document.getElementById("search").Value = mySearch1
ie.Document.getElementById("selRegion").Value = mySearch2
ie.Document.getElementsByClassName("searchBtn")(0).Click

Do While .Busy Or _
    .readyState <> 4
    DoEvents
Loop

For Each ele In .Document.all
    Select Case ele.className
    Case "result_title"
    RowCount = RowCount + 1
    Case "cname"
    mySheet.Range("A" & RowCount) = ele.innerText
    Case "addy_wrapper"
    mySheet.Range("B" & RowCount) = ele.innerText
    End Select
Next ele
End With

'THIS IS THE CODE THAT IS NOT WORKING
For Each l In ie.Document.getElementsByTagName("a")
    If l.href = "#" And l.onclick = "changePage(3); return false;" Then
        l.Item(3).Click
        Exit For
    End If
Next l

Set ie = Nothing
End Sub

#1


2  

[Edit: I now have a solution and the code has been replaced. -RDH]

我现在有一个解决方案,代码已经被替换。-RDH]

First I want to mention that if the data retrieved in this manner is used for commercial purposes or anything other than personal use then it violates 2 sections of the Kelley Blue Book (kbb.com) Terms of Service.

首先,我想指出的是,如果以这种方式检索的数据被用于商业目的或除个人用途以外的任何用途,那么它违反了Kelley Blue Book (kbb.com)服务条款的两个部分。

FYI: Sites that collect, update, and maintain data like BlueBook or the MLS take their data very seriously, and they don't like people scraping it. I was speaking to an old classmate of mine who has her degree in Computer Science and is now a real estate agent, and I mentioned to her about how cool it is to be able scrape housing data off of MLS and she nearly flipped out on me. Just saying: people were paid to create that data and people make their lives using that data. 'Nuff said. I was able to get the problem code running by creating a web page on my own server that had the same format you were looking for since I get a different version of the bluebook.com site since I am in Canada. I get redirected to kbb.com.

供大家参考:像BlueBook或MLS这样收集、更新和维护数据的网站会非常认真地对待它们的数据,而且它们不喜欢人们去抓取数据。我跟我的一个老同学谈过,她有计算机科学的学位,现在是一名房地产经纪人。我跟她说,能够从MLS上搜集住房数据是多么的酷,她差点就对我发火。我想说的是:人们为创造这些数据而付费,人们靠这些数据谋生。足够地说。我可以通过在我自己的服务器上创建一个web页面来运行问题代码,因为我在加拿大的时候得到了一个不同版本的bluebook.com网站。我被重定向到kbb.com。

+++ The real problem +++

真正的问题

The problem is that hrefs with an # symbol are actually the full URL with the # attached to the end, and when you check the onClick event it actually contains the full function declariation, so you have to only search for partial strings.

问题是,带有#符号的hrefs实际上是带有#附加到末尾的完整URL,当您检查onClick事件时,它实际上包含完整的函数声明,因此您只能搜索部分字符串。

' A good idea to declare the proper datatypes
' because IHTMLElement has the click event but IHTMLAnchorElements don't
Dim l As IHTMLElement
Dim htmlanchors As IHTMLElementCollection
' ...

Set htmlanchors = ie.Document.getElementsByTagName("a")

' Look through all the anchor tags on the page
    For Each l In htmlanchors
       ' Check to see the Href contains a # and the onclick event has specific code
        If InStr(l.href, "#") And InStr(l.onclick, "changePage(3); return false;") Then
            ' Click the current anchor link
            l.Click
            Exit For
        End If
Next l

#2


0  

Have you tried

你有试过

.FireEvent ("onclick")
Or
.FireEvent ("onmouseover")
.FireEvent ("onmousedown")
.FireEvent("onmouseup")

in place of .click? Sometimes the JavaScript actions don't respond to .click.

代替.click吗?有时JavaScript操作没有响应。

#3


0  

Rick – below is my entire code. I’m basically trying to scrape www.the bluebook.com.

下面是我的全部代码。我基本上是想把www。bluebook.com刮下来。

Sub ScrapeData()

Dim ie As InternetExplorer
Dim ele As Object
Dim RowCount As Long
Dim myWebsite As String, mySearch1 As String, mySearch2 As String, mySearch3 As String
Dim Document As HTMLDocument

myWebsite = Range("Website").Value
mySearch1 = Range("search1").Value
mySearch2 = Range("search2").Value
mySearch3 = Range("search3").Value

Set mySheet = Sheets("Sheet1")
Range("A6").Value = "Company"
Range("B6").Value = "Address"
Range("C6").Value = "Contact"

RowCount = 7
Set ie = New InternetExplorer
ie.Visible = True
With ie
.Visible = True
.navigate (myWebsite)

Do While .Busy Or .readyState <> 4
    DoEvents
Loop

ie.Document.getElementById("search").Value = mySearch1
ie.Document.getElementById("selRegion").Value = mySearch2
ie.Document.getElementsByClassName("searchBtn")(0).Click

Do While .Busy Or _
    .readyState <> 4
    DoEvents
Loop

For Each ele In .Document.all
    Select Case ele.className
    Case "result_title"
    RowCount = RowCount + 1
    Case "cname"
    mySheet.Range("A" & RowCount) = ele.innerText
    Case "addy_wrapper"
    mySheet.Range("B" & RowCount) = ele.innerText
    End Select
Next ele
End With

'THIS IS THE CODE THAT IS NOT WORKING
For Each l In ie.Document.getElementsByTagName("a")
    If l.href = "#" And l.onclick = "changePage(3); return false;" Then
        l.Item(3).Click
        Exit For
    End If
Next l

Set ie = Nothing
End Sub