To pause code until a web page is fully loaded, I've been using the method below with great success almost all of the time.
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
But occasionally, I see text content load after the method determines that the page is fully loaded, and so that content is not extracted.
However, if I step through the code via F8, the content is extracted every time. This is done about as fast as I can press the F8 key repeatedly.
So how can I check to ensure the page, and all its content, is fully loaded before the code continues to extract data?
In both cases, IE is running invisibly. However, I've tried this with IE visible and there is actually content in this specific location on the pages I'm working with.
This is being done in Excel 2016, using VBA script. The specific content request is written like:
'get item name from page and write it to the first cell on the first empty row available
Set itemName = objIE.document.querySelector(".the-item-name")
Worksheets("Results").Range("A1048576").End(xlUp).Offset(1, 0).Value = itemName.innerText
I've read through "Excel VBA: Wait for JavaScript execution in Internet Explorer" because I suspect the values are being added after the document is loaded, perhaps in an effort to prevent anyone from scraping data. However, I can't identify any script that may be doing that. That doesn't mean it isn't there; I just can't see it yet.
A specific example of a page with this issue is:
https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html
Initially, the product-total-price div element contains a dash (-) before the price is loaded, so that's what the request returns: "- / each" instead of "$11.29 / each".
I have a workaround, but it's not as efficient or as concise as I'd like it to be. I test the string returned for the presence of the dash. If it's there, loop and check it again, else capture it and insert it into the worksheet.
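setPriceUM:
    ' Set needs the element object itself, not its innerText string
    Set hdPriceUM = objIE.document.querySelector(".product-total-price")
    hdPriceUMString = hdPriceUM.innerText
    ' InStr returns a position (0 when not found), so compare against 0 rather than True
    stringTest = InStr(hdPriceUMString, "-")
    If stringTest > 0 Then
        DoEvents ' yield so Excel stays responsive while polling
        GoTo setPriceUM
    Else
        Debug.Print hdPriceUMString
    End If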
Thank you for taking the time to read this and consider it.
1 solution
Web pages work very differently from one another, so no single solution fits all of them.
For your example, your workaround is a valid approach; the code might look like this:
Sub TestIE()
    Dim q
    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .Navigate "https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html"
        ' Wait IE
        Do While .readyState < 4 Or .Busy
            DoEvents
        Loop
        ' Wait document
        Do While .document.readyState <> "complete"
            DoEvents
        Loop
        ' Wait element
        Do
            q = .document.querySelector(".product-total-price").innerText
            If Left(q, 1) <> "-" Then Exit Do
            DoEvents
        Loop
        .Quit
    End With
    Debug.Print q
End Sub
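One caveat with an open-ended polling loop is that it never exits if the price never arrives (for example, when the page layout changes). Below is a minimal sketch of the same "wait for the element" idea with a timeout. The function name WaitForElementText and its parameters are my own illustrative choices, not part of any library, and it assumes the document object supports querySelector as in the code above.

```vba
' Sketch: poll an element's text until it is non-empty and no longer starts
' with the placeholder, giving up after timeoutSeconds.
' WaitForElementText and its parameters are hypothetical names for illustration.
Function WaitForElementText(ByVal doc As Object, ByVal selector As String, _
                            ByVal placeholder As String, _
                            ByVal timeoutSeconds As Double) As String
    Dim startTime As Double
    Dim elapsed As Double
    Dim el As Object
    Dim txt As String

    startTime = Timer
    Do
        Set el = doc.querySelector(selector)
        If Not el Is Nothing Then
            txt = Trim$(el.innerText)
            If Len(txt) > 0 Then
                ' An empty placeholder means "accept any non-empty text"
                If Len(placeholder) = 0 Or Left$(txt, Len(placeholder)) <> placeholder Then
                    WaitForElementText = txt
                    Exit Function
                End If
            End If
        End If
        DoEvents
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400 ' Timer resets at midnight
    Loop While elapsed < timeoutSeconds

    WaitForElementText = vbNullString ' timed out without a usable value
End Function
```

Inside the With block above, the "Wait element" loop could then be replaced by something like q = WaitForElementText(.document, ".product-total-price", "-", 15), followed by a check for an empty string to detect the timeout. The 15-second limit is an arbitrary value chosen for illustration.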
In any case, you need to examine the page's loading process, its XHRs, and its DOM modifications using the browser developer tools (F12). Doing so, you may find that one of the many XHRs returns the price in JSON format. It is logged on the Network tab of the developer tools right before the price appears while the page is loading. That XHR is made by one of the loaded scripts, notably after the page load event. Try this URL (I just copied it from the Network tab):
https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en
So you can simply reproduce that XHR and extract the price by splitting the response text:
Sub TestXHR()
    Dim q
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en", False
        .Send
        q = .ResponseText
    End With
    q = Replace(q, " : ", ":")
    q = Split(q, """displayPrice""", 2)(1)
    q = Split(q, """formattedValue"":""", 2)(1)
    q = Split(q, """", 2)(0)
    Debug.Print q
End Sub
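Note that Split(...)(1) raises a "Subscript out of range" error when the key it splits on is missing from the response. As a rough sketch, the same extraction can be done with InStr so that a missing key yields an empty string instead. The helper names TextBetween and ExtractDisplayPrice are hypothetical, and the code assumes the same key order as the splitting code above ("formattedValue" appearing after "displayPrice").

```vba
' Sketch: return the text between the first startMarker and the next endMarker,
' or an empty string when either marker is missing. Hypothetical helper name.
Function TextBetween(ByVal content As String, ByVal startMarker As String, _
                     ByVal endMarker As String) As String
    Dim startPos As Long
    Dim endPos As Long

    startPos = InStr(1, content, startMarker, vbBinaryCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(startMarker)
    endPos = InStr(startPos, content, endMarker, vbBinaryCompare)
    If endPos = 0 Then Exit Function
    TextBetween = Mid$(content, startPos, endPos - startPos)
End Function

' Mirrors the splitting steps in TestXHR, but returns "" instead of raising
' an error when "displayPrice" or "formattedValue" is not found.
Function ExtractDisplayPrice(ByVal json As String) As String
    Dim tail As String
    Dim keyPos As Long

    json = Replace(json, " : ", ":")
    keyPos = InStr(1, json, """displayPrice""", vbBinaryCompare)
    If keyPos = 0 Then Exit Function
    tail = Mid$(json, keyPos)
    ExtractDisplayPrice = TextBetween(tail, """formattedValue"":""", """")
End Function
```

In TestXHR, the three Split lines could then be replaced with q = ExtractDisplayPrice(q), and an empty result treated as "price not found".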
But again, there is no one-size-fits-all case.
You may also use a JSON parser; take a look at some examples.