Changing the character encoding in Microsoft.XmlHttp

Time: 2021-08-11 01:46:02

I'm writing a VBScript to pull some data from a webpage, extract a few key pieces of information, and write those to a file.

At the moment my script to access the pages and save the file contents to a string is this:

Set WshShell = WScript.CreateObject("WScript.Shell")
Set http = CreateObject("Microsoft.XmlHttp")

'Load Webpage where address is URL
http.open "GET", URL, FALSE
http.send ""
'Assign webpage contents as a string to variable called Webpage
WEBPAGE = http.responseText

I need to save the content to a string so I can use a regular expression on it to pull out the content that I need.

This script works perfectly, EXCEPT for when the pages contain non-standard characters (such as é). When the page contains something like this, the script throws up an error and stops.

I'm guessing this is something to do with the encoding, but I can't work out how to fix it. Can anyone point me in the right direction? Thanks guys

Edit

Thanks to the help here I realised I've asked the wrong question! It turns out I was downloading the content fine - the problem was, afterwards I was trying to edit it and write it out to a file, and the file was in the wrong format. I had this:

Set objTextFile = objFSO.OpenTextFile(OutputFile, 8, True)

Changing it to this:

Set objTextFile = objFSO.OpenTextFile(OutputFile, 8, True, -1)

Seems to have fixed it. What a crazy world, eh? Thanks for the help.
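
For anyone hitting the same error: the fourth argument of OpenTextFile selects the file format. 0 (the default) opens the file as ASCII, -1 opens it as Unicode, and -2 uses the system default. A minimal sketch with named constants, assuming a hypothetical output path and sample text (neither comes from the original script):

```vbscript
Option Explicit

' Named constants for OpenTextFile's iomode and format arguments
Const ForAppending = 8
Const TristateTrue = -1          ' open the file as Unicode

Dim objFSO, objTextFile, OutputFile
OutputFile = "output.txt"        ' hypothetical path, replace with your own

Set objFSO = CreateObject("Scripting.FileSystemObject")
' Third argument True creates the file if it does not exist yet
Set objTextFile = objFSO.OpenTextFile(OutputFile, ForAppending, True, TristateTrue)
objTextFile.WriteLine "caf" & ChrW(233)   ' ChrW(233) is é; sample text only
objTextFile.Close
```

If the file already exists and was created as ASCII, appending Unicode text to it will likely leave mixed encodings in one file, so starting from a new file may be safer.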

1 solution

#1


You may need to set the correct request headers before calling send.

The following is an example only. You will need to find out exactly which headers your website expects.

    http.open "GET", URL, False
    http.SetRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    http.SetRequestHeader "Accept", "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
    http.SetRequestHeader "Accept-Language", "en-us,en;q=0.5"
    http.SetRequestHeader "Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
    http.send ""
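
To see what the server actually declares, it can help to inspect the response headers after send. A rough sketch, assuming a placeholder URL (not from the original post); getResponseHeader returns the value of a single named header:

```vbscript
Option Explicit

Dim http, URL
URL = "http://example.com/"      ' placeholder address, replace with your page

Set http = CreateObject("MSXML2.ServerXMLHTTP")
http.open "GET", URL, False
http.send

' Status line, to confirm the request succeeded
WScript.Echo http.status & " " & http.statusText
' Content-Type usually carries the charset the server claims to use
WScript.Echo "Content-Type: " & http.getResponseHeader("Content-Type")
```

Run it with cscript so the output goes to the console rather than message boxes.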

EDIT:

What about this instead? It works OK here:

Dim XMLHttpReq, URL, WEBPAGE
Const Eacute = "%C3%89"

Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")

URL = "http://en.wikipedia.org/wiki/%C3%89"
'Load Webpage where address is URL
XMLHttpReq.Open "GET", URL, False
XMLHttpReq.send ""
'Assign webpage contents as a string to variable called Webpage
WEBPAGE = XMLHttpReq.responseText
WEBPAGE = Replace(WEBPAGE, Eacute, "É")
'Debug.Print WEBPAGE

In this case the É comes back as the string %C3%89 (its percent-encoded UTF-8 bytes), and you can replace it with whatever character you choose if required.
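
An alternative to patching individual characters, sketched here under the assumption that the page is served as UTF-8, is to take the raw bytes from responseBody and decode them explicitly with ADODB.Stream. BytesToText is a hypothetical helper name, not part of MSXML:

```vbscript
Option Explicit

Dim XMLHttpReq, URL, WEBPAGE
Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")
URL = "http://en.wikipedia.org/wiki/%C3%89"
XMLHttpReq.Open "GET", URL, False
XMLHttpReq.send

' Decode the raw response bytes instead of relying on responseText
WEBPAGE = BytesToText(XMLHttpReq.responseBody, "utf-8")   ' charset is an assumption

' Hypothetical helper: convert a byte array to a string using the given charset
Function BytesToText(vBytes, sCharset)
    Dim oStream
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Type = 1              ' adTypeBinary
        .Open
        .Write vBytes
        .Position = 0          ' Type can only be changed at position 0
        .Type = 2              ' adTypeText
        .CharSet = sCharset
        BytesToText = .ReadText
        .Close
    End With
    Set oStream = Nothing
End Function
```

If the page uses a different encoding, pass that charset name instead of "utf-8".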

EDIT2:

Just to add, if you're doing this with VBScript you may find this method useful

Dim XMLHttpReq, URL, WEBPAGE
Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")
URL = "http://en.wikipedia.org/wiki/%C3%89"
XMLHttpReq.Open "GET", URL, False
XMLHttpReq.send ""
WEBPAGE = XMLHttpReq.responseText

Save2File WEBPAGE, "C:\Users\osknows\Desktop\test.txt"

' Write sText to sFile as UTF-8, overwriting any existing file
Sub Save2File (sText, sFile)
    Dim oStream
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Open
        .CharSet = "utf-8"
        .WriteText sText
        .SaveToFile sFile, 2   ' 2 = adSaveCreateOverWrite
        .Close
    End With
    Set oStream = Nothing
End Sub
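
To check the result, the file can be read back through the same kind of object. A minimal sketch, assuming the file was written as UTF-8 by Save2File above; ReadUtf8File is a hypothetical helper name:

```vbscript
Option Explicit

Dim sContent
sContent = ReadUtf8File("C:\Users\osknows\Desktop\test.txt")
WScript.Echo Len(sContent) & " characters read"

' Hypothetical helper: load a UTF-8 text file into a string
Function ReadUtf8File(sFile)
    Dim oStream
    Set oStream = CreateObject("ADODB.Stream")
    With oStream
        .Type = 2              ' adTypeText
        .CharSet = "utf-8"     ' must match the charset used when saving
        .Open
        .LoadFromFile sFile
        ReadUtf8File = .ReadText   ' with no argument, reads the whole stream
        .Close
    End With
    Set oStream = Nothing
End Function
```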
