How to convert IHTMLDocument2->get_body->get_innerHTML to a lowercase string?

Time: 2022-11-20 22:30:08

I am trying to get the innerHTML of a web page's body in C++. This is what I have so far:

// I get "Document" from a parameter when calling this code
BSTR bstrContent = NULL;
IHTMLElement *p = 0;
Document->get_body( &p );

if( p )
{
    p->get_innerHTML( &bstrContent );
    p->Release();
}

Now I need to turn bstrContent into a lowercase std::string or LPSTR. I've tried this:

LPSTR pagecontent = NULL;

int responseLength = (int)wcslen(bstrContent);
pagecontent = new CHAR[ responseLength + 1 ];
wcstombs( pagecontent, bstrContent, responseLength);

But pagecontent does not always contain the full innerHTML, only the first chunk. And even if it worked, I don't know how to easily make it all lowercase; with a std::string I'd use std::transform + tolower.

So, how can I turn bstrContent into a std::string?

2 answers

#1


I'm not sure I fully understand your question. I don't know of any reason why get_innerHTML would give you an incomplete body, but you can convert a BSTR to a std::string (assuming you don't need to support Unicode; if you do, you should be using std::wstring anyway) with the function on the following page:

http://www.codeguru.com/forum/showthread.php?t=275978
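The linked function isn't reproduced here. As a hedged alternative, here is a minimal sketch that converts UTF-16 code units (what a BSTR holds on Windows) into a UTF-8 std::string by hand, using an explicit length so the whole string is converted and nothing depends on the current C locale. One plausible reason the wcstombs attempt stops early is that it returns (size_t)-1 at the first character the locale cannot represent. On Windows you would more typically call WideCharToMultiByte with CP_UTF8 and pass SysStringLen(bstr) as the length. The portable loop below only illustrates what that conversion does. Utf16ToUtf8 is a hypothetical name, and taking char16_t input is an assumption made so the sketch compiles off Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes `len` UTF-16 code units as UTF-8.
// Taking an explicit length (as SysStringLen would give for a BSTR) means
// embedded nulls and long strings are converted in full.
// Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const char16_t* s, std::size_t len) {
    std::string out;
    out.reserve(len * 3);  // enough for any BMP-only input
    for (std::size_t i = 0; i < len; ++i) {
        char32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // High surrogate followed by a low surrogate: combine them.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate: emit the replacement character
        }
        if (cp < 0x80) {  // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {  // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {  // 4 bytes: code points above the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Because the output is UTF-8, lowercasing it with tolower afterwards only affects ASCII letters. For HTML tag names that is usually enough, but non-ASCII text would need a Unicode-aware library.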

If you're using ATL, there is also the CW2A conversion class (wide to ANSI), but the function I linked to is better, since it at least supports UTF-8 where relevant.

Hope that helps,

  • Taxilian

#2


std::transform works just as well with a start pointer and an end pointer. It accepts anything that behaves like a sequence iterator, and plain pointers qualify.
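A minimal sketch of that point, assuming the text sits in a contiguous buffer. For a BSTR on Windows, the pointer pair would be bstr and bstr + SysStringLen(bstr). For an LPSTR it would be the buffer and the buffer plus its length. LowercaseInPlace and ToLowerCopy are hypothetical helper names. towlower and tolower are locale-dependent, so in the default "C" locale only ASCII letters are changed.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper: lowercases a wide-character buffer in place.
// The raw pointers [begin, end) serve directly as the iterator range.
void LowercaseInPlace(wchar_t* begin, wchar_t* end) {
    std::transform(begin, end, begin, [](wchar_t c) {
        return static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
    });
}

// Hypothetical helper: reads a narrow char range through plain pointers
// and writes a lowercase copy into a std::string.
std::string ToLowerCopy(const char* begin, const char* end) {
    std::string out(static_cast<std::size_t>(end - begin), '\0');
    std::transform(begin, end, out.begin(), [](char c) {
        // Cast to unsigned char first: passing a negative char to tolower is UB.
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    });
    return out;
}
```

Lowercasing the wide buffer before any narrowing conversion avoids a second pass over the converted string. Note that a BSTR returned by get_innerHTML is owned by the caller, so it should still be released with SysFreeString afterwards.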
