I am trying to get the innerHTML of a webpage body in C++. I have this so far:
// I get "Document" from a parameter when calling this code
BSTR bstrContent = NULL;
IHTMLElement *p = 0;
Document->get_body( &p );
if( p )
{
    p->get_innerHTML( &bstrContent );
    p->Release();
}
Now I need to turn bstrContent into a lowercase std::string or LPSTR. I've tried this:
LPSTR pagecontent = NULL;
int responseLength = (int)wcslen(bstrContent);
pagecontent = new CHAR[ responseLength + 1 ];
wcstombs( pagecontent, bstrContent, responseLength);
But "pagecontent" does not always contain the full innerHTML, only a first chunk. Even if it worked, I don't know how to easily make it all lowercase; with a std::string I'd use "transform" + "tolower" to do it.
So, how can I turn bstrContent into a std::string?
2 Solutions
#1
0
I'm not sure I fully understand your question. I don't know of any reason why get_innerHTML would give you an incomplete body, but you can convert a BSTR to a std::string (assuming you don't need to support Unicode, in which case you should be using a std::wstring anyway) using a function found on the following page:
http://www.codeguru.com/forum/showthread.php?t=275978
If you're using ATL there is also the CW2A conversion utility (CA2W goes the other way, from narrow to wide), but the function I linked to is better since it'll at least support UTF-8 if relevant.
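As a rough, portable sketch of the kind of conversion discussed here (this is not the function from the linked thread, and not an ATL API): on Windows a BSTR points at null-terminated wide characters, so it can be read through a const wchar_t*. The helper name narrow_from_wide is hypothetical. It asks std::wcsrtombs for the required byte count first instead of assuming one byte per wide character, which may be one reason a buffer sized with wcslen ends up holding only part of the text. The conversion uses the current C locale, so characters the locale cannot represent make it fail; if UTF-8 output is needed on Windows, WideCharToMultiByte with CP_UTF8 is the usual alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Windows or ATL function): converts a
// null-terminated wide string to a narrow std::string using the
// current C locale. A null pointer is treated as an empty string,
// which matches how a NULL BSTR is conventionally interpreted.
std::string narrow_from_wide(const wchar_t* wide) {
    if (wide == nullptr) {
        return std::string();
    }

    // First pass: with a null destination, wcsrtombs only reports how
    // many bytes the converted text needs (terminator not included).
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide;
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1)) {
        throw std::runtime_error(
            "wide string is not representable in the current locale");
    }

    // Second pass: convert into a buffer of exactly that size.
    std::string out(needed, '\0');
    state = std::mbstate_t();
    src = wide;
    const std::size_t written =
        std::wcsrtombs(&out[0], &src, needed, &state);
    assert(written == needed);
    (void)written;  // silence unused-variable warnings when NDEBUG is set
    return out;
}
```

With the question's variables this would be called as std::string page = narrow_from_wide(bstrContent);, assuming the BSTR contains no embedded null characters.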
Hope that helps,
- Taxilian
#2
0
std::transform works fine if you have a start-pointer and an end-pointer, too. It works on anything that behaves as sequence iterators (regular pointers qualify).
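A minimal sketch of that idea, assuming plain single-byte text (std::tolower works on one char at a time, so multi-byte encodings such as UTF-8 are only lowercased in their ASCII range). The helper names lower_char, lower_in_place and lowered are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// std::tolower expects a value representable as unsigned char,
// so cast before calling it to avoid undefined behavior for
// negative char values.
inline char lower_char(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Lowercases the characters in [first, last) in place.
// Raw pointers are valid iterators for std::transform.
void lower_in_place(char* first, char* last) {
    assert(first <= last);
    std::transform(first, last, first, lower_char);
}

// The same operation on a std::string, returning the lowered copy.
std::string lowered(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(), lower_char);
    return text;
}
```

For the LPSTR buffer in the question, the call would look like lower_in_place(pagecontent, pagecontent + strlen(pagecontent));, provided the buffer is null-terminated.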