Is there an easy way to strip HTML from a QString in Qt?

Time: 2021-07-06 14:54:25

I have a QString with some HTML in it. Is there an easy way to strip the HTML from it? I basically want just the actual text content.

<i>Test:</i><img src="blah.png" /><br> A test case

Would become:

Test: A test case

I'm curious to know if Qt has a string function or utility for this.

5 solutions

#1


10  

You could try iterating through the string with the QXmlStreamReader class and extracting all the text (provided your HTML string is guaranteed to be well-formed XML).

Something like this:

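QXmlStreamReader xml(htmlString);
QString textString;
while (!xml.atEnd()) {
    // Collect only character data; element and attribute tokens are skipped.
    if (xml.readNext() == QXmlStreamReader::Characters) {
        textString += xml.text();
    }
}
if (xml.hasError()) {
    // The input was not well-formed XML; textString holds only the text read before the error.
}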

but I'm not sure this is 100% valid usage of the QXmlStreamReader API, since I last used it quite a long time ago and may have forgotten something.

#2


35  

QString s = "<i>Test:</i><img src=\"blah.png\" /><br> A test case";
s.remove(QRegExp("<[^>]*>"));
// s == "Test: A test case"
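
A note for newer Qt versions: in Qt 6, QRegExp is only available through the Qt5Compat module, and QRegularExpression (Qt 5 and later) should accept the same pattern. To show what the pattern itself does and where it falls short, here is a rough Qt-free sketch using std::regex. The helper name stripTags is made up for illustration and is not a Qt function.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a Qt API): removes every "<...>" run from the input,
// using the same pattern as the QRegExp call above.
std::string stripTags(const std::string& html) {
    static const std::regex tag("<[^>]*>");
    return std::regex_replace(html, tag, "");
}
```

With the question's input this gives "Test: A test case". The pattern is purely textual, though: a '>' inside an attribute value ends the match early, so `<a title="a>b">x</a>` comes out as `b">x`, and entities such as `&amp;` are left undecoded.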

#3


23  

If you don't care about performance that much, then QTextDocument does a pretty good job of converting HTML to plain text.

QTextDocument doc;
doc.setHtml( htmlString );

return doc.toPlainText();

I know this question is old, but I was looking for a quick and dirty way to handle incorrect HTML. The XML parser wasn't giving good results.

#4


0  

The fact that some HTML is not quite valid XML makes it harder to handle correctly.

If it's valid XML (or not too badly formatted), I think QXmlStreamReader + QXmlStreamEntityResolver might not be a bad idea.

Sample code: https://github.com/ycheng/misccode/blob/master/qt_html_parse/utils.cpp

(This could have been a comment, but I don't yet have permission to post one.)

#5


-3  

This answer is for anyone who reads this post later and is using Qt 5 or later. Simply escape the HTML characters using the built-in function, as below.

QString str = "<h1>some heading </h1>"; // a string containing HTML tags
QString esc = str.toHtmlEscaped();      // esc contains the HTML-escaped string
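
Worth noting: escaping is not the same as stripping. toHtmlEscaped() replaces the metacharacters <, >, & and " with entities, so the tags stay in the string as literal text instead of being removed. The sketch below mimics that effect in plain C++ to make the difference visible. The function htmlEscape is a hypothetical stand-in written for this note, not Qt's implementation.

```cpp
#include <string>

// Hypothetical stand-in for the effect of QString::toHtmlEscaped(); not Qt code.
// Replaces the four HTML metacharacters with entities and copies everything else.
std::string htmlEscape(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '&': out += "&amp;";  break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}
```

For the string above, this yields `&lt;h1&gt;some heading &lt;/h1&gt;`: the markup is still there, only neutralised, so this approach does not give the plain text the question asks for.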
