This question already has an answer here:
- HTML agility pack - removing unwanted tags without removing content? 5 answers
Is there any easy way to remove all HTML tags or ANYTHING HTML related from a string?
For example:
string title = "<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )";
The above should really be:
"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series)"
3 solutions
#1
150
You can use a simple regex like this:
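// Requires: using System.Text.RegularExpressions;
public static string StripHTML(string input)
{
    // Lazily match everything from '<' to the next '>' and delete it.
    return Regex.Replace(input, "<.*?>", String.Empty);
}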
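Be aware that this approach has flaws: the pattern stops at the first '>' it meets, so a '>' inside an attribute value cuts the tag short; '.' does not match newlines by default, so a tag split across lines is left in place; and HTML entities such as &amp; are not decoded. See "Remove HTML tags in String" for more information (especially the comments by @mehaase).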
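The limitations of the regex approach are easier to see in a small sketch. The class name HtmlStripDemo and the helper StripTags below are hypothetical names introduced for illustration, not part of the original answer; the helper keeps the same pattern but adds RegexOptions.Singleline and a call to WebUtility.HtmlDecode, which is only one possible way to soften two of the problems.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlStripDemo
{
    // Hypothetical helper: the same "<.*?>" pattern as StripHTML,
    // with Singleline matching and entity decoding added.
    public static string StripTags(string input)
    {
        // Singleline lets '.' match '\n', so a tag split across lines is removed too.
        string noTags = Regex.Replace(input, "<.*?>", string.Empty, RegexOptions.Singleline);
        // Decode entities such as &amp; into the characters they stand for.
        return WebUtility.HtmlDecode(noTags);
    }

    public static void Main()
    {
        // Entities are decoded after the tags are gone.
        Console.WriteLine(StripTags("<b>Tom &amp; Jerry</b>"));      // Tom & Jerry

        // A tag broken across lines is still matched thanks to Singleline.
        Console.WriteLine(StripTags("<b\n>x</b>"));                  // x

        // Still broken: the '>' inside the attribute value ends the match early,
        // so part of the tag leaks into the output.
        Console.WriteLine(StripTags("<a title=\"a>b\">link</a>"));   // b">link
    }
}
```

The last case is the kind of input where a real HTML parser is the safer choice. Note also that the decoded result is plain text: if it is later written back into a page, it needs to be HTML-encoded again.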
Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?
#2
30
You can parse the string with the HTML Agility Pack and read the InnerText of the document node.
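// Requires the HtmlAgilityPack package: using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
// Regular (non-verbatim) string literal, so \" is a valid escape for the quotes.
htmlDoc.LoadHtml("<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )");
// InnerText returns the text content with the markup dropped.
string result = htmlDoc.DocumentNode.InnerText;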
#3
2
You can run the code below on your string to get the complete text without the HTML parts.
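// Requires: using System.Text.RegularExpressions;
// Remove &nbsp; entities first; replacing a plain " " would delete every space in the title.
string title = "<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )".Replace("&nbsp;", string.Empty);
string s = Regex.Replace(title, "<.*?>", String.Empty);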