I have an HTML string that may contain an <img> tag, like this:
<p> blablabla <img> an image</img> again blablabla</p>
I want to remove the image tag and get the parts before and after it in a string array.
Edit: after calling
String[] splitted = htmlStr.split("regex");
the result would be:
splitted[0] = "<p> blablabla ";
splitted[1] = "again blablabla</p>";
I assume a regex is required. Note that the img tag can differ from string to string: for example, it can have one or more attributes.
4 solutions
#1
0
Use StringTokenizer, String.split(), or, for complex HTML with many IMG tags, an HTML parser.
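As a rough illustration of the String.split() route, here is a minimal sketch that produces the array asked for in the question. The class name ImgSplitSketch, the method splitAroundImg, and the pattern itself are assumptions made for this example, not part of the original answer. The pattern also assumes that no attribute value contains a ">" character and that <img> blocks are not nested.

```java
import java.util.Arrays;

final class ImgSplitSketch {
    // Hypothetical pattern: an opening <img ...> tag with any attributes,
    // the shortest possible content, the closing </img>, and any whitespace after it.
    // (?i) makes the match case-insensitive, (?s) lets '.' match line breaks.
    static final String IMG_BLOCK = "(?is)<img\\b[^>]*>.*?</img>\\s*";

    // Splits the HTML string around every <img>...</img> block.
    static String[] splitAroundImg(String html) {
        return html.split(IMG_BLOCK);
    }

    public static void main(String[] args) {
        String html = "<p> blablabla <img src=\"a.png\" alt=\"x\"> an image</img> again blablabla</p>";
        // Prints the pieces before and after the image block
        System.out.println(Arrays.toString(splitAroundImg(html)));
    }
}
```

Because String.split() drops trailing empty strings, an image block at the very end of the input yields a one-element array rather than two elements.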
#2
0
If you want to remove all HTML tags, you can use this code:
string = string.replaceAll("\\<.*?\\>", "");
#3
0
Try the following code:
String str = "<p> blablabla <img> an image</img> again blablabla</p>";
int start = str.indexOf("<img");
// Skip past the closing tag itself so that "</img>" is not left in the result
int end = str.indexOf("</img>") + "</img>".length();
String imgTagValue = str.substring(0, start) + str.substring(end);
However, if the string contains more than one <img> tag, it needs to be parsed accordingly.
Refer here.
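The caveat about several <img> tags could be handled by repeating the indexOf search in a loop. The following is only a sketch under stated assumptions: the class name ImgStripSketch and the method removeImgBlocks are hypothetical, tags are assumed to be lowercase, and every <img> block is assumed to be closed by a matching </img>.

```java
final class ImgStripSketch {
    // Removes every <img ...>...</img> block by scanning with indexOf.
    // Assumes lowercase tag names; an unclosed <img> leaves the rest untouched.
    static String removeImgBlocks(String html) {
        final String open = "<img";
        final String close = "</img>";
        StringBuilder out = new StringBuilder();
        int pos = 0; // start of the text that has not been copied yet
        while (true) {
            int start = html.indexOf(open, pos);
            if (start < 0) {
                break; // no further image block
            }
            int end = html.indexOf(close, start);
            if (end < 0) {
                break; // opening tag without a closing tag
            }
            out.append(html, pos, start);   // keep the text before the block
            pos = end + close.length();     // continue after "</img>"
        }
        out.append(html, pos, html.length()); // keep whatever remains
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "<p> blablabla <img> an image</img> again <img href = sadf> "
                + "asdf asdf </img>blablabla</p>";
        System.out.println(removeImgBlocks(str));
    }
}
```

Collecting the pieces into a list inside the loop, instead of appending them to the StringBuilder, would give the before and after parts as separate strings.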
#4
0
You should use an HTML parser for parsing HTML, because your tags may vary in ways that a regex cannot handle completely.
But given that in this case you just want to remove the <img> tag, regardless of the attributes it has, you can use the regex below:
String str = "<p> blablabla <img> an image</img> again <img href = sadf> " +
"asdf asdf </img>blablabla</p>";
str = str.replaceAll("<img\\s*[^>]*?>[^<]*?</img>", "");
System.out.println(str);
Output (two spaces remain after the first "blablabla", one from each side of the removed tag):
<p> blablabla  again blablabla</p>
You may also want to read the links below:
- Why shouldn't you parse HTML with Regexp
- The true power of Regular Expression - Do go through it.
Alternatively, you can use an HTML parser such as: