Split an HTML string into two parts: before and after a tag

Date: 2022-05-08 22:04:04

I have an HTML string that may contain an <img> tag, like this:

<p> blablabla <img> an image</img> again blablabla</p>

I want to remove the image tag and get the parts before and after it as a string array.

Edit: after calling

String[] splitted = htmlStr.split("regex");

the result would be:

splitted[0] = "<p> blablabla ";
splitted[1] = "again blablabla</p>";

I assume a regex is required. Note that the img tag can differ from string to string: for example, it can have one or more attributes.

4 answers

#1


0  

Use StringTokenizer or String.split(), or an HTML parser for complex HTML with many IMG tags.
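As a minimal sketch of the String.split() route, the snippet below splits the question's string around a single image element. The class name, method name, and the regex itself are assumptions made for illustration, not something given in the thread. The pattern assumes the non-standard `<img ...>...</img>` form used in the question and no `>` inside attribute values.

```java
import java.util.Arrays;

class SplitAroundImg {
    // Assumed pattern: an opening <img ...> with any attributes, its content,
    // the closing </img>, and any whitespace that follows it.
    static final String IMG_REGEX = "(?s)<img\\b[^>]*>.*?</img>\\s*";

    static String[] splitAroundImg(String html) {
        // A limit of 2 splits only at the first image element.
        return html.split(IMG_REGEX, 2);
    }

    public static void main(String[] args) {
        String html = "<p> blablabla <img src=\"a.png\" alt=\"x\"> an image</img> again blablabla</p>";
        System.out.println(Arrays.toString(splitAroundImg(html)));
    }
}
```

For the question's input this yields the two elements "<p> blablabla " and "again blablabla</p>". The trailing `\\s*` is what drops the space before "again".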

#2


0  

If you want to remove all HTML tags, you can use this code:

string = string.replaceAll("\\<.*?\\>", "");

#3


0  

Try the following code:

String str = "<p> blablabla <img> an image</img> again blablabla</p>";
String closeTag = "</img>";
int start = str.indexOf("<img");
// Move past the closing tag so that "</img>" is not left in the result.
int end = str.indexOf(closeTag) + closeTag.length();
String withoutImg = str.substring(0, start) + str.substring(end);

However, if a single string contains more than one <img> tag, it has to be parsed accordingly.

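One way to handle several image elements with the indexOf approach is to scan in a loop and collect the text between them. This is only a sketch: the class and method names are hypothetical, and it assumes every `<img` has a matching `</img>`, as in the question's example.

```java
import java.util.ArrayList;
import java.util.List;

class ImgSplitter {
    private static final String OPEN = "<img";
    private static final String CLOSE = "</img>";

    // Returns the text segments that lie outside every <img ...>...</img> pair.
    static List<String> splitAroundImages(String html) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = html.indexOf(OPEN, pos);
            int end = (start < 0) ? -1 : html.indexOf(CLOSE, start);
            if (start < 0 || end < 0) {
                // No further complete image element: keep the remainder as is.
                parts.add(html.substring(pos));
                return parts;
            }
            parts.add(html.substring(pos, start));
            pos = end + CLOSE.length(); // continue after the closing tag
        }
    }

    public static void main(String[] args) {
        String html = "<p> a <img src=\"x.png\"> one</img> b <img> two</img> c</p>";
        System.out.println(splitAroundImages(html));
    }
}
```

Calling `toArray(new String[0])` on the returned list gives the string array the question asks for.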

#4


0  

You should use an HTML parser to parse HTML, because your tags may vary in ways that a regex cannot handle completely.

But given that in this case you just want to remove the <img> tag, regardless of its attributes, you can use the regex below:

String str = "<p> blablabla <img> an image</img> again <img href = sadf> " + 
             "asdf asdf </img>blablabla</p>";

str = str.replaceAll("<img\\s*[^>]*?>[^<]*?</img>", "");
System.out.println(str);

Output:

<p> blablabla  again blablabla</p>
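To get the string array the question asks for rather than a cleaned string, the same regex can be handed to Pattern.split instead of replaceAll. This is a sketch under that assumption; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ImgRegexSplit {
    // The regex from the answer, compiled once so it can be reused.
    static final Pattern IMG = Pattern.compile("<img\\s*[^>]*?>[^<]*?</img>");

    static String[] partsOutsideImages(String html) {
        // Each match is dropped; the text between matches becomes one array element.
        return IMG.split(html);
    }

    public static void main(String[] args) {
        String str = "<p> blablabla <img> an image</img> again <img href = sadf> "
                   + "asdf asdf </img>blablabla</p>";
        System.out.println(Arrays.toString(partsOutsideImages(str)));
    }
}
```

For the two-image string above this produces three elements: "<p> blablabla ", " again ", and "blablabla</p>".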


For anything more complex, an HTML parser is the better choice.
