Remove unmatched HTML tags from a string

Time: 2022-08-27 17:03:50

Folks, does anyone know of a PHP function to remove unmatched HTML tags from a string? For example: <div> This is a string <b> with an unmatched bold tag </div>. If there isn't one, then help me build one. Maybe I can have a function that counts the number of opening tags and matching closing tags. If the counts are not equal, it removes the first opening tag, or, if there are more closing tags, it removes the last closing tag?
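To make the idea above concrete: a plain count tells you how many tags are unbalanced, but not which ones, so the sketch below swaps the counter for a stack of open tags. `remove_unmatched_tags` is a hypothetical helper name, the void-tag list is a partial assumption, and the regex tokenizer ignores comments, `<script>` content, and attributes that contain `>`, so treat it as an illustration rather than a robust parser.

```php
<?php
// Hypothetical helper: drops tags that have no partner, using a stack of open tags.
function remove_unmatched_tags(string $html): string
{
    // Split into tag tokens and text tokens, keeping the tags.
    $tokens = preg_split(
        '/(<\/?[a-zA-Z][^>]*>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link']; // assumed partial list
    $open = []; // stack of [tagName, tokenIndex] still waiting for a close
    $drop = []; // token indexes to remove

    foreach ($tokens as $i => $token) {
        if (!preg_match('/^<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>$/', $token, $m)) {
            continue; // plain text
        }
        $name = strtolower($m[2]);
        if (in_array($name, $void, true) || substr($token, -2) === '/>') {
            continue; // void or self-closing tags never need a partner
        }
        if ($m[1] === '') {
            $open[] = [$name, $i]; // opening tag
            continue;
        }
        // Closing tag: look for the nearest opener with the same name.
        $found = false;
        for ($j = count($open) - 1; $j >= 0; $j--) {
            if ($open[$j][0] === $name) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $drop[$i] = true; // stray closing tag
            continue;
        }
        // Openers stacked above the match were never closed.
        while (count($open) - 1 > $j) {
            $entry = array_pop($open);
            $drop[$entry[1]] = true;
        }
        array_pop($open); // the matched opener
    }
    // Anything still on the stack was never closed.
    foreach ($open as $entry) {
        $drop[$entry[1]] = true;
    }
    return implode('', array_diff_key($tokens, $drop));
}

// The stray <b> token is dropped; the <div> pair is kept.
echo remove_unmatched_tags('<div> This is a string <b> with an unmatched bold tag </div>'), PHP_EOL;
```

The stack is what lets the function decide that `<b>` is the stray tag in the example: when `</div>` arrives, everything opened after `<div>` and still unclosed gets marked for removal.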


2 solutions

#1


3  

I don't believe there is a function. What you want to do is use something like tidy, which PHP supports (PHP Tidy). Tidy will clean up your HTML for you. Also, please don't get the idea to fix this using regular expressions! ;)
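A minimal sketch of the tidy route, assuming the tidy extension (ext-tidy) is installed and enabled. `repair_html_fragment` is a hypothetical wrapper name; `tidy_repair_string` and the `show-body-only` and `wrap` options come from the extension and the underlying HTML Tidy library. Note that tidy repairs the markup by closing the stray tag rather than deleting it.

```php
<?php
// Hypothetical wrapper around the tidy extension (ext-tidy must be enabled).
function repair_html_fragment(string $html): string
{
    $config = [
        'show-body-only' => true, // return the fragment only, not a full <html> document
        'wrap'           => 0,    // do not re-wrap long lines
    ];
    $repaired = tidy_repair_string($html, $config, 'utf8');
    // Fall back to the input if tidy could not produce output.
    return $repaired === false ? $html : trim($repaired);
}

if (extension_loaded('tidy')) {
    echo repair_html_fragment('<div> This is a string <b> with an unmatched bold tag </div>'), PHP_EOL;
} else {
    echo 'The tidy extension is not available.', PHP_EOL;
}
```

With the example from the question, the repaired fragment should contain a `</b>` inserted before `</div>`. The exact whitespace and line breaks depend on the tidy version and configuration.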

Here is a tutorial from Zend that talks about tidying up your HTML:

#2


0  

Without adhering to some sort of rule structure, this isn't very feasible. If you want to follow standards (that is, don't have a </b> break out of a containing block), you can do a lookahead with regex to verify that </b> is found before </div> is found.

http://www.regular-expressions.info/lookaround.html
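The lookahead check described above might be sketched like this. `strip_unclosed_bold` is a hypothetical helper name, and the pattern only handles a bare `<b>` (no attributes) inside a `<div>`. It also inherits the usual limits of regex on HTML: for instance, it cannot tell when two `<b>` tags share a single `</b>`.

```php
<?php
// Hypothetical helper: removes a <b> that has no </b> before the next </div>.
function strip_unclosed_bold(string $html): string
{
    // Negative lookahead: from this <b>, scanning forward without crossing
    // a </div>, no </b> can be reached.
    $pattern = '~<b>(?!(?:(?!</div>).)*</b>)~is';
    return preg_replace($pattern, '', $html) ?? $html;
}

// The unclosed <b> is removed; a properly closed <b>...</b> would be left alone.
echo strip_unclosed_bold('<div> This is a string <b> with an unmatched bold tag </div>'), PHP_EOL;
```

The inner `(?!</div>).` step is what stops the scan at the end of the containing block, so a `</b>` that belongs to a later `<div>` does not count as a match.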
