How to remove redundant <br /> tags from HTML code using PHP?

I'm parsing some messy HTML code with PHP in which there are some redundant
<br /> tags, and I would like to clean them up a bit. For instance:

<br>

<br /><br /> 


<br>

How would I replace something like that with this, using preg_replace()?

<br /><br />

Newlines, spaces, and the differences between <br>, <br/>, and <br /> would all have to be accounted for.

Edit: Basically I'd like to replace every instance of three or more successive breaks with just two.

5 solutions

#1

Here is something you can use. The first line finds every run of two or more <br> tags (of mixed types, with whitespace between them) and replaces it with a well-formatted <br /><br />.

I also included the second line, which normalizes the remaining <br> tags, in case you want that too.

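function clean($txt)
{
    // Collapse any run of two or more break tags (<br>, <br/>, <br />, in any
    // case, with whitespace or newlines between them) into exactly two.
    $txt = preg_replace('~(<br\s*(>|/>)\s*){2,}~i', '<br /><br />', $txt);
    // Normalize every remaining single break tag (and any whitespace after it) to <br />.
    $txt = preg_replace('~(<br\s*(>|/>)\s*)~i', '<br />', $txt);
    return $txt;
}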
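
To see what clean() does, here is a small usage sketch. It repeats clean() so the snippet runs on its own, and the $messy string is an assumed stand-in for the example in the question (the exact whitespace is illustrative).

```php
<?php
function clean($txt)
{
    // Runs of two or more break tags become exactly two.
    $txt = preg_replace('~(<br\s*(>|/>)\s*){2,}~i', '<br /><br />', $txt);
    // Any remaining single break tag is normalized to <br />.
    $txt = preg_replace('~(<br\s*(>|/>)\s*)~i', '<br />', $txt);
    return $txt;
}

// Assumed messy input: <br>, blank lines, two <br />, more blank lines, <br>.
$messy = "<br>\n\n<br /><br /> \n\n\n<br>\n";
echo clean($messy), "\n"; // prints <br /><br />

// A lone <br> is only normalized; the mixed-case run of three is collapsed.
echo clean("Line one<br>Line two<BR/>\n<br>\n<br />Line three"), "\n";
// prints Line one<br />Line two<br /><br />Line three
```

Because the first pattern uses {2,}, a pair of breaks is also rewritten, but it comes out as <br /><br /> again, so the net effect still matches the goal of reducing three or more breaks to two.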

#2

This should work, using a minimum-count quantifier ({3,}):

preg_replace('/(<br[\s]?[\/]?>[\s]*){3,}/', '<br /><br />', $multibreaks);

It should match appalling mixed-style break constructions too.
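
The pattern above is case-sensitive and allows at most one whitespace character inside the tag, so something like <BR> or <br  /> (two spaces) would not be matched. A possible variant, sketched here as an adjustment rather than part of the original answer, swaps [\s]? for \s* and adds the i flag; collapseBreaks() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: a relaxed version of the answer's pattern.
function collapseBreaks(string $html): string
{
    // Three or more consecutive break tags, any case, with any amount of
    // whitespace inside the tag or between tags, become exactly two.
    return preg_replace('~(<br\s*/?>\s*){3,}~i', '<br /><br />', $html);
}

echo collapseBreaks("a<BR>\n<br  />\n<br>b"), "\n"; // prints a<br /><br />b
echo collapseBreaks("a<br><br>b"), "\n";            // prints a<br><br>b (only two, left alone)
```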

#3

This will remove all breaks, even if they're in uppercase:

preg_replace('/<br[^>]*>/i', '', $string);

#4

Try this:

preg_replace('/<br\s*\/?>/', '', $inputString);

#5

Use str_replace; it's much better for simple replacements, and you can also pass an array instead of a single search value.

$newcode = str_replace("<br>", "", $messycode);
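
A minimal sketch of the array form mentioned above, assuming the three spellings listed are the only variants in your input. str_replace() is case-sensitive (str_ireplace() is its case-insensitive counterpart), and unlike the regex answers it cannot collapse runs of breaks; it only removes every listed tag.

```php
<?php
// Every spelling listed here is replaced with '' (removed).
$variants = ['<br>', '<br/>', '<br />'];

$messycode = "one<br>two<br/>three<br />four<BR>five";

// Case-sensitive: the uppercase <BR> is left untouched.
echo str_replace($variants, '', $messycode), "\n"; // prints onetwothreefour<BR>five

// Case-insensitive counterpart removes it as well.
echo str_ireplace($variants, '', $messycode), "\n"; // prints onetwothreefourfive
```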
