Remove useless paragraph tags from a string

Time: 2022-08-27 17:12:12

If I have a string like:

<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b">&nbsp;</p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>

How can I turn it into just:

<p>blah blah blah this is some real content</p>

The regex needs to pick up &nbsp;s and spaces.

3 solutions

#1

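// Removes <p> elements containing only whitespace or &nbsp; (semicolon optional); <br /> is not handled.
$result = preg_replace('#<p[^>]*>(\s|&nbsp;?)*</p>#', '', $input);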

This doesn't catch literal non-breaking space characters (U+00A0, as opposed to the &nbsp; entity) in the output, but those are rare.
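
If literal non-breaking spaces and <br /> tags, which the pattern above leaves in place, should also count as empty, the pattern can be widened. This is a minimal sketch rather than the answerer's code: the function name removeEmptyParagraphs is hypothetical, and the u modifier assumes the input is valid UTF-8. It also adds \b after <p so that tags such as <pre> are not mistaken for paragraphs.

```php
<?php
// Hypothetical helper extending answer #1's regex; assumes UTF-8 input.
// \b after <p keeps <pre>, <param>, etc. from matching.
// \x{00A0} matches a literal non-breaking space (interpreted as a code point with u).
// <br\s*/?> treats <br>, <br/> and <br /> as empty content; i ignores tag case.
function removeEmptyParagraphs(string $html): string
{
    $pattern = '#<p\b[^>]*>(?:\s|\x{00A0}|&nbsp;?|<br\s*/?>)*</p>#iu';
    // preg_replace() returns null on failure (e.g. invalid UTF-8); fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$input = <<<HTML
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b">&nbsp;</p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
HTML;

// trim() drops the newlines left behind by the removed paragraphs.
echo trim(removeEmptyParagraphs($input)), "\n";
// prints <p>blah blah blah this is some real content</p>
```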

Since you're dealing with HTML, if this is user input I'd suggest using HTML Purifier, which will also protect against XSS vulnerabilities. The configuration directive for removing empty p tags is %AutoFormat.RemoveEmpty.

#2

This regex works on your example:

<p[^>]*>(?:\s+|(?:&nbsp;)+|(?:<br\s*/?>)+)*</p>
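
In PHP the pattern needs delimiters before it can be passed to preg_replace(). The sketch below wraps it in # (an arbitrary choice) and runs it on the question's sample. Because the outer * already allows repetition, the inner + quantifiers add nothing to what the pattern matches; the simplified version matches the same strings without nested quantifiers, which can backtrack heavily on long whitespace runs that are never followed by </p>.

```php
<?php
// Answer #2's pattern, wrapped in # delimiters for preg_replace().
$original = '#<p[^>]*>(?:\s+|(?:&nbsp;)+|(?:<br\s*/?>)+)*</p>#';
// Same set of matches, without the nested + quantifiers.
$simplified = '#<p[^>]*>(?:\s|&nbsp;|<br\s*/?>)*</p>#';

$input = <<<HTML
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b">&nbsp;</p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
HTML;

foreach ([$original, $simplified] as $pattern) {
    // trim() removes the blank lines left where the empty paragraphs were.
    echo trim(preg_replace($pattern, '', $input)), "\n";
}
// Both iterations print <p>blah blah blah this is some real content</p>
```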

#3

As the original replier stated, regex isn't the best solution here; what you want is some sort of HTML stripper.

A function on this site should help you out: http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page

After that, you just need a bit of string manipulation to get the newlines and whatnot back into the format you want.
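
Along the same lines, PHP's bundled DOM extension can do the parsing instead of a regex. This is a swapped-in technique, not the function from the linked article: a minimal sketch assuming ext-dom is available and the input is a UTF-8 fragment, with removeEmptyParagraphsDom as a hypothetical helper name. A paragraph is dropped when its text is only whitespace or non-breaking spaces and it contains no elements other than <br>.

```php
<?php
// Hypothetical helper: parse the fragment with DOMDocument and drop empty <p> elements.
function removeEmptyParagraphsDom(string $html): string
{
    $doc = new DOMDocument();
    // Suppress libxml warnings about the fragment; restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a full document; the meta tag declares the input as UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $empty = [];
    foreach ($xpath->query('//p') as $p) {
        // textContent decodes &nbsp; to U+00A0, so strip that along with ordinary whitespace.
        $text = trim(str_replace("\u{00A0}", '', $p->textContent));
        // Any descendant element other than <br> (an <img>, say) counts as real content.
        $hasOtherElements = $xpath->query('.//*[not(self::br)]', $p)->length > 0;
        if ($text === '' && !$hasOtherElements) {
            $empty[] = $p;
        }
    }
    // Remove after the loop so nodes aren't detached while the results are being walked.
    foreach ($empty as $p) {
        $p->parentNode->removeChild($p);
    }

    // Serialize only the children of <body>, i.e. the original fragment minus the empty paragraphs.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$input = <<<HTML
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b">&nbsp;</p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
HTML;

echo trim(removeEmptyParagraphsDom($input)), "\n";
```

Unlike the regex versions, this keeps a paragraph whose only content is an image, since the decision is based on the parsed tree rather than on the raw characters between the tags.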
