How do I combine these two regular expression patterns?

I'm feeling pretty silly having to ask this, but I cannot get this to work to save my life...

What Works

preg_replace( '/(<[^>]+) onmouseout=".*?"/i', '$1', preg_replace( '/(<[^>]+) onmouseover=".*?"/i', '$1', $strHtml ) )

How can I combine these two preg_replace() calls into one (by combining the two regex patterns)?

My Attempt to Clean Up (Doesn't Work)

preg_replace( '/(<[^>]+) (onmouseover|onmouseout)=".*?"/i', '$1', $strHtml )

I want this preg_replace() function to remove all onmouseover AND onmouseout attributes from my HTML string. It appears to remove only one of the two attributes... What am I doing wrong?

UPDATE: Example String

<p><img src="http://www.bestlinknetware.com/products/204233spc.jpg" width="680" height="365"><br>   <a href="http://www.bestlinknetware.com/products/204233INST.pdf" target="_blank" onmouseover="MM_swapImage('Image2','','/Content/bimages/ins2.gif',1)" onmouseout="MM_swapImgRestore()"><img name="Image2" border="0" src="http://www.bestlinknetware.com/Content/bimages/ins1.gif"></a> </p> <p><strong>No contract / No subscription / No monthy fee<br> 1080p HDTV reception<br> 32db high gain reception<br> Rotor let you change direction of the antenna to find best reception</strong></p>  <a href=http://transition.fcc.gov/mb/engineering/dtvmaps/  target="blank"><strong>CLICK HERE</strong></a><br>to see HDTV channels available in your area.<br> <br/> ** TV signal reception is immensely affected by the conditions such as antenna height, terrain, distance from broadcasting transmission antenna and output power of transmitter. Channels you can watch may vary depending on these conditions. <br> <br/> <br/> <p>* Reception: VHF/UHF/FM<br/>   * Reception range: 120miles<br/>   * Built-in 360 degree motor rotor<br>   * Wireless remote controller for rotor (included)<br/>   * Dual TV Outputs<br>   * Easy Installation<br>   * High Sensitivity Reception<br>   * Built-in Super Low Noise Amplifier<br>   * Power : AC15V 300mA<br> <br/> Kit contents<br/> * One - HDTV Yagi antenna with built-in roter & amplifier<br/> * One - Roter control box<br/> * One - Remote for roter control box<br/> * One - 40Ft coax cable<br/> * One - 4Ft coax cable<br/> * One - power supply for roter control box</p>

UPDATE: Tool for Future Viewers of This Thread

https://regex101.com/

I could never figure out exactly how to use http://regexr.com/, so I tried regex101.com instead, and I have been loving it ever since. Highly recommended for anyone facing similar issues (especially anyone who started from a cut-and-paste regex pattern, as I did).

1 Solution

#1

The problem with your original expression is that the initial group grabs too much, so only one of the two attributes gets replaced: the one appearing last in the tag. That happens because the greedy [^>]+ repetition consumes more of the string than you were anticipating, capturing everything from the opening bracket of the tag through to the start of the second attribute you wanted to get rid of. And because the pattern is anchored to the opening bracket of an HTML tag, it can match at most once per element, even after the greediness is addressed.
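To make the greedy behaviour concrete, here is a small sketch that runs the combined pattern from the question against a shortened, made-up tag (the `x.pdf`, `swap()` and `restore()` values are placeholders, not taken from the real page) and prints what each group captured:

```php
<?php
// Shortened, hypothetical sample tag carrying both attributes.
$strHtml = '<a href="x.pdf" onmouseover="swap()" onmouseout="restore()">link</a>';

// The combined pattern from the question.
$pattern = '/(<[^>]+) (onmouseover|onmouseout)=".*?"/i';

// Collect every match so the capture groups can be inspected.
preg_match_all($pattern, $strHtml, $matches);

echo count($matches[0]), "\n"; // 1
echo $matches[1][0], "\n";     // <a href="x.pdf" onmouseover="swap()"
echo $matches[2][0], "\n";     // onmouseout

// Group 1 is put back, so the first attribute survives the replacement.
$result = preg_replace($pattern, '$1', $strHtml);
echo $result, "\n";            // <a href="x.pdf" onmouseover="swap()">link</a>
```

The first group has swallowed `onmouseover="swap()"` along with the rest of the tag, so the only attribute left for the alternation to match is the last one.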

If you want to do this in one call to preg_replace(), then rather than trying to capture the text you want to keep, it makes more sense to match only the text you want to remove (and replace it with an empty string):

preg_replace( '/(onmouseover|onmouseout)=".*?"/i', '', $strHtml )
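As a quick check, the call above can be tried on a shortened, made-up tag (the attribute values are placeholders). This sketch also passes the optional `$limit` and `$count` arguments of preg_replace() to confirm that both attributes were matched:

```php
<?php
// Shortened, hypothetical version of the anchor tag from the example string.
$strHtml = '<a href="x.pdf" onmouseover="swap()" onmouseout="restore()">link</a>';

// Match only the attributes to delete; the captured name is not reused.
$pattern = '/(onmouseover|onmouseout)=".*?"/i';

// -1 means "no limit"; $count receives the number of replacements made.
$cleaned = preg_replace($pattern, '', $strHtml, -1, $count);

echo $count, "\n";   // 2
echo $cleaned, "\n"; // <a href="x.pdf"  >link</a>
```

Note that the space in front of each removed attribute stays behind, which is harmless in HTML but visible in the output.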

You already had a non-greedy match on the attribute value (the .*?), and based on your prior code it appears to have been working well for you. Note that this particular expression doesn't cover all the possible variations in an HTML/XML document (whitespace around the equals sign and single versus double quote marks, for example). I trust that you can make a judgment call regarding whether this is generic enough for your needs.
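If the whitespace and quoting variations do matter, one possible extension is sketched below. It is an illustration rather than a complete HTML-safe solution: the pattern is my own assumption, not part of the original answer, and it still ignores unquoted attribute values and a `>` appearing inside a value. The sample tag is made up.

```php
<?php
// Hypothetical tag mixing quote styles, letter case and spacing around "=".
$strHtml = <<<'HTML'
<a href="x.pdf" onmouseover="swap('a')" onMouseOut = 'restore()'>link</a>
HTML;

// \s+ also consumes the whitespace in front of the attribute;
// the value may be wrapped in double or single quotes.
$pattern = '~\s+onmouse(?:over|out)\s*=\s*(?:"[^"]*"|\'[^\']*\')~i';

$cleaned = preg_replace($pattern, '', $strHtml);

echo $cleaned, "\n"; // <a href="x.pdf">link</a>
```

For anything beyond a one-off cleanup, an HTML parser such as PHP's DOMDocument is usually a more robust choice than a regular expression.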