After losing much sleep I still cannot figure this out:
The code below (a simplification of larger code that shows only the problem) identifies Item1 and Item2 in Firefox but not in IE7. I'm clueless.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body>
<table><tr>
<td><img src=imgs/site/trash.jpg border=1></td><td><font style="">Item1</font></td>
<td><img src=imgs/site/trash.jpg border=1></td><td><font style="">Item2</font></td>
</tr></table>
<script type="text/javascript">
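// Pattern that works in Firefox but finds nothing in IE7
var _pattern = /trash.*?<font.*?>(.*)<\/font>/gim;
alert(_pattern);
var thtml = document.documentElement.innerHTML;
alert(thtml);
var _match; // declared explicitly to avoid an implicit global
while ((_match = _pattern.exec(thtml)) !== null) {
    alert(_match[1]);
}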
</script>
</body>
</html>
Notes:
1. I know there are better ways to get Item1 and Item2. This example just shows the regex problem I'm facing in the simplest way.
2. When I remove the <table> and </table> tags, it works.
Thanks in advance
4 Answers
#1
The problem is that the any-character dot (.) does not match a newline character. The m (multiline) flag does not change that; it only affects where ^ and $ match.
Use this regex instead:
var _pattern = /trash[\s\S]*?<font[^>]*>([^<]*)<\/font>/gi;
This eliminates . altogether. Note that [\s\S] plays the same role as . but also matches a newline.
Removing the table changes things because IE's .innerHTML implementation doesn't return the original markup it received. Instead, it creates the markup dynamically by examining the DOM. When it sees a table element, it places newlines in different places in the output than it does when the table is missing.
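To make the difference concrete, here is a minimal sketch that runs both patterns against a string. The markup in `html` is an assumption: it only imitates the kind of output IE7 might produce (upper-case tags, one cell per line) and is not captured from a real browser. The helper name `collect` is also made up for this example.

```javascript
// Hypothetical markup resembling re-serialized innerHTML with a line break after each cell.
var html = [
  '<TABLE><TBODY><TR>',
  '<TD><IMG src="imgs/site/trash.jpg" border=1></TD>',
  '<TD><FONT>Item1</FONT></TD>',
  '<TD><IMG src="imgs/site/trash.jpg" border=1></TD>',
  '<TD><FONT>Item2</FONT></TD>',
  '</TR></TBODY></TABLE>'
].join('\r\n');

// Run a global regex over the text and return every first capture group.
function collect(pattern, text) {
  var results = [];
  var match;
  pattern.lastIndex = 0; // start from the beginning even if the regex was used before
  while ((match = pattern.exec(text)) !== null) {
    results.push(match[1]);
  }
  return results;
}

// Pattern from the question: "." cannot cross the line break between "trash" and "<font".
var dotPattern = /trash.*?<font.*?>(.*)<\/font>/gim;
// Pattern from this answer: [\s\S] matches any character, line breaks included.
var classPattern = /trash[\s\S]*?<font[^>]*>([^<]*)<\/font>/gi;

var dotResults = collect(dotPattern, html);     // no matches on this input
var classResults = collect(classPattern, html); // both labels
```

On this sample input the dot-based pattern finds nothing, while the [\s\S] version returns Item1 and Item2.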
#2
Seriously, this is horrible. A solution based on getElementById / getElementsByTagName will be considerably more reliable and flexible.
As for the actual problem, it's probably because JavaScript multiline regex support is not cross-browser safe, and IE in particular has problems. Removing the table declaration probably forces IE to format the remaining markup internally as a single line (= success), whereas adding it back in makes IE add carriage returns etc. (= fail).
I know you said you know there are better ways, but you didn't explain why you'd persist with this. Relying on regex, and further relying on IE's plain-text serialization of a DOM, is going to get you into problems like this. Don't do it.
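A DOM-based version could look roughly like the sketch below. It assumes the labels are the only font elements under the chosen root. The function name `collectFontText` is hypothetical, and so is the `'items'` id in the usage comment.

```javascript
// Collect the text of every <font> element under `root`.
// `root` is any object exposing getElementsByTagName, e.g. `document` in a browser.
function collectFontText(root) {
  var fonts = root.getElementsByTagName('font');
  var items = [];
  for (var i = 0; i < fonts.length; i++) {
    // textContent is the standard property; innerText is the fallback for older IE.
    items.push(fonts[i].textContent || fonts[i].innerText || '');
  }
  return items;
}

// In a browser (hypothetical id):
//   collectFontText(document);
//   collectFontText(document.getElementById('items'));
```

Because this never looks at serialized markup, it does not depend on where a browser puts line breaks in innerHTML.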
#3
The closing td tags contain a character that needs to be escaped: the / slash. I don't know if that is why IE7 is tripping. Safari is okay as tested.
You might want to consider adding an id to the table and then iterating through only that table's childNodes. You would go through a whole lot less HTML on a bigger page and probably conserve memory, too.
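One way to read the childNodes suggestion is a small recursive walk, sketched below. The walk has to recurse because the cells are not direct children of the table. The function name `collectFontTextFrom` and the `'items'` id are assumptions for illustration only.

```javascript
// Walk element children (nodeType 1) below `node` and collect the text of <font> elements.
function collectFontTextFrom(node, out) {
  out = out || [];
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType !== 1) {
      continue; // skip text nodes, comments, etc.
    }
    if (child.nodeName.toLowerCase() === 'font') {
      out.push(child.textContent || child.innerText || '');
    } else {
      collectFontTextFrom(child, out); // descend into tbody, tr, td, ...
    }
  }
  return out;
}

// In a browser, after giving the table a (hypothetical) id="items":
//   collectFontTextFrom(document.getElementById('items'));
```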
#4
Try to build your regexp with new RegExp("", "gim"). It's more portable.
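For reference, a constructor-based pattern might be written as in the sketch below. It borrows the [\s\S] pattern from answer #1 rather than the original one, since the constructor by itself does not change how . treats line breaks. The sample string is made up for illustration.

```javascript
// Same pattern as the [\s\S] literal, built with the RegExp constructor.
// Inside a string literal each backslash must be doubled; "/" needs no escaping.
var built = new RegExp('trash[\\s\\S]*?<font[^>]*>([^<]*)</font>', 'gi');

// Hypothetical sample text with a line break between the image and the label.
var sample = 'trash.jpg></td>\n<td><font style="">Item1</font>';
var found = built.exec(sample);
var label = found ? found[1] : null;
```

The main practical difference from a literal is the escaping: forgetting to double a backslash in the string silently changes the pattern.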