$source="<p><b>Lal, Vaninm</b></p>
<p><b>Vice President &</b></p>
<p><b>General Manager</b></p>
<p>Company 1 Inc.</p>
<p>PO Box 123456</p>
<p>salt Lake1, 00111-3333</p>
<p>111-111-111 / F: 111-111-111</p>
<p>info1@site1.com</p>
<p><b>Andrus, Reed </b></p>
<p><b>Manager</b></p>
<p>Company 2 Inc.</p>
<p>Monada, Suite 222</p>
<p>J , Lousiana 2222</p>
<p>222-222-222 / F: 222-222-222</p>
<p>info2@site2.com</p>
<p><b>Sharma, John L.</b></p>
<p><b>Senior Property Manager</b></p>
<p>Company 3 Ltd.</p>
<p>PO Box 3333</p>
<p>Grand Cinema, Layman Islands</p>
<p>FGB 333</p>
<p>333-333-333</p>
<p>info3@site3.com</p>
<p><b>Lucky, Philip S</b></p>
<p>Life Member</p>
<p>Company 4 Inc.</p>
<p>Battelsville, Oklahoma 74000</p>
<p>444-444-444</p>
<p><b>Berry, Richard B, RPA, CPM</b></p>";
$records = preg_split ("@\<p\>\<b\>(.*?)(\<p\>(.*)\</p\>\<p\>\<b\>)@s", $source);
var_dump($records);
The array should contain four records. The data inside the tags is only placeholder text. I am new to regular expressions and tried the pattern above. Please suggest a regular expression for this. Thanks in advance.
I think <p><b> ....<p>...</p><p><b>
identifies a record, but I can't write the required expression.
1 solution
#1
With all the usual disclaimers about parsing HTML with regex, the following regex will correctly split your input.
Version 1: file with only newlines (Unix, OS X)
(?=(?<=^|((?<!</b>)</p>\n))<p><b>)
Version 2: file with carriage returns and newlines (Windows)
(?=(?<=^|((?<!</b>)</p>\r\n))<p><b>)
So, with the first version, you could write:
$records = preg_split('~(?=(?<=^|((?<!</b>)</p>\n))<p><b>)~', $str);
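If it is not known in advance which line endings the input uses, one option is to normalize them first and then rely on version 1 only. This is a minimal sketch, assuming the markup is already in a string; the sample data and the variable name `$raw` are made up for illustration.

```php
<?php
// Hypothetical sample with Windows line endings (\r\n).
$raw = "<p><b>Name One</b></p>\r\n"
     . "<p><b>Manager</b></p>\r\n"
     . "<p>Company 1 Inc.</p>\r\n"
     . "<p><b>Name Two</b></p>\r\n"
     . "<p>Company 2 Inc.</p>";

// Convert \r\n to \n so that version 1 of the pattern applies.
$str = str_replace("\r\n", "\n", $raw);

// PREG_SPLIT_NO_EMPTY discards any empty piece, for example one that
// the zero-width match at the very start of the string could produce.
$records = preg_split(
    '~(?=(?<=^|((?<!</b>)</p>\n))<p><b>)~',
    $str,
    -1,
    PREG_SPLIT_NO_EMPTY
);

print_r($records);
```

The `PREG_SPLIT_NO_EMPTY` flag is not part of the original call; it is added here only so that the result contains records and nothing else.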
Note that there are actually five records because of the last line:
<p><b>Berry, Richard B, RPA, CPM</b></p>";
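To check the record count, here is a sketch that runs the pattern over an abbreviated copy of the question's data (a few lines per record instead of all of them). The filter at the end is an assumption of mine, not something the question specifies: it treats a piece with a single `<p>` element as an incomplete fragment and drops it, which leaves the four records originally asked for.

```php
<?php
// Abbreviated version of the question's data: four full records
// followed by the trailing name-only line.
$str = "<p><b>Lal, Vaninm</b></p>\n<p><b>General Manager</b></p>\n<p>Company 1 Inc.</p>\n"
     . "<p><b>Andrus, Reed </b></p>\n<p><b>Manager</b></p>\n<p>Company 2 Inc.</p>\n"
     . "<p><b>Sharma, John L.</b></p>\n<p><b>Senior Property Manager</b></p>\n<p>Company 3 Ltd.</p>\n"
     . "<p><b>Lucky, Philip S</b></p>\n<p>Life Member</p>\n<p>Company 4 Inc.</p>\n"
     . "<p><b>Berry, Richard B, RPA, CPM</b></p>";

$records = preg_split(
    '~(?=(?<=^|((?<!</b>)</p>\n))<p><b>)~',
    $str,
    -1,
    PREG_SPLIT_NO_EMPTY // skip empty pieces
);
echo count($records), "\n"; // 5

// Assumption: a piece containing only one <p> element is incomplete.
$complete = array_values(array_filter($records, function ($record) {
    return substr_count($record, '<p>') > 1;
}));
echo count($complete), "\n"; // 4
```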
How does it work?
With lookahead and lookbehind. The whole pattern is a "zero-width" match: it consumes no characters and only identifies a position in the string.
- The (?= lookahead asserts that the current position is followed by <p><b> ...
- ... as long as that <p><b> is preceded (lookbehind (?<= ) either by the beginning of the string ^ or by </p>\n that is not itself preceded by </b> (negative lookbehind (?<!</b>) ).
The negative lookbehind is what stops a bold line in the middle of a record, such as the job title right after the name, from starting a new record.
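The "zero-width" property can be seen in a small sketch: because the pattern matches a position rather than text, gluing the pieces back together reproduces the input exactly. The sample string and the contrasting delimiter `~</p>\n~` are hypothetical and only there for comparison.

```php
<?php
// Hypothetical two-record sample.
$str = "<p><b>A</b></p>\n<p>x</p>\n<p><b>B</b></p>\n<p>y</p>";

// Zero-width split: nothing is removed from the input.
$pieces = preg_split(
    '~(?=(?<=^|((?<!</b>)</p>\n))<p><b>)~',
    $str,
    -1,
    PREG_SPLIT_NO_EMPTY
);
var_dump(implode('', $pieces) === $str); // bool(true)

// For contrast, a delimiter that consumes text: the matched
// "</p>\n" sequences are dropped from the pieces.
$consumed = preg_split('~</p>\n~', $str);
var_dump(implode('', $consumed) === $str); // bool(false)
```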
Enjoy!