Need a regular expression for preg_split to split records into an array

$source="<p><b>Lal, Vaninm</b></p>
<p><b>Vice President &amp;</b></p>
<p><b>General Manager</b></p>
<p>Company 1 Inc.</p>
<p>PO Box 123456</p>
<p>salt Lake1, 00111-3333</p>
<p>111-111-111 / F: 111-111-111</p>
<p>info1@site1.com</p>
<p><b>Andrus, Reed </b></p>
<p><b>Manager</b></p>
<p>Company 2 Inc.</p>
<p>Monada, Suite 222</p>
<p>J , Lousiana 2222</p>
<p>222-222-222 / F: 222-222-222</p>
<p>info2@site2.com</p>
<p><b>Sharma, John L.</b></p>
<p><b>Senior Property Manager</b></p>
<p>Company 3  Ltd.</p>
<p>PO Box 3333</p>
<p>Grand Cinema, Layman Islands</p>
<p>FGB 333</p>
<p>333-333-333</p>
<p>info3@site3.com</p>
<p><b>Lucky, Philip S</b></p>
<p>Life Member</p>
<p>Company 4 Inc.</p>
<p>Battelsville, Oklahoma 74000</p>
<p>444-444-444</p>
<p><b>Berry, Richard B, RPA, CPM</b></p>";
$records = preg_split ("@\<p\>\<b\>(.*?)(\<p\>(.*)\</p\>\<p\>\<b\>)@s", $source); 
var_dump($records);

The array must contain four records. The data inside the tags is meaningless. I am new to regular expressions and tried the pattern above. Please suggest a regular expression for this. Thanks in advance.

I think ....... identifies a record, but I can't build the required expression.

1 Solution

#1

With all the usual disclaimers about parsing HTML with regex, the following regex will correctly split your input.

Version 1: file with only newlines (Unix, OS X)

(?=(?<=^|((?<!</b>)</p>\n))<p><b>)

Version 2: file with carriage returns and newlines (Windows)

(?=(?<=^|((?<!</b>)</p>\r\n))<p><b>)
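If you are not sure which line endings the input uses, one option is to normalize them first so that Version 1 applies in both cases. This is only a minimal sketch: the variable name $html and the two-record sample are made up for illustration and are not part of the original question.

```php
<?php
// Hypothetical input that uses Windows line endings (\r\n).
$html = "<p><b>Name One</b></p>\r\n<p>Company 1</p>\r\n"
      . "<p><b>Name Two</b></p>\r\n<p>Company 2</p>";

// Convert \r\n to \n so the newline-only pattern (Version 1) can be used.
$normalized = str_replace("\r\n", "\n", $html);

$pattern = '~(?=(?<=^|((?<!</b>)</p>\n))<p><b>)~';

// PREG_SPLIT_NO_EMPTY discards the empty piece in front of the first record.
$records = preg_split($pattern, $normalized, -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($records));
```

Normalizing up front keeps a single pattern to maintain, at the cost of one extra pass over the string.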

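Therefore, if you were using the first, you could write the following. The PREG_SPLIT_NO_EMPTY flag drops the empty first element that the zero-width match at the very start of the string would otherwise produce:

$records = preg_split('~(?=(?<=^|((?<!</b>)</p>\n))<p><b>)~', $source, -1, PREG_SPLIT_NO_EMPTY);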

Note that there are actually five records because of the last line:

<p><b>Berry, Richard B, RPA, CPM</b></p>
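To see the split end to end, here is a minimal runnable sketch on a shortened version of the sample data. It assumes Unix newlines; building the string with implode(), the trim() step, and the variable names are choices made for this illustration only.

```php
<?php
// Shortened sample in the same shape as the question's data.
$lines = [
    '<p><b>Lal, Vaninm</b></p>',
    '<p><b>Vice President &amp;</b></p>',
    '<p>Company 1 Inc.</p>',
    '<p>info1@site1.com</p>',
    '<p><b>Andrus, Reed </b></p>',
    '<p><b>Manager</b></p>',
    '<p>Company 2 Inc.</p>',
    '<p><b>Berry, Richard B, RPA, CPM</b></p>',
];
// Join with "\n" explicitly so the result does not depend on the file's line endings.
$source = implode("\n", $lines);

$pattern = '~(?=(?<=^|((?<!</b>)</p>\n))<p><b>)~';

// Split at every record start; drop the empty piece before the first record.
$records = preg_split($pattern, $source, -1, PREG_SPLIT_NO_EMPTY);

// Remove the trailing newline left at the end of each record.
$records = array_map('trim', $records);

foreach ($records as $i => $record) {
    echo "Record ", $i + 1, ":\n", $record, "\n\n";
}
```

The trailing Berry line ends up as a record of its own, which is why the full input yields one more record than the question expects.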

How does it work?

With lookahead and lookbehind. This is a "zero-width" match that just looks for a certain position.

The (?= lookahead asserts that the current position is followed by <p><b>, as long as that position is preceded by (lookbehind (?<=) either the beginning of the string ^ or a </p>\n that is not itself preceded by </b> (negative lookbehind (?<!).
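One way to check that the pattern matches positions rather than characters is to list the matches with their offsets. The sketch below uses a made-up four-line string ($text is a hypothetical name) and preg_match_all() with PREG_OFFSET_CAPTURE.

```php
<?php
// Hypothetical sample: two records, the first with two bold header lines.
$text = "<p><b>A</b></p>\n<p><b>Title</b></p>\n<p>x</p>\n<p><b>B</b></p>";

$pattern = '~(?=(?<=^|((?<!</b>)</p>\n))<p><b>)~';

// With PREG_OFFSET_CAPTURE each entry of $matches[0] is [matched text, byte offset].
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$matched, $offset]) {
    // $matched is always '' because lookarounds consume no characters.
    printf("split position %d, match length %d\n", $offset, strlen($matched));
}
```

The second bold line is not reported as a split position, because it directly follows a line ending in </b></p> and the negative lookbehind rejects it.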

Enjoy!
