Is there a way to capture line breaks in a multi-line regular expression?

Date: 2022-03-06 12:13:21

I've run into this problem several times before when trying to do some HTML scraping with PHP and the preg_* functions.

Most of the time I have to capture structures like this:

<!-- comment -->
<tag1>lorem ipsum</tag1>

<p>just more text with several html tags in it, sometimes CDATA encapsulated…</p>
<!-- /comment -->

In particular, I want something like this:

/<tag1>(.*?)<\/tag1>\n\n<p>(.*?)<\/p>/mi

but the \n\n doesn't look like it will work.

Is there a general line-break switch?

3 answers

#1


I think you could replace the \n\n with (\r?\n){2}. That way you match a CRLF pair as well as a bare LF character.
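A minimal sketch of that suggestion, assuming the input uses Windows-style CRLF line endings. The sample markup and variable names are made up for illustration. Two things go beyond the answer itself: the m modifier only changes how ^ and $ anchor, so the sketch uses s instead so that . can span newlines inside the captured groups; and PCRE's \R, which matches any newline sequence, is shown as an alternative.

```php
<?php
// Hypothetical sample input with CRLF line endings.
$html = "<!-- comment -->\r\n<tag1>lorem ipsum</tag1>\r\n\r\n"
      . "<p>more\r\ntext</p>\r\n<!-- /comment -->";

// (?:\r?\n){2} accepts two LF or two CRLF line breaks in a row.
// s: let . match newlines; i: case-insensitive tag names.
$pattern = '/<tag1>(.*?)<\/tag1>(?:\r?\n){2}<p>(.*?)<\/p>/si';

// Alternative: \R matches any newline sequence (LF, CR, CRLF, ...).
$patternR = '/<tag1>(.*?)<\/tag1>\R{2}<p>(.*?)<\/p>/si';

$found  = preg_match($pattern, $html, $m);
$foundR = preg_match($patternR, $html, $mR);

if ($found === 1) {
    echo $m[1], "\n"; // text inside <tag1>
    echo $m[2], "\n"; // text inside <p>, line break included
}
```

The non-capturing group (?:...) keeps the line breaks out of the match array, so $m[1] and $m[2] stay the two pieces of text the question asks for.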

#2


Are you sure you want to parse HTML using regexps? HTML isn't a regular language, and there are too many corner cases.

I would investigate some form of HTML parser (perhaps this one?), and then identify the pattern you're interested in via the returned HTML data structure.

#3


Or you could look at the DOM extension for PHP. It has a function to load HTML from a string or a file. You can then use the PHP DOM methods to traverse the DOM and find the data you are interested in.
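A rough sketch of the DOM approach, under stated assumptions: the markup is invented for illustration, and an h2 element stands in for the question's tag1. It uses DOMDocument::loadHTML() and DOMXPath to pair each heading with the paragraph that immediately follows it, so the line breaks between the two elements no longer matter.

```php
<?php
// Hypothetical sample markup; <h2> stands in for the question's <tag1>.
$html = "<div><h2>lorem ipsum</h2>\r\n\r\n<p>just <b>more</b> text</p></div>";

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$pairs = [];

// Select every <p> that is the first element sibling after an <h2>.
foreach ($xpath->query('//h2/following-sibling::*[1][self::p]') as $p) {
    // Step back to the nearest preceding <h2> sibling.
    $heading = $xpath->query('preceding-sibling::h2[1]', $p)->item(0);
    $pairs[] = [trim($heading->textContent), trim($p->textContent)];
}

foreach ($pairs as [$title, $body]) {
    echo $title, ' => ', $body, "\n";
}
```

textContent flattens nested tags such as the b element above, which is usually what a scraper wants; DOMDocument::saveHTML($p) would return the markup instead.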
