Supposedly valid regular expression doesn't return any data in PHP

I am using the following code:

<?php
$stock = $_GET['s']; // returns stock ticker symbol, e.g. GOOG or YHOO
$first = $stock[0];

$url = "http://biz.yahoo.com/research/earncal/".$first."/".$stock.".html";
$data = file_get_contents($url);

$r_header = '/Prev. Week(.+?)Next Week/';
$r_date = '/\<b\>(.+?)\<\/b\>/';

preg_match($r_header,$data,$header);
preg_match($r_date, $header[1], $date);

echo $date[1];
?>

I've checked the regular expressions here and they appear to be valid. If I print just $url or $data, they come out correctly, and when I print $data and view the source, the markup I'm targeting with the regex is in there. If you want to check anything, an example of a proper URL is http://biz.yahoo.com/research/earncal/g/goog.html

I've tried everything I could think of, including both var_dump($header) and var_dump($date), both of which return empty arrays.

I have been able to create other regular expressions that work. For instance, the following correctly returns "Earnings":

$r_header = '/Company (.+?) Calendar/';
preg_match($r_header,$data,$header);
echo $header[1];

I am going nuts trying to figure out why this isn't working. Any help would be awesome. Thanks.

5 Solutions

#1

The problem is that the HTML contains newlines, which the dot does not match by default. Add the s modifier to the regex so that it does, as below:

<?php
$stock = "goog"; // hard-coded for testing; originally $_GET['s'] (stock ticker symbol, e.g. GOOG or YHOO)
$first = $stock[0];

$url = "http://biz.yahoo.com/research/earncal/".$first."/".$stock.".html";
$data = file_get_contents($url);

$r_header = '/Prev. Week(.+?)Next Week/s';
$r_date = '/\<b\>(.+?)\<\/b\>/s';


preg_match($r_header,$data,$header);
preg_match($r_date, $header[1], $date);

var_dump($header);
?>
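
The effect of the s modifier can be checked without fetching the page. The following is a minimal sketch that assumes a made-up HTML fragment in place of the real Yahoo markup; the sample string and its date are hypothetical and only illustrate the newline issue.

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$data = "<a href=\"#\">Prev. Week</a>\n<b>July 16, 2009</b>\n<a href=\"#\">Next Week</a>";

// Without /s the dot stops at "\n", so the group cannot reach "Next Week".
$without = preg_match('/Prev\. Week(.+?)Next Week/', $data, $unused);

// With /s (PCRE_DOTALL) the dot also matches newlines.
$with = preg_match('/Prev\. Week(.+?)Next Week/s', $data, $header);

// Pull the bold text out of the captured span.
preg_match('/<b>(.+?)<\/b>/s', $header[1], $date);

var_dump($without);  // int(0)
var_dump($with);     // int(1)
echo $date[1], "\n"; // July 16, 2009
```

Checking the return value of preg_match (1 for a match, 0 for no match, false on error) before reading the capture array also avoids passing an undefined $header[1] to the second call.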

#2

Your regex doesn't allow for the line breaks in the HTML. Try:

$r_header = '/Prev\. Week((?s:.*))Next Week/';

The s flag makes the . (match any character) also match newline characters.
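
One thing to watch with this form is that .* is greedy, so it runs to the last "Next Week" in the input rather than the first. The sketch below uses a hypothetical sample string with two "Next Week" markers to compare the greedy and lazy variants of the inline (?s:...) group.

```php
<?php
// Hypothetical sample with two "Next Week" markers; not the real page.
$data = "Prev. Week\n<b>first</b>\nNext Week\n<b>second</b>\nNext Week";

// (?s:...) turns on dot-all only inside the group.
preg_match('/Prev\. Week((?s:.*))Next Week/', $data, $greedy);
preg_match('/Prev\. Week((?s:.*?))Next Week/', $data, $lazy);

// The greedy capture runs to the last "Next Week" and includes "second".
var_dump(strpos($greedy[1], 'second') !== false); // bool(true)

// The lazy capture stops at the first "Next Week".
var_dump(strpos($lazy[1], 'second') !== false);   // bool(false)
```

If the page contains only one "Next Week" after "Prev. Week", both variants capture the same text.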

#3

Dot does not match newlines by default. Use /your-regex/s

$r_header should probably be /Prev\. Week(.+?)Next Week/s

FYI: You don't need to escape < and > in a regex.
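
Both points can be checked quickly on literal strings. This is a small sketch with hypothetical sample inputs; the # delimiter in the third pattern is an optional variation that also removes the need to escape the slash in the closing tag.

```php
<?php
$html = "<b>Earnings</b>"; // hypothetical sample input

// Escaped and unescaped angle brackets behave the same.
preg_match('/\<b\>(.+?)\<\/b\>/', $html, $escaped);
preg_match('/<b>(.+?)<\/b>/', $html, $plain);

// With # as the delimiter, the slash in </b> needs no escaping either.
preg_match('#<b>(.+?)</b>#', $html, $hash);

var_dump($escaped[1] === $plain[1] && $plain[1] === $hash[1]); // bool(true)

// The dot after "Prev" does need escaping to match only a literal period.
var_dump(preg_match('/Prev. Week/', 'PrevX Week'));  // int(1)
var_dump(preg_match('/Prev\. Week/', 'PrevX Week')); // int(0)
```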

#4

You want to add the s (PCRE_DOTALL) modifier. By default, . doesn't match a newline, and I see the page has newlines between the two parts you are looking for.

Side note: although they don't hurt (except readability), you don't need a backslash before < and >.

#5

I think this is because you're applying the values to the regex as if it's plain text. However, it's HTML. For example, your regex should be modified to parse:

<a href="...">Prev. Week</a> ...

Not to parse regular plain text like: "Prev. Week ...."