I am using the following code:
<?php
$stock = $_GET['s']; // stock ticker symbol, e.g. GOOG or YHOO
$first = $stock[0];
$url = "http://biz.yahoo.com/research/earncal/".$first."/".$stock.".html";
$data = file_get_contents($url);
$r_header = '/Prev. Week(.+?)Next Week/';
$r_date = '/\<b\>(.+?)\<\/b\>/';
preg_match($r_header,$data,$header);
preg_match($r_date, $header[1], $date);
echo $date[1];
?>
I've checked the regular expressions here and they appear to be valid. If I print just $url or $data, they come out correctly, and if I print $data and view the source, the markup I'm trying to match with the regex is in there. If you're interested in checking anything, an example of a proper URL would be http://biz.yahoo.com/research/earncal/g/goog.html
I've tried everything I could think of, including both var_dump($header) and var_dump($date), both of which return empty arrays.
I have been able to create other regular expressions that work. For instance, the following correctly returns "Earnings":
$r_header = '/Company (.+?) Calendar/';
preg_match($r_header,$data,$header);
echo $header[1];
I am going nuts trying to figure out why this isn't working. Any help would be awesome. Thanks.
5 Solutions
#1
3
The problem is that the HTML contains newlines, which the dot does not match by default. Add the s regex modifier to handle them, as below:
<?php
$stock = "goog"; // hard-coded for testing instead of $_GET['s']; stock ticker symbol, e.g. GOOG or YHOO
$first = $stock[0];
$url = "http://biz.yahoo.com/research/earncal/".$first."/".$stock.".html";
$data = file_get_contents($url);
$r_header = '/Prev. Week(.+?)Next Week/s';
$r_date = '/\<b\>(.+?)\<\/b\>/s';
preg_match($r_header,$data,$header);
preg_match($r_date, $header[1], $date);
var_dump($header);
?>
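To see the effect without fetching the live page, here is a minimal sketch run against a small hand-written HTML fragment. The contents of $data (including the date) are made up for illustration, and the real page markup may differ; only the newlines between the two markers matter here.

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$data = "<a href=\"#\">Prev. Week</a>\n<b>Jan 20, 2011</b>\n<a href=\"#\">Next Week</a>";

// Without the s modifier, "." stops at the newline, so nothing matches.
$without = preg_match('/Prev\. Week(.+?)Next Week/', $data, $m1);

// With the s modifier, "." also matches newlines and the span is captured.
$with = preg_match('/Prev\. Week(.+?)Next Week/s', $data, $header);

// Pull the first bold text out of the captured span.
$found = preg_match('/<b>(.+?)<\/b>/s', $header[1], $date);

echo $without, "\n";   // 0
echo $with, "\n";      // 1
echo $date[1], "\n";   // Jan 20, 2011
```

The only difference between the two header patterns is the trailing s, which is what turns the empty result into a match.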
#2
4
Your regex doesn't allow for the line breaks in the HTML. Try:
$r_header = '/Prev\. Week((?s:.*))Next Week/';
The s tells the . (match any) to match newline characters as well.
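The inline form (?s:...) turns dot-all on only for the group it wraps, whereas a trailing /s applies to the whole pattern. The sketch below uses a made-up $text value (an assumption for illustration) to show one thing worth knowing about the pattern above: the greedy .* runs to the last "Next Week" in the input, while a lazy .*? stops at the first.

```php
<?php
// Made-up input with two "Next Week" markers on separate lines.
$text = "Prev. Week\nA\nNext Week\nB\nNext Week";

// Greedy: dot-all is scoped to the inner group, and .* extends to the last marker.
preg_match('/Prev\. Week((?s:.*))Next Week/', $text, $greedy);

// Lazy: .*? stops at the first marker.
preg_match('/Prev\. Week((?s:.*?))Next Week/', $text, $lazy);

var_dump($greedy[1] === "\nA\nNext Week\nB\n"); // bool(true)
var_dump($lazy[1] === "\nA\n");                 // bool(true)
```

If the page may contain the closing marker more than once, the lazy form is probably the safer choice.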
#3
2
- Dot does not match newlines by default. Use /your-regex/s
- $r_header should probably be /Prev\. Week(.+?)Next Week/s
- FYI: You don't need to escape < and > in a regex.
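As a quick check of the last point, the sketch below runs the escaped and unescaped forms against a made-up fragment ($html is an assumption for illustration). Using # as the delimiter is optional; it just means the / in </b> no longer needs a backslash.

```php
<?php
// Made-up fragment for illustration.
$html = "<b>first</b> and <b>second</b>";

// Escaped angle brackets, as in the question.
preg_match('/\<b\>(.+?)\<\/b\>/', $html, $escaped);

// Unescaped angle brackets; # as the delimiter so the / in </b> needs no backslash.
preg_match('#<b>(.+?)</b>#', $html, $plain);

var_dump($escaped[1] === $plain[1]); // bool(true)
echo $plain[1], "\n";                // first
```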
#4
2
You want to add the s (PCRE_DOTALL) modifier. By default, . doesn't match a newline, and I see the page has newlines between the two parts you are looking for.
Side note: although they don't hurt (except for readability), you don't need a backslash before < and >.
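Since the question reports empty arrays from var_dump, it may also help to look at what preg_match itself returns: 1 for a match, 0 for no match, and false if the pattern could not be run. A minimal sketch, assuming a made-up $data string in place of the fetched page:

```php
<?php
// Made-up input; the newlines are what defeat the pattern without the s modifier.
$data = "Prev. Week\n<b>Jan 20</b>\nNext Week";

$report = [];
foreach (['/Prev\. Week(.+?)Next Week/', '/Prev\. Week(.+?)Next Week/s'] as $pattern) {
    $result = preg_match($pattern, $data, $header);
    if ($result === false) {
        // The pattern failed to run; preg_last_error() reports the reason code.
        $report[] = "error code: " . preg_last_error();
    } elseif ($result === 0) {
        // Valid pattern, but no match: $header is an empty array here.
        $report[] = "no match";
    } else {
        $report[] = "matched: " . trim($header[1]);
    }
}

echo implode("\n", $report), "\n";
// no match
// matched: <b>Jan 20</b>
```

A return value of 0 with an empty array points at the pattern not fitting the input, rather than at a syntax problem in the regex.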
#5
0
I think this is because you're writing the regex as if the page were plain text. However, it's HTML. For example, your regex should be modified to parse:
<a href="...">Prev. Week</a> ...
rather than regular plain text like: "Prev. Week ...."