I want to use a regular expression to parse the contents of the following HTML tag, which I retrieved through curl.
<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>
so that the output will be "IND - 203/9 (49.4 Ovs)".
I have written the following code, but it is not working. Please help.
$one="<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";
$five="~(?<=<span class='ui-allscores'>)[.]*(?=</br></span>)~";
preg_match_all($five,$one,$ui);
print_r($ui);
3 Solutions
#1
5
Try this one:
$string = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";
Any span tag, whatever its attributes:
preg_match('/<span[^>]*>(.*?)<\/span>/si', $string, $matches);
Specific span tag:
preg_match("/<span class='ui-allscores'>(.*?)<\/span>/si", $string, $matches);
// Output
array (size=2)
0 => string '<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>' (length=56)
1 => string 'IND - 203/9 (49.4 Ovs)' (length=22)
#2
1
If you simply want to remove the HTML tags, use PHP's built-in function strip_tags.
See also another answer on removing HTML tags: "Strip all HTML tags, except allowed".
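As a minimal sketch of the strip_tags approach, the snippet below applies it to the string from the question. The second input (`$mixed`) is a hypothetical example, added only to show the optional allowed-tags argument.

```php
<?php
// The string from the question.
$one = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";

// strip_tags() removes every tag and keeps the text between them.
$text = strip_tags($one);
echo $text, PHP_EOL;

// Hypothetical input: keep <b> but drop everything else.
$mixed = "<p>Score: <b>203/9</b></p>";
$kept = strip_tags($mixed, '<b>');
echo $kept, PHP_EOL;
```

Note that strip_tags works on the whole string it is given, so on a full page fetched with curl it returns all of the page's text, not only the score. It is mainly useful once the fragment of interest has already been isolated.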
#3
1
The problem with your regex is the [.] part. It matches only a literal ., because the dot is written inside a character class. So just remove the square brackets.
$five="~(?<=<span class='ui-allscores'>).*(?=</br></span>)~";
The next problem is the greediness of *. You can change this matching behaviour by putting a ? after it.
$five="~(?<=<span class='ui-allscores'>).*?(?=</br></span>)~";
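To make the difference between [.]*, .* and .*? concrete, here is a small sketch that runs all three against the same input. Two things in it are assumptions for illustration and not part of the question: the second span is made up, and the lookahead is shortened to a plain </span> on the assumption that the input contains no </br>.

```php
<?php
// Hypothetical input: the question's span plus a second, made-up span on the same line.
$html = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>"
      . "<span class='ui-allscores'>AUS - 150/3 (30.0 Ovs)</span>";

$open  = "(?<=<span class='ui-allscores'>)";
$close = "(?=</span>)"; // assumption: no </br> before </span> in the input

// [.]* only matches literal dots, so nothing is found.
$dotsFound = preg_match("~{$open}[.]*{$close}~", $html, $dots);

// .* is greedy: it runs to the last </span> and swallows the markup in between.
preg_match("~{$open}.*{$close}~", $html, $greedy);

// .*? is lazy: it stops at the first </span>, giving one match per span.
preg_match_all("~{$open}.*?{$close}~", $html, $lazy);

var_dump($dotsFound);
print_r($greedy);
print_r($lazy[0]);
```

With a single span in the input, the greedy and lazy forms return the same text. The difference only shows once a line contains more than one closing tag.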
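Note that the lookahead still expects a literal </br> directly before </span>. The sample string in the question contains none, so drop </br> from the pattern unless your real input has it.
But the overall point is: you should most probably use an HTML parser for this job!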
See How do you parse and process HTML/XML in PHP?
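As one possible illustration of the parser route, the sketch below uses PHP's bundled DOM extension (DOMDocument and DOMXPath). The surrounding page markup in `$html` is a hypothetical stand-in for whatever curl returns; only the span itself comes from the question.

```php
<?php
// Hypothetical page: the span from the question wrapped in minimal markup.
$html = "<html><body><p>Scores</p>"
      . "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>"
      . "</body></html>";

// Real pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select every span whose class attribute is exactly 'ui-allscores'.
$xpath = new DOMXPath($dom);
$scores = [];
foreach ($xpath->query("//span[@class='ui-allscores']") as $node) {
    $scores[] = trim($node->textContent);
}

print_r($scores);
```

The XPath test here compares the whole class attribute, so a span with several classes would need a contains()-style expression instead. Unlike the regex, this keeps working if the attribute quoting or whitespace in the markup changes.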