I need a regex that will give me the string inside an href attribute, i.e. the part inside the quotes.
For example, I need to extract theurltoget.com from the following:
<a href="theurltoget.com">URL</a>
Additionally, I only want the base URL part, i.e. from http://www.mydomain.com/page.html
I only want http://www.mydomain.com/
9 answers
#1 (score: 15)
Don't use regex for this. You can use XPath and built-in PHP functions to get what you want:
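$myHtml = '<a href="http://www.mydomain.com/page.html">URL</a>';
// simplexml_load_string() only accepts well-formed XML, so this works for
// clean snippets like the one above but not for arbitrary HTML pages
$xml = simplexml_load_string($myHtml);
// Select every href attribute anywhere in the document
$list = $xml->xpath("//@href");
$preparedUrls = array();
foreach ($list as $item) {
    // Cast the SimpleXMLElement to a string and split the URL into parts
    $item = parse_url((string) $item);
    // Keep only scheme + host (relative links have neither, so skip them)
    if (isset($item['scheme'], $item['host'])) {
        $preparedUrls[] = $item['scheme'] . '://' . $item['host'] . '/';
    }
}
print_r($preparedUrls);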
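simplexml_load_string() expects well-formed XML, which most real-world HTML is not. A minimal sketch of the same XPath idea using PHP's DOM extension (DOMDocument::loadHTML() plus DOMXPath), which tolerates imperfect markup. The function name extractBaseUrls() is hypothetical, and the second link in the sample is made up to show that relative URLs are skipped:

```php
<?php
// Hypothetical helper: collect "scheme://host/" for every absolute <a href>.
function extractBaseUrls(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings about non-standard or malformed markup
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $baseUrls = [];
    // Every href attribute on an <a> element (each result is a DOMAttr)
    foreach ($xpath->query('//a/@href') as $attr) {
        $parts = parse_url($attr->value);
        // Skip relative links such as "theurltoget.com", which have no scheme/host
        if (isset($parts['scheme'], $parts['host'])) {
            $baseUrls[] = $parts['scheme'] . '://' . $parts['host'] . '/';
        }
    }
    return $baseUrls;
}

print_r(extractBaseUrls('<a href="http://www.mydomain.com/page.html">URL</a><a href=theurltoget.com>x</a>'));
```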
#2 (score: 10)
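$html = '<a href="http://www.mydomain.com/page.html">URL</a>';
// Capture the href value; note that (.+) is greedy and runs to the last "> in the string
preg_match('/<a href="(.+)">/', $html, $match);
// parse_url() splits the captured URL into scheme, host, path, ...
$info = parse_url($match[1]);
echo $info['scheme'].'://'.$info['host']; // http://www.mydomain.com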
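The answer concatenates scheme and host by hand. A hedged sketch of a small helper that does the same with a few extra checks: it returns null for relative URLs (no scheme or host), keeps an explicit port, and adds the trailing slash from the question. The name baseUrl() is an illustrative assumption, not a PHP built-in:

```php
<?php
// Hypothetical helper: rebuild the base URL from parse_url() components.
function baseUrl(string $url): ?string
{
    $parts = parse_url($url);
    // parse_url() returns false for badly malformed URLs; relative URLs lack scheme/host
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    $base = $parts['scheme'] . '://' . $parts['host'];
    // Keep an explicitly given port, e.g. :8080
    if (isset($parts['port'])) {
        $base .= ':' . $parts['port'];
    }
    // The question asks for a trailing slash, e.g. http://www.mydomain.com/
    return $base . '/';
}

echo baseUrl('http://www.mydomain.com/page.html'), "\n"; // http://www.mydomain.com/
echo baseUrl('https://example.com:8080/a/b?x=1'), "\n";   // https://example.com:8080/
```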
#3 (score: 7)
This expression handles three cases:
- no quotes
- double quotes
- single quotes
'/href=["\']?([^"\'>]+)["\']?/'
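A quick way to check the three quoting styles listed above is to run the pattern through preg_match_all() on one string containing each style (the example.* URLs are made up):

```php
<?php
// One link per quoting style: double quotes, single quotes, no quotes
$html = '<a href="http://a.example/1">A</a> '
      . "<a href='http://b.example/2'>B</a> "
      . '<a href=http://c.example/3>C</a>';

preg_match_all('/href=["\']?([^"\'>]+)["\']?/', $html, $matches);

// $matches[1] holds the captured values without the surrounding quotes
print_r($matches[1]);
```

One caveat: because the character class allows spaces, an unquoted value followed by another attribute (href=x class=y) is captured as x class=y.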
#4 (score: 5)
http://www.the-art-of-web.com/php/parse-links/
Let's start with the simplest case, a well-formatted link with no extra attributes:
/<a href=\"([^\"]*)\">(.*)<\/a>/iU
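To collect every link rather than just the first, the same pattern can be passed to preg_match_all() with PREG_SET_ORDER, which gives one array per link. A minimal sketch with made-up input; the i modifier makes it case-insensitive and U makes quantifiers ungreedy, so (.*) stops at the first </a>:

```php
<?php
$html = '<a href="http://www.mydomain.com/page.html">URL</a> and <A HREF="theurltoget.com">Other</A>';

preg_match_all('/<a href=\"([^\"]*)\">(.*)<\/a>/iU', $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the href value, $m[2] is the link text
    echo $m[1], ' => ', $m[2], "\n";
}
```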
#5 (score: 4)
Use the answer by @Alec if you're only looking for the base URL part (the second part of @David's question)!
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);
This will give you:
$info
Array
(
[scheme] => http
[host] => www.mydomain.com
[path] => /page.html" class="myclass" rel="myrel
)
So you can use $href = $info["scheme"] . "://" . $info["host"];
Which gives you:
// http://www.mydomain.com
If you want the entire URL inside the href attribute, you should use a different regex, for instance the one provided by @user2520237.
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match);
$info = parse_url($match[1]);
This will give you:
$info
Array
(
[scheme] => http
[host] => www.mydomain.com
[path] => /page.html
)
Now you can use $href = $info["scheme"] . "://" . $info["host"] . $info["path"];
Which gives you:
// http://www.mydomain.com/page.html
#6 (score: 3)
To replace all href values:
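function replaceHref($html, $replaceStr)
{
    $match = array();
    // [^"]+ stops at the closing quote of each href; (.+) would run on to the last quote
    preg_match_all('/<a [^>]*href="([^"]+)"/', $html, $match);
    // $match[1] holds one captured URL per <a> tag (count($match) is only the number of groups)
    foreach (array_unique($match[1]) as $href) {
        $html = str_replace($href, $replaceStr . urlencode($href), $html);
    }
    return $html;
}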
$html = '<a href="http://www.mydomain.com/page.html">URL</a>';
$replaceStr = "http://affilate.domain.com?cam=1&url=";
$replaceHtml = replaceHref($html, $replaceStr);
echo $replaceHtml;
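In line with the common advice to prefer a parser over regex, here is a sketch of the same replacement done with DOMDocument. It rewrites only real href attributes, whereas str_replace() also changes the URL wherever else it appears in the text. replaceHrefDom() is a hypothetical name; the affiliate prefix is the one from the answer:

```php
<?php
// Hypothetical DOM-based variant: route every <a href> through $prefix.
function replaceHrefDom(string $html, string $prefix): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    // Wrap the fragment in a full document with an explicit charset
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>');
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $a->setAttribute('href', $prefix . urlencode($a->getAttribute('href')));
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo replaceHrefDom('<a href="http://www.mydomain.com/page.html">URL</a>', 'http://affilate.domain.com?cam=1&url='), "\n";
```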
#7 (score: 1)
This will handle the case where there are no quotes around the URL.
/<a [^>]*href="?([^">]+)"?>/
But seriously, do not parse HTML with regex. Use DOM or a proper parsing library.
#8 (score: 0)
#href="(https?://[^/"]*)#
I think you should be able to handle the rest.
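For completeness, one way to handle the rest: capture scheme and host from every absolute http(s) href with preg_match_all(), then append the trailing slash the question asks for. This sketch uses # as the pattern delimiter (a style choice, so that :// needs no escaping) and includes " in the character class so a URL without a path stops at the closing quote. The second link is made up:

```php
<?php
$html = '<a href="http://www.mydomain.com/page.html">URL</a>'
      . '<a href="https://example.org">Home</a>';

// Capture "scheme://host" up to the first slash or closing quote
preg_match_all('#href="(https?://[^/"]*)#', $html, $matches);

// Append the trailing slash, e.g. http://www.mydomain.com/
$baseUrls = array_map(fn($host) => $host . '/', $matches[1]);
print_r($baseUrls);
```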
#9 (score: 0)
Because positive lookbehind and lookahead are cool:
/(?<=href=\").+(?=\")/
It will match only what you want, without the quotation marks:
Array ( [0] => theurltoget.com )
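The lookarounds also work with preg_match_all() when a string contains several links. In this sketch [^"]+ is used in place of .+, so each match stops at its own closing quote even when other attributes follow (the second URL is made up):

```php
<?php
$html = '<a href="theurltoget.com" class="x">URL</a> <a href="other.example">B</a>';

// Lookbehind requires href=" before the match, lookahead requires " after it
preg_match_all('/(?<=href=")[^"]+(?=")/', $html, $matches);

// The lookarounds are zero-width, so the quotes are not part of the match
print_r($matches[0]);
```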