Which results in:

Test <a href="http://www.live.com">Google!!</a>Test <a href="http://www.live.com">Google!!2</a>Test

I'm looking to end up with (the difference being in the first link):

Test <a href="http://www.yahoo.com">Google!!</a>Test <a href="http://www.live.com">Google!!2</a>Test

The idea is to replace the URL inside each link in a string with a different, unique URL. It's for a newsletter system where I want to track what people have clicked on, so each URL will be a "fake" URL that records the click and then redirects the reader to the real URL.

4 solutions

#1


2  

The problem is that your first replace string is going to be matched by the second search pattern, effectively overwriting the first replace string with the second replace string.

Unless you can somehow differentiate "modified" links from the original ones so that they won't get caught by the other expression (perhaps by adding an extra HTML property?), I don't think you can really solve this with a single preg_replace() call. One possible solution (aside from the differentiation in the regular expression) that comes to mind would be to use preg_match_all(), since it will give you an array of matches to work with. You could probably then encode the matched URLs with your tracking URL by iterating over the array and running a str_replace() on each matched URL.
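A minimal sketch of the preg_match_all() idea, not a definitive implementation. It swaps str_replace() for substr_replace() with captured offsets, because str_replace() would rewrite every occurrence of a URL that appears in more than one link. The function name rewriteLinks, the tracker address and the sequential id scheme are assumptions made up for illustration.

```php
<?php
// Sketch: give each href its own tracking URL using preg_match_all().
// rewriteLinks(), the tracker address and the id scheme are hypothetical.
function rewriteLinks($html, $trackerBase = 'http://www.example.com/track.php?id=')
{
    // PREG_OFFSET_CAPTURE records the byte offset of every captured URL.
    preg_match_all('`<a\s[^>]*href="([^"]*)"`si', $html, $matches, PREG_OFFSET_CAPTURE);

    $originals = array();
    // Walk the matches from last to first so earlier offsets stay valid.
    for ($i = count($matches[1]) - 1; $i >= 0; $i--) {
        list($url, $offset) = $matches[1][$i];
        $id = $i + 1;
        $originals[$id] = $url; // stand-in for a database insert
        $html = substr_replace($html, $trackerBase . $id, $offset, strlen($url));
    }
    ksort($originals);
    return array($html, $originals);
}

list($out, $map) = rewriteLinks(
    'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test'
);
echo $out, "\n";
```

Both links point at the same URL here, yet each one ends up with its own id, which is the case a plain str_replace() would get wrong.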

#2


1  

I'm not good with regexps, but if what you're doing is just replacing external URLs (i.e. not part of your site/application) with an internal URL that will track click-thrus and redirect the user, then it should be easy to construct a regexp that will match only external URLs.

So let's say your domain is foo.com, then you just need to create a regexp that will only match a hyperlink that doesn't contain a URL starting with http://foo.com. Now, as I said, I'm pretty bad with regexps, but here's my best stab at it:

$reg[0] = '`<a(\s[^>]*)href="(?!http://foo.com)([^"]*)"([^>]*)>`si';

Edit: If you want to track click-thrus to internal URLs as well, then just replace http://foo.com with the URL of your redirect/tracking page, e.g. http://foo.com/out.php.

I'll walk through an example scenario just to show what I'm talking about. Let's say you have the below newsletter:

<h1>Newsletter Name</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis,
ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor
suscipit sapien, <a href="http://foo.com">eget auctor</a> ipsum ligula
non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus.
Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p>

For the purpose of this exercise, the search pattern will be:

// Only match links that don't begin with: http://foo.com/out.php
`<a(\s[^>]*)href="(?!http://foo.com/out\.php)([^"]*)"([^>]*)>`si

This regexp can be broken down into 3 parts:

  1. <a(\s[^>]*)href="
  2. (?!http://foo.com/out\.php)([^"]*)
  3. "([^>]*)>

On the first pass of the search, the script will examine:

<a href="http://bar.com">

This link satisfies all 3 components of the regexp, so the URL is stored in the database and is replaced with http://foo.com/out.php?id=1.

On the second pass of the search, the script will examine:

<a href="http://foo.com/out.php?id=1">

This link matches 1 and 3, but not 2. So the search will move on to the next link:

<a href="http://foo.com">

This link satisfies all 3 components of the regexp, so the URL is stored in the database and is replaced with http://foo.com/out.php?id=2.

On the 3rd pass of the search, the script will examine the first 2 (already replaced) links, skip them, and then find a match with the last link in the newsletter.
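The walk-through could be sketched roughly as follows, assuming PHP 5.3 or later for the closure. The $stored array is a stand-in for the database, and the id=99 link is an extra, made-up example of a link that has already been rewritten. Note that a single preg_replace_callback() call visits each link only once, so the lookahead mainly protects links that were tracked earlier or text that gets processed again.

```php
<?php
// Sketch of the walk-through, assuming PHP 5.3+ for the closure.
// $stored stands in for the database table; the id=99 link is hypothetical.
$newsletter = '<p>ligula <a href="http://bar.com">sed sollicitudin</a>, '
    . '<a href="http://foo.com">eget auctor</a>, '
    . '<a href="http://foo.com/out.php?id=99">already tracked</a>, '
    . '<a href="http://last.fm">laoreet lacus</a>.</p>';

// Only match links that don't begin with http://foo.com/out.php
$pattern = '`<a(\s[^>]*)href="(?!http://foo\.com/out\.php)([^"]*)"([^>]*)>`si';

$stored = array();
$result = preg_replace_callback($pattern, function ($m) use (&$stored) {
    $id = count($stored) + 1;
    $stored[$id] = $m[2]; // original URL, keyed by tracking id
    return '<a' . $m[1] . 'href="http://foo.com/out.php?id=' . $id . '"' . $m[3] . '>';
}, $newsletter);

echo $result, "\n";
```

The three untracked links receive ids 1 to 3, while the link that already points at out.php is left untouched.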

#3


1  

I'm not sure I've understood the question correctly, but I wrote the following snippet. The regex matches the hyperlinks. The code then loops through the results and compares each link's text node against the hyperlink references. When a text node is found in a hyperlink reference, it extends the matches by inserting a sample trackback link with a unique key.

UPDATE: The snippet finds all hyperlinks:

  1. find links
  2. build trackback link
  3. find position of each found link (matches[3]) and set a template tag
  4. replace template tags with trackback links

Each link position is unique.

$string = '<h1>Newsletter Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p> <h1>Newsletter Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p> <h1>Newsletter Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p> ';

$regex = '<[^>]+>(.*)<\/[^>]+>';
preg_match_all("'<a\s+href=\"(.*)\"\s*>(.*)<\/[^>]+>'U",$string,$matches);


$uniqueURL = 'http://www.yourdomain.com/trackback.php?id=';

foreach($matches[2] as $k2 => $m2){
    foreach($matches[1] as $k1 => $m1){
        if(stristr($m1, $m2)){
                $uniq = $uniqueURL.md5($matches[0][$k2])."_".rand(1000,9999);
                $matches[3][$k1] = $uniq."&refLink=".$m1;
        }
    }
}


foreach($matches[3] as $key => $val) {

    $startAt = strpos($string, $matches[1][$key]);
    $endAt= $startAt + strlen($matches[1][$key]);

    $strBefore = substr($string,0, $startAt);
    $strAfter = substr($string,$endAt);

    $string = $strBefore . "@@@$key@@@" .$strAfter;

}
foreach($matches[3] as $key => $val) {
        $string = str_replace("@@@$key@@@",$matches[3][$key] ,$string);
}
print "<pre>";
echo $string;

#4


0  

Before PHP 5.3, which lets you create a function on the spot, you have to use either create_function (which I hate) or a helper class.

/**
 * For retrieving a new string from a list.
 */
class StringRotation {
    var $i = -1;
    var $strings = array();

    function addString($string) {
        $this->strings[] = $string;
    }

    /**
     * Use sprintf to produce result string
     * Rotates forward
     * @param array $params the string params to insert
     * @return string
     * @uses StringRotation::getNext()
     */
    function parseString($params) {
        $string = $this->getNext();
        array_unshift($params, $string);
        return call_user_func_array('sprintf', $params);
    }

    function getNext() {
        $this->i++;
        $t = count($this->strings);
        // Wrap around once the index passes the last string.
        if ($this->i >= $t) {
            $this->i = 0;
        }
        return $this->strings[$this->i];
    }

    function resetPointer() {
        $this->i = -1;
    }
}

$reg = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$replaceLinks[0] = '<a%2$shref="http://www.yahoo.com"%4$s>';
$replaceLinks[1] = '<a%2$shref="http://www.live.com"%4$s>';

$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';

$linkReplace = new StringRotation();
foreach ($replaceLinks as $replaceLink) {
    $linkReplace->addString($replaceLink);
}

echo preg_replace_callback($reg, array($linkReplace, 'parseString'), $string);
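On PHP 5.3 or later, the same rotation might be sketched with a closure instead of the helper class. The variable names $urls, $i and $rotated are made up for this illustration; the pattern and sample string are the ones used above.

```php
<?php
// Sketch assuming PHP 5.3+: a closure takes the place of StringRotation.
$reg = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$urls = array('http://www.yahoo.com', 'http://www.live.com');
$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';

$i = 0;
$rotated = preg_replace_callback($reg, function ($m) use ($urls, &$i) {
    // Pick the next URL, wrapping around when the list is exhausted.
    $url = $urls[$i % count($urls)];
    $i++;
    return '<a' . $m[1] . 'href="' . $url . '"' . $m[3] . '>';
}, $string);

echo $rotated, "\n";
```

The counter is captured by reference so it advances between matches, which is the job the class property does in the helper-class version.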
