I need to convert
$text = 'We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe';
to
$text = 'We had fun. Look at this photo (http://example.com) of Joe';
[Edit] There could be multiple links in the text.
All HTML tags are to be removed, and the href value from each <a> tag needs to be added as shown above.
What would be an efficient way to solve this with regex? Any code snippet would be great.
5 solutions
#1
5
First use preg_replace to keep the link target. You could use:
$str = preg_replace('~<a href="(.*?)">(.*?)</a>~', '$2 ($1)', $str);
Then use strip_tags, which will remove the rest of the tags.
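The two steps above can be combined into one function. This is only a minimal sketch: the name linksToText is hypothetical, and the pattern assumes reasonably well-formed markup with a quoted href. It uses preg_replace_callback so that links with extra attributes or single-quoted URLs are also handled, and it covers multiple links in one pass.

```php
<?php
// Hypothetical helper (not from the original answer): rewrites each
// <a ... href="URL" ...>LABEL</a> as "LABEL (URL)", then strips all other tags.
function linksToText(string $html): string
{
    $withUrls = preg_replace_callback(
        // Group 1: quote character, group 2: URL, group 3: link label.
        // Flags: i = case-insensitive, s = dot also matches newlines.
        '~<a\b[^>]*\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>~is',
        function (array $m): string {
            return $m[3] . ' (' . $m[2] . ')';
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return strip_tags($withUrls ?? $html);
}

echo linksToText('We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe');
// prints: We had fun. Look at this photo (http://example.com) of Joe
```

A regex like this is fine for simple, trusted input; for arbitrary HTML a parser is the safer choice.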
#2
1
Try an XML parser to replace each tag with its inner HTML, and each a tag with its text followed by its href attribute.
http://www.php.net/manual/en/book.domxml.php
#3
1
The DOM solution:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[@href]') as $node) {
    $textNode = new DOMText(sprintf('%s (%s)',
        $node->nodeValue, $node->getAttribute('href')));
    $node->parentNode->replaceChild($textNode, $node);
}
echo strip_tags($dom->saveHTML());
And the same without XPath:
$dom = new DOMDocument;
$dom->loadHTML($html);
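// getElementsByTagName returns a live list, so copy it to an array first;
// otherwise replacing nodes while iterating can skip links.
foreach (iterator_to_array($dom->getElementsByTagName('a')) as $node) {
    if ($node->hasAttribute('href')) {
        $textNode = new DOMText(sprintf('%s (%s)',
            $node->nodeValue, $node->getAttribute('href')));
        $node->parentNode->replaceChild($textNode, $node);
    }
}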
echo strip_tags($dom->saveHTML());
All it does is load any HTML into a DOMDocument instance. In the first case it uses an XPath expression, which is kinda like SQL for XML, and gets all links with an href attribute. It then creates a text node from the link's node value (its text content) and the href attribute and replaces the link with it. The second version just uses the DOM API and no XPath.
Yes, it's a few lines more than the regex, but it is clean and easy to understand, and it won't give you any headaches when you need to add additional logic.
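To show the DOM approach end to end, here is a minimal sketch wrapped in a function. The name domLinksToText is hypothetical, and it assumes an HTML fragment whose characters loadHTML decodes correctly without an explicit charset (plain ASCII in this example). Instead of calling strip_tags on the serialized document, it reads the text content of the document element, which avoids the doctype and wrapper tags that saveHTML adds.

```php
<?php
// Hypothetical helper (not from the original answer): DOM-based conversion
// that returns plain text with each link rendered as "label (url)".
function domLinksToText(string $html): string
{
    if ($html === '') {
        return '';   // loadHTML rejects an empty string
    }

    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The XPath result is a static snapshot, so replacing nodes in the loop is safe.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $node) {
        $text = sprintf('%s (%s)', $node->textContent, $node->getAttribute('href'));
        $node->parentNode->replaceChild($dom->createTextNode($text), $node);
    }

    // textContent concatenates all remaining text and drops every tag.
    return trim($dom->documentElement->textContent);
}

echo domLinksToText('We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe');
```

Because the links become ordinary text nodes before the text is read, any number of links is handled the same way.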
#4
0
I've done things like this using variations of substring and replace. I'd probably use regex today, but you wanted an alternative, so:
For the <i> tags, I'd do something like:
$text = str_replace("<i>", "", $text);
$text = str_replace("</i>", "", $text);
(My PHP is really rusty, so double-check str_replace against the manual; the idea is what I'm sharing.)
The <a> tag is a bit more tricky, but it can be done. You need to find the point where <a starts and the point where the > of that tag ends. Then you extract the entire length and replace the closing </a>.
That might go something like:
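$start = strrpos($text, "<a");                // strrpos finds the last occurrence, so this is the last link
$end = strrpos($text, "</a>", $start);        // its closing tag
$link = substr($text, $start, $end - $start + strlen("</a>"));  // the whole <a ...>...</a>
$link = str_replace("</a>", "", $link);       // drop the closing tag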
(I don't know if this will work; again, the idea is what I want to communicate. I hope the code fragments help, but they probably don't work "out of the box". There are also a lot of possible bugs in the snippets, depending on your exact implementation and environment.)
Reference:
- strrpos - http://www.php.net/manual/en/function.strrpos.php
- str_replace - http://www.php.net/manual/en/function.str-replace.php
- substr - http://php.net/manual/en/function.substr.php
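The string-function idea can be carried through for every link in the text. This is a rough sketch under stated assumptions, not a robust solution: the function name is hypothetical, it uses strpos (first occurrence) rather than strrpos so it can walk the text from left to right, it expects double-quoted href attributes and no nested links, and it only removes the <i> tags from the example rather than all tags.

```php
<?php
// Hypothetical helper (not from the original answer): no regex, no parser.
function replaceLinksWithStringFunctions(string $text): string
{
    $offset = 0;
    while (($start = strpos($text, '<a ', $offset)) !== false) {
        $tagEnd = strpos($text, '>', $start);   // end of the opening tag
        $close  = $tagEnd === false ? false : strpos($text, '</a>', $tagEnd);
        if ($close === false) {
            break;                              // unbalanced markup: stop here
        }

        $openTag = substr($text, $start, $tagEnd - $start + 1);
        $label   = substr($text, $tagEnd + 1, $close - $tagEnd - 1);

        // Pull the URL out of href="..." inside the opening tag.
        $url = '';
        $hrefPos = strpos($openTag, 'href="');
        if ($hrefPos !== false) {
            $urlStart = $hrefPos + strlen('href="');
            $urlEnd   = strpos($openTag, '"', $urlStart);
            if ($urlEnd !== false) {
                $url = substr($openTag, $urlStart, $urlEnd - $urlStart);
            }
        }

        // Swap the whole <a ...>label</a> span for "label (url)".
        $replacement = $url === '' ? $label : $label . ' (' . $url . ')';
        $length = $close + strlen('</a>') - $start;
        $text = substr_replace($text, $replacement, $start, $length);
        $offset = $start + strlen($replacement);   // continue after the replacement
    }

    return str_replace(['<i>', '</i>'], '', $text);
}

echo replaceLinksWithStringFunctions('We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe');
```

The amount of bookkeeping needed here is a fair argument for the regex or DOM approaches when the input is anything less than tightly controlled.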
#5
0
It's also very easy to do with a parser:
# available from http://simplehtmldom.sourceforge.net
include('simple_html_dom.php');
# parse and echo
$html = str_get_html('We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe');
# rewrite every link, since the text may contain more than one
foreach ($html->find('a') as $a) {
    $a->outertext = "{$a->innertext} ({$a->href})";
}
echo strip_tags($html);
And that produces the text you want for your test case.