I want to take the description of an RSS feed item stored in $the_content and cut it off after 2 full sentences (or after 200 words, continuing to the end of that sentence) using preg_split.
I tried a couple times, but I'm way off. I know what I want to do, but I can't seem to even start on something to make this work.
Thanks!
1 answer
#1
Splitting HTML properly is very tricky, and not worth doing with regular expressions. If you need to keep the HTML, something like a DOM text iterator will be useful.
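For the DOM route, here is a minimal sketch of walking the text nodes of a fragment with PHP's DOMDocument and DOMXPath. The helper name text_nodes() and the wrapping <div> are assumptions made for illustration, and it only collects the text; cutting the markup itself at a sentence boundary would need more work on top of this.

```php
<?php
// Sketch only: text_nodes() is a hypothetical helper, not a library function.
function text_nodes(string $html): array
{
    $doc = new DOMDocument();
    // The XML prolog is a common workaround so libxml reads the input as UTF-8.
    // @ silences warnings about sloppy feed markup.
    @$doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');

    $xpath = new DOMXPath($doc);
    $texts = [];
    // //text() selects every text node in document order.
    foreach ($xpath->query('//text()') as $node) {
        $value = trim($node->nodeValue);
        if ($value !== '') {
            $texts[] = $value;
        }
    }
    return $texts;
}

print_r(text_nodes('<p>First <b>bold</b> sentence.</p><p>Second one.</p>'));
```

Each entry corresponds to one text node, so you can count characters or sentences as you go and stop at the node where the limit is reached.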
Convert the description to plain text:
$text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
This takes the first 200 characters (200 words is a bit much for a sentence, isn't it?) and then continues to the end of the current sentence. The s modifier lets . match newlines, and u makes the pattern count UTF-8 characters rather than bytes:
$text = preg_replace('/^(.{200}.*?[.!?]).*$/su', '\1', $text);
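Putting the two steps together, a rough sketch might look like the following. The function name truncate_description() and the $minChars parameter are made up for illustration, and it assumes the input is valid UTF-8.

```php
<?php
// Sketch only: truncate_description() and $minChars are hypothetical names.
function truncate_description(string $html, int $minChars = 200): string
{
    // Strip tags first, then decode entities such as &amp; and &#39;.
    $text = trim(html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8'));

    // Keep at least $minChars characters, then continue lazily to the next . ! or ?
    // s: dot also matches newlines; u: pattern and subject are treated as UTF-8.
    $pattern = '/^(.{' . $minChars . '}.*?[.!?]).*$/su';
    $result = preg_replace($pattern, '$1', $text);

    // preg_replace returns null on error (e.g. invalid UTF-8); fall back to the plain text.
    return $result ?? $text;
}

echo truncate_description(
    '<p>Short intro. A second, longer sentence follows here. And a third.</p>',
    20
);
// prints "Short intro. A second, longer sentence follows here."
```

If the text is shorter than $minChars, the pattern simply doesn't match and the text comes back unchanged.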
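You could change [.!?] to something more sophisticated, e.g. require a space after the punctuation, or require that there's no other punctuation nearby:
(?<=[^.!?]{5})[.!?](?=[^.!?]{5})
(?=…) is a positive lookahead assertion. (?<=…) is a positive assertion that looks behind the current position. {5} means 5 times. So this only matches punctuation that has 5 non-punctuation characters on each side. (A negative lookbehind, (?<![^.!?]{5}), would do the opposite: it rejects exactly the positions preceded by 5 non-punctuation characters.) Note that the lookahead needs at least 5 more characters, so it won't match punctuation at the very end of the text.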
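Since the question asked about preg_split: a lookbehind also works there, splitting on the whitespace that follows sentence punctuation. This is just a sketch; first_sentences() is a hypothetical helper name, and it expects plain text (already stripped and decoded as above).

```php
<?php
// Sketch only: first_sentences() is a hypothetical helper name.
function first_sentences(string $text, int $count = 2): string
{
    // Split on whitespace that directly follows ., ! or ?.
    // The lookbehind keeps the punctuation attached to its sentence.
    // A limit of $count + 1 leaves the remainder unsplit in the last element.
    $parts = preg_split('/(?<=[.!?])\s+/u', trim($text), $count + 1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        return $text; // regex error, e.g. invalid UTF-8
    }
    return implode(' ', array_slice($parts, 0, $count));
}

echo first_sentences('One. Two! Three? Four.');
// prints "One. Two!"
```

It will still be fooled by abbreviations such as "e.g. " or "Mr. ", since those also look like punctuation followed by a space.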
I haven't tested it :)