Regex: how to find a string followed by a non-alphanumeric character

I'm trying to use a regular expression (in PHP) to find a specific string, matched case-insensitively, which must be followed by a non-alphanumeric character.

Example String:
Doggy is a lazy dog! Doggy. Dog and I.

Search String: Dog

Expected Result:
Doggy is a lazy <a href="">dog</a>! Doggy. <a href="">Dog</a> and I.

So it shouldn't match 'Doggy', because there the Dog substring isn't followed by a non-alphanumeric character.

I'm trying something along these lines, but it's not doing exactly what I want.

preg_replace("/(dog)[^a-zA-Z0-9\s\p]/i/", "", $str);

2 Solutions

#1

It sounds to me like what you're actually trying to do here is perform an exact word match, not necessarily match "a string followed by a non-alphanumeric character".

You can achieve this with the \b "word boundary" regex anchor:

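$search = "dog";
// preg_quote() escapes any regex metacharacters in the search term
$pattern = "/\b" . preg_quote($search, "/") . "\b/i";
$result = preg_replace($pattern, "", $str);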
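
As a minimal sketch of how the word-boundary approach could produce the output asked for in the question, the snippet below wraps each whole-word match in the anchor tag instead of deleting it. The function name linkWord is hypothetical, and the use of preg_quote assumes the search term may contain regex metacharacters.

```php
<?php
// Hypothetical helper: wraps whole-word, case-insensitive matches of $search
// in an empty link, as in the question's expected result.
function linkWord(string $search, string $str): string
{
    // preg_quote() escapes regex metacharacters, including the "/" delimiter.
    $pattern = '/\b' . preg_quote($search, '/') . '\b/i';
    // $0 is the whole match, so the original casing ("dog" vs "Dog") is kept.
    return preg_replace($pattern, '<a href="">$0</a>', $str);
}

echo linkWord('dog', 'Doggy is a lazy dog! Doggy. Dog and I.'), "\n";
```

One design caveat: \b treats the underscore as a word character, so "dog_" would not match here, whereas a class such as [^a-zA-Z0-9] counts "_" as non-alphanumeric.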

#2

Your regex is almost spot on, but there are a few errors:

I assume you want Dog to match when it is followed by a space. If so, remove the \s from the character class, because inside [^...] it excludes whitespace.

\p on its own isn't valid here. In PCRE it has to be followed by a Unicode property name, as in \p{L}.

You shouldn't have an extra slash after the i flag: /i/ -> /i

As your regex currently stands, the replacement also removes the non-alphanumeric character. You can remedy this by wrapping that character in a capture group and putting it back in the replacement.

You also have no code to add the anchor tags (<a href=""></a>).

So, I've gone and compiled all these into the statement below:

preg_replace("/(dog)([^a-zA-Z0-9])/i", '<a href="">$1</a>$2', $str);

This returns:

Doggy is a lazy <a href="">dog</a>! Doggy. <a href="">Dog</a> and I.
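
One limitation of the capture-group version is that it needs an actual character after the word, so a "dog" at the very end of the string is not matched. A possible variant, sketched here as an alternative rather than as part of the answer above, swaps the consuming character class for a negative lookahead. The function name linkBeforeNonAlnum is hypothetical.

```php
<?php
// Hypothetical helper: links $search wherever it is NOT followed by a letter or digit.
function linkBeforeNonAlnum(string $search, string $str): string
{
    // (?![a-zA-Z0-9]) only asserts what comes next and consumes nothing,
    // so the following character stays in place and end-of-string also passes.
    $pattern = '/' . preg_quote($search, '/') . '(?![a-zA-Z0-9])/i';
    return preg_replace($pattern, '<a href="">$0</a>', $str);
}

echo linkBeforeNonAlnum('dog', 'Doggy is a lazy dog! Doggy. I like my dog'), "\n";
```

Because the lookahead consumes nothing, the replacement only needs $0 and no second capture group.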
