Remove all single characters from a string

Time: 2023-01-19 20:22:10

I need a regular expression to remove ALL single characters from a string, not just single letters or numbers.

The string is:

"A Future Ft Casino Karate Chop ( Prod By Metro )"

It should come out as:

"Future Ft Casino Karate Chop Prod By Metro"

The expression I am currently using (in PHP) correctly removes the single 'A' but leaves the single '(' and ')'.

This is the code I am using:

$string = preg_replace('/\b\w\b\s?/', '', $string); 

2 solutions

#1


11  

Try this:

(^| ).( |$)

Breakdown:

   1.  (^| )  ->  Beginning of line or space  
   2.  .      ->  Any character  
   3.  ( |$)  ->  Space or End of line

Actual code:

$string = preg_replace('/(^| ).( |$)/', '$1', $string); 

Note: I'm not familiar with the workings of PHP regex, so the code might need a slight tweak depending on how the regex needs to be declared.

As m.buettner pointed out, this code leaves a trailing space when the string ends with a single character. A trim is needed to clear it out.
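
As a minimal sketch of the replacement plus the trim mentioned above, run against the string from the question (the variable name `$result` is only an illustration, not part of the original answer):

```php
<?php
$string = 'A Future Ft Casino Karate Chop ( Prod By Metro )';

// Drop any single character delimited by spaces or the string boundaries,
// keeping the leading delimiter ($1) so neighbouring words stay separated.
$result = preg_replace('/(^| ).( |$)/', '$1', $string);

// The final ")" leaves behind the space that preceded it, so trim the result.
$result = trim($result);

echo $result, "\n"; // Future Ft Casino Karate Chop Prod By Metro
```

The leading 'A', the '(' and the ')' are all removed here because `.` matches any character, not only word characters.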

Edit: Arnis Juraga pointed out that this does not clear out consecutive single characters: a b c would be filtered down to b. If this is an issue, use this regex:

(^| ).(( ).)*( |$)

The (( ).)* added to the middle matches a space followed by any character, zero or more times. The downside is that, depending on the replacement string, this can leave a double space where a series of single characters was located.

Meaning this:

The a b c dog

Will become this:

The  dog

After performing the replacement that removes the single characters, use the following regex to locate double spaces, then replace each match with a single space:

( ){2}
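
The two steps can be chained roughly as follows. This is a sketch under stated assumptions: the answer gives no replacement string for the series pattern, so `$1` is carried over from the earlier snippet; the helper name `removeSingleChars` is hypothetical; and `/ {2,}/` is swapped in for `( ){2}` so that runs longer than two spaces also collapse in a single pass.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function removeSingleChars(string $s): string
{
    // Remove a whole run of space-separated single characters,
    // keeping the leading delimiter ($1, an assumed replacement).
    $s = preg_replace('/(^| ).(( ).)*( |$)/', '$1', $s);

    // Collapse any remaining run of two or more spaces into one
    // (variation on the answer's "( ){2}" pattern).
    $s = preg_replace('/ {2,}/', ' ', $s);

    // Clear the space left behind at either end of the string.
    return trim($s);
}

echo removeSingleChars('The a b c dog'), "\n";
// The dog
echo removeSingleChars('A Future Ft Casino Karate Chop ( Prod By Metro )'), "\n";
// Future Ft Casino Karate Chop Prod By Metro
```

With `$1` as the replacement, the collapse step mostly acts as a safety net for input that already contains doubled spaces.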

#2


6  

A slightly more efficient version that does not require capturing would use lookarounds. It's a bit less intuitive because of the multiple negations:

$string = preg_replace('/(?<!\S).(?!\S)\s*/', '', $string);

This removes any character that is neither preceded nor followed by a non-whitespace character (so only those that sit between whitespace or at the string boundaries). The match also includes all trailing whitespace, so that only the preceding whitespace, if there is any, is left. The caveat is that, just as with Nick's answer, the ) at the end of the string leaves a trailing space (the whitespace in front of the character is not part of the match). This is easily solved by trimming the string.
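
A small sketch of the lookaround version combined with the trim, for illustration only (the wrapper name `stripLoneChars` is hypothetical):

```php
<?php
// Hypothetical wrapper name, used only for this illustration.
function stripLoneChars(string $s): string
{
    // (?<!\S) : not preceded by a non-whitespace character
    // .       : the single character itself
    // (?!\S)  : not followed by a non-whitespace character
    // \s*     : also consume the whitespace that follows it
    $s = preg_replace('/(?<!\S).(?!\S)\s*/', '', $s);

    // A single character at the very end leaves its preceding space behind.
    return trim($s);
}

echo stripLoneChars('A Future Ft Casino Karate Chop ( Prod By Metro )'), "\n";
// Future Ft Casino Karate Chop Prod By Metro
echo stripLoneChars('The a b c dog'), "\n";
// The dog
```

Because the lookbehind consumes nothing, consecutive single characters such as a b c are each matched in turn. For UTF-8 input, the `u` modifier is probably needed so that `.` matches a whole code point rather than a single byte.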
