Capitalized words not at the start of a sentence, together with their two adjacent words

Time: 2022-05-05 21:37:42

I want to pull out capitalized words that don't start a sentence along with the previous and following word.

I'm using:

(\w*)\b([A-Z][a-z]\w*)\b(\w*)

replace with:

$1 -- $2 -- $3

Edit: It's only returning $2. I'll try the suggestions.

As for natural-language processing: I don't need it here. I just want to see where capitals show up in a sentence so I can figure out whether they're proper nouns or not.

2 solutions

#1


2  

How about this?

([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)

This doesn't take into account anything non-alphabetic though. It also assumes that all words are separated by a single whitespace character. You will need to modify it if you want more complex support.
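A minimal Python sketch of how this pattern behaves, assuming Python's built-in `re` module (the thread does not name an engine). The function name `capitalized_with_neighbours` and the sample sentence are made up for illustration.

```python
import re

# Pattern from the answer above: word, one whitespace character,
# capitalized word, one whitespace character, word.
PATTERN = re.compile(r"([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)")

def capitalized_with_neighbours(text):
    """Return (previous, capitalized, next) tuples found in text."""
    return PATTERN.findall(text)

sample = "We met Alice yesterday. Then we flew to Paris together."
print(capitalized_with_neighbours(sample))
# → [('met', 'Alice', 'yesterday'), ('to', 'Paris', 'together')]

# Two spaces between words are enough to prevent a match.
print(capitalized_with_neighbours("see  Bob now"))
# → []
```

"Then" is skipped here because the word before it ends in a period rather than whitespace, which happens to be what the question asks for. Matches do not overlap, so two capitalized words in a row would only be reported once.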

#2


1  

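Right now your regex fails because the \b can never match between two word characters. It matches only at the boundary between an alphanumeric and a non-alphanumeric character; therefore it can only match next to a \w* when that \w* matches the empty string, which is why $1 and $3 come back empty.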
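To see the effect concretely, here is a small sketch using Python's `re` module (an assumption, since the question does not say which engine is used). The helper `show_groups` and the sample text are illustrative only.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\w*)\b([A-Z][a-z]\w*)\b(\w*)")

def show_groups(text):
    """Return the three captured groups of the first match, or None."""
    m = ORIGINAL.search(text)
    return m.groups() if m else None

print(show_groups("the Quick brown fox"))
# → ('', 'Quick', '')

# Python writes $1 as \1 in the replacement string.
replaced = ORIGINAL.sub(r"\1 -- \2 -- \3", "the Quick brown fox")
print(replaced)
```

Both outer groups are empty: the only way for the engine to satisfy \b directly next to \w* is to let that \w* match nothing.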

So, you need some other (=non-alphanumeric) characters between your words:

Try

(\w*)\W+([A-Z][a-z]\w*)\W+(\w*)

although (if your regex engine allows using Unicode properties), you might be happier with

(\w*)\W+(\p{Lu}\p{Ll}\w*)\W+(\w*)
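The two patterns above could be tried out roughly as follows. This is a sketch under assumptions: it uses Python, whose built-in `re` module does not accept `\p{Lu}` or `\p{Ll}`, so the Unicode variant is only approximated with `str.isupper()` and `str.islower()` on a word list. The function names `mark_capitals` and `unicode_capitals` are hypothetical.

```python
import re

# ASCII pattern suggested above.
ASCII_PATTERN = re.compile(r"(\w*)\W+([A-Z][a-z]\w*)\W+(\w*)")

def mark_capitals(text):
    """Rewrite each match as 'previous -- Capitalized -- next'."""
    return ASCII_PATTERN.sub(r"\1 -- \2 -- \3", text)

def unicode_capitals(text):
    """Approximate the Unicode-property pattern with str methods.

    Assumption: an upper-case first character followed by a lower-case
    second character stands in for an upper-case letter plus a
    lower-case letter.
    """
    words = re.findall(r"\w+", text)
    hits = []
    # Skip the first and last word: they lack one neighbour.
    for i in range(1, len(words) - 1):
        w = words[i]
        if len(w) >= 2 and w[0].isupper() and w[1].islower():
            hits.append((words[i - 1], w, words[i + 1]))
    return hits

print(mark_capitals("we met Alice yesterday"))
# → we met -- Alice -- yesterday

print(unicode_capitals("wir trafen Ärzte gestern"))
# → [('trafen', 'Ärzte', 'gestern')]
```

Unlike the regex, the word-list version ignores punctuation between words and reports overlapping neighbours, so it will also flag words that follow a period. The ASCII pattern does not match "Ärzte" at all, which is the case the Unicode variant is meant to cover.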

As written, only capitalized words of length 2 or more are matched, i.e. "I" (the pronoun) will not be matched by this. I suppose you inserted the [a-z] to avoid matches like "IBM"? Or what was your intention?
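The length restriction can be checked with a short Python sketch (again assuming the built-in `re` module). `VARIANT` is a hypothetical alternative that is not part of the answer: it makes the lower-case tail optional, so "I" matches while "IBM" still does not.

```python
import re

# Pattern from the answer: an upper-case letter must be followed by a lower-case one.
SUGGESTED = re.compile(r"(\w*)\W+([A-Z][a-z]\w*)\W+(\w*)")

# Hypothetical variant: one upper-case letter, then zero or more lower-case letters.
VARIANT = re.compile(r"(\w*)\W+([A-Z][a-z]*)\W+(\w*)")

def middle_words(pattern, text):
    """Return the capitalized word (group 2) of every match."""
    return [m.group(2) for m in pattern.finditer(text)]

text = "so I asked IBM about Watson today"
print(middle_words(SUGGESTED, text))
# → ['Watson']
print(middle_words(VARIANT, text))
# → ['I', 'Watson']
```

The variant trades one limitation for another: it no longer matches capitalized words that contain digits or inner capitals, such as "McDonald".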
