I want to pull out capitalized words that don't start a sentence along with the previous and following word.
I'm using:
(\w*)\b([A-Z][a-z]\w*)\b(\w*)
replace with:
$1 -- $2 -- $3
Edit: It's only returning $2. I'll try the suggestions.
As for natural language processing: I don't need it for this. I just want to see where capitals show up in a sentence so I can figure out whether they're proper nouns or not.
2 Answers
#1
2
How about this?
([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)
This doesn't take into account anything non-alphabetic though. It also assumes that all words are separated by a single whitespace character. You will need to modify it if you want more complex support.
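To make the two limitations concrete, here is a minimal sketch using Python's `re` module. The choice of Python, the helper name `neighbours`, and the sample sentences are assumptions for illustration only; the pattern itself is the one given above.

```python
import re

# Pattern from the answer: letters, one whitespace character, a capitalized
# word, one whitespace character, letters.
pattern = re.compile(r"([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)")

def neighbours(text):
    """Return (previous, capitalized, next) tuples for every match in text."""
    return [m.groups() for m in pattern.finditer(text)]

# Works when words are separated by exactly one space.
single_space = neighbours("We met Alice yesterday.")

# Fails when there are two spaces, because \s matches a single character.
double_space = neighbours("We met  Alice yesterday.")

# Fails when punctuation touches the capitalized word.
with_comma = neighbours("We met Alice, yesterday.")

print(single_space)
print(double_space)
print(with_comma)
```

Replacing `\s` with `\s+` would probably be the smallest change that tolerates repeated whitespace; handling punctuation needs a broader separator.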
#2
1
Right now your regex fails because \b cannot match between two word characters. It matches only between a word character (\w) and a non-word character; therefore it can match between \w* and [A-Z] (or before the final \w*) only when that \w* matches nothing, which is why $1 and $3 come back empty.
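A quick way to see this behaviour is to run the original pattern and inspect the groups. This is only a sketch in Python's `re` module; the sample sentence and variable names are made up for the demonstration.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"(\w*)\b([A-Z][a-z]\w*)\b(\w*)")

sample = "We met Alice in Paris last year."  # hypothetical test sentence

# Collect the three capture groups for every match.
original_groups = [m.groups() for m in original.finditer(sample)]
print(original_groups)
# Groups 1 and 3 are always empty strings: \b only matches next to the
# capitalized word when the adjacent \w* has consumed nothing.
```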
So, you need some other (=non-alphanumeric) characters between your words:
Try
(\w*)\W+([A-Z][a-z]\w*)\W+(\w*)
although, if your regex engine supports Unicode properties, you might be happier with
(\w*)\W+(\p{Lu}\p{Ll}\w*)\W+(\w*)
As written, only capitalized words of length 2 or more are matched, i.e. "I" (as in "me") will not be matched by this. I suppose you inserted the [a-z] to avoid matches like "IBM"? Or what was your intention?
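As a rough sketch of how the \W+ version behaves, the snippet below runs it in Python's `re` module and formats each match in the "$1 -- $2 -- $3" style from the question. The function name `capitalized_in_context` and the sample sentences are assumptions for illustration. The \p{Lu} variant is not shown because the standard `re` module does not support \p{...} properties.

```python
import re

# ASCII pattern from the answer: \W+ requires at least one non-word
# character on each side of the capitalized word.
pattern = re.compile(r"(\w*)\W+([A-Z][a-z]\w*)\W+(\w*)")

def capitalized_in_context(text):
    """Return 'previous -- Capitalized -- next' for each match in text."""
    return [" -- ".join(m.groups()) for m in pattern.finditer(text)]

sample = "We met Alice in the old town of Paris last year."  # made-up sentence
contexts = capitalized_in_context(sample)
for line in contexts:
    print(line)

# A lone capital ("I") and an all-caps word ("IBM") are skipped, because
# [a-z] demands a lowercase letter right after the capital.
skipped = capitalized_in_context("Then I called IBM today.")
print(skipped)
```

Two caveats with this sketch: "We" is left out only because nothing precedes it in the string, so a capitalized word following ". " in the middle of a text would still be reported. Also, matches cannot overlap, so when two capitalized words share a neighbouring word, the second match gets an empty first group.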