I'm having an issue with grep that I can't figure out. I'm trying to find all lowercase words enclosed in double quotes (C strings) in a set of source files, using bash and GNU grep:
grep -e '"[a-z]+"' *.cpp
gives me no matches, while
grep -e '"[a-z]*"' *.cpp
gives me matches like "Abc", which contains an uppercase letter. What is the correct regular expression to match only all-lowercase strings like "abc"?
4 answers
#1
8
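You're forgetting to escape the + metacharacter. In grep's default basic regular expression syntax a bare + is a literal plus sign; GNU grep accepts \+ to mean "one or more":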
grep -e '"[a-z]\+"'
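A quick way to see the difference between the two syntaxes is to feed grep a couple of sample lines. This is a sketch assuming GNU grep (the \+ form is a GNU extension to basic regular expressions); the sample() function and its input lines are hypothetical, and LC_ALL=C is set so the locale issue discussed next does not interfere.

```shell
# Hypothetical sample input: one all-lowercase C string, one mixed-case.
sample() { printf '%s\n' 'puts("abc");' 'puts("Abc");'; }

# Basic syntax (the default): a bare + is a literal plus sign, so this
# looks for quote, one letter, plus sign, quote. Prints the count: 0
# (grep exits with status 1 when nothing matches, hence || true).
sample | LC_ALL=C grep -c '"[a-z]+"' || true

# GNU basic syntax: \+ means "one or more of the preceding item".
# Prints: puts("abc");
sample | LC_ALL=C grep '"[a-z]\+"'

# Extended syntax (-E): a bare + already means "one or more".
# Prints: puts("abc");
sample | LC_ALL=C grep -E '"[a-z]+"'
```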
As for the second part, the pattern matches mixed-case strings because of your locale. For example:
$ echo '"Abc"' | grep -e '"[a-z]\+"'
"Abc"
$ export LC_ALL=C
$ echo '"Abc"' | grep -e '"[a-z]\+"'
$
To get the traditional ASCII behavior, set your locale to C, as described in the grep man page:
Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.
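Following the man page's point that [a-d] means [abcd] only in the C locale, another option is to skip the range and list the letters explicitly: a bracket list of individual characters does not consult any collating sequence. A minimal sketch, again with a hypothetical sample() input:

```shell
# Hypothetical sample input: one all-lowercase C string, one mixed-case.
sample() { printf '%s\n' 'puts("abc");' 'puts("Abc");'; }

# Every lowercase ASCII letter spelled out, so no range is interpreted
# and the result does not depend on the locale. Prints: puts("abc");
sample | grep '"[abcdefghijklmnopqrstuvwxyz]\+"'
```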
#2
1
Escape the +:
grep -e '"[a-z]\+"' *.cpp
or use egrep (equivalent to grep -E; recent GNU grep releases warn that egrep is obsolescent):
egrep '"[a-z]+"' *.cpp
Perhaps you meant -E, which selects extended regular expressions, where + needs no escaping:
grep -E '"[a-z]+"' *.cpp
The lowercase -e does not change the regex syntax; it just introduces a pattern, which is useful, for example, to specify multiple search patterns.
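To illustrate, each -e adds one pattern and grep prints lines matching any of them; -e also lets a pattern start with a dash. The sample lines and the second pattern are hypothetical:

```shell
# Hypothetical sample input.
sample() { printf '%s\n' 'puts("abc");' 'x = -1;' 'y = 2;'; }

# Two patterns: a lowercase C string, or the text -1. Without -e, grep
# would parse '-1' as an option rather than a pattern. Prints:
#   puts("abc");
#   x = -1;
sample | LC_ALL=C grep -e '"[a-z]\+"' -e '-1'
```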
The uppercase matches probably come from your locale, which you can prevent with:
LC_ALL=C egrep '"[a-z]+"' *.cpp
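Unlike export, the LC_ALL=C prefix applies only to that single grep process. A sketch wrapping the command for real files; find_lower_strings, the scratch directory and demo.cpp are hypothetical, mktemp -d is assumed to be available, and -h (omit file names) is added only to keep the output short:

```shell
# Hypothetical helper: print every line of the .cpp files in directory $1
# that contains an all-lowercase C string. LC_ALL=C applies to this grep
# process only; the calling shell's locale is left unchanged.
find_lower_strings() {
    LC_ALL=C grep -h -E '"[a-z]+"' "$1"/*.cpp
}

# Demo in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'const char *a = "hello";' 'const char *b = "World";' > "$dir/demo.cpp"
find_lower_strings "$dir"    # prints: const char *a = "hello";
rm -rf "$dir"
```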
#3
0
You probably need to escape the +:
grep -e '"[a-z]\+"' *.cpp
#4
0
If you don't want to mess about with locales, this worked for me:
grep -e '"[[:lower:]]\+"'
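For comparison, a sketch of the [[:lower:]] approach: a POSIX character class names a category of characters rather than a range, so collation order does not matter. Note that in a UTF-8 locale it also matches non-ASCII lowercase letters such as é. The sample() input is hypothetical:

```shell
# Hypothetical sample input: one all-lowercase C string, one mixed-case.
sample() { printf '%s\n' 'puts("abc");' 'puts("Abc");'; }

# [[:lower:]] matches any character the locale classifies as lowercase.
# Prints: puts("abc");
sample | grep '"[[:lower:]]\+"'
```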