
Time: 2020-12-10 08:56:55

I have a function, translate(), that takes multiple parameters. The first parameter is the only required one; it is a string, which I always wrap in single quotes, like this:


translate('hello world');

The other params are optional, but could be included like this:


translate('hello world', true, 1, 'foobar', 'etc');


And the string itself could contain escaped single quotes, like this:


translate('hello\'s world');

To the point: I now want to search through all code files for every call to this function and extract just the string. To do so I've come up with the following grep, which returns everything between translate(' and either ') or ',. It's almost perfect:


grep -RoPh "(?<=translate\(').*?(?='\)|'\,)" .


The problem with this though, is that if the call is something like this:


translate('hello \'world\', you\'re great!');


My grep would only return this:


hello \'world\
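To make the failure concrete, here is a small reproduction of that grep on the three calls above. It is a sketch that assumes GNU grep built with PCRE support (-P); the temporary directory and the file name example.js are made up for the demo.

```shell
#!/usr/bin/env bash
# Reproduces the problem. Assumes GNU grep built with PCRE support (-P);
# example.js is a throwaway sample file created only for this demo.
dir=$(mktemp -d)
cat > "$dir/example.js" <<'EOF'
translate('hello world');
translate('hello\'s world');
translate('hello \'world\', you\'re great!');
EOF

# The lazy .*? stops at the first ') or ', even when that quote is escaped.
matches=$(grep -RoPh "(?<=translate\(').*?(?='\)|'\,)" "$dir")
rm -r "$dir"
printf '%s\n' "$matches"
```

The third line comes back as hello \'world\, cut off at the first escaped quote that happens to be followed by a comma.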

So I'm looking to modify this so that the part that currently looks for ') or ', instead looks for the first occurrence of ' that hasn't been escaped, i.e. doesn't immediately follow a \


Hopefully I'm making sense. Any suggestions please?


2 answers



You can use this grep with PCRE regex:


grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" .



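RegEx Breakup (as written inside the shell's double quotes, where each \\\\ reaches PCRE as \\, i.e. one literal backslash):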

\b            # word boundary
translate     # match literal translate
\(            # match a (
\s*           # match 0 or more whitespace
\K            # reset the match start: nothing matched before \K is printed
'             # match starting single quote
(?:           # start non-capturing group
   [^'\\\\]*  # match 0 or more chars that are not a backslash or single quote
)             # end non-capturing group
(?:           # start non-capturing group
   \\\\.      # match a backslash followed by char that is "escaped"
   [^'\\\\]*  # match 0 or more chars that are not a backslash or single quote
)*            # end non-capturing group
'             # match ending single quote
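A quick way to try this pattern on the calls from the question, as a sketch assuming GNU grep with PCRE support (example.js is a throwaway file created just for the demo):

```shell
#!/usr/bin/env bash
# Runs the \K pattern on the sample calls. Assumes GNU grep with PCRE
# support (-P); example.js is a throwaway sample file.
dir=$(mktemp -d)
cat > "$dir/example.js" <<'EOF'
translate('hello world', true, 1, 'foobar', 'etc');
translate('hello\'s world');
translate('hello \'world\', you\'re great!');
EOF

# -R recurse, -o print only the match, -P use PCRE, -h omit file names.
# Inside double quotes bash turns each \\\\ into \\, one literal backslash for PCRE.
matches=$(grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" "$dir")
rm -r "$dir"
printf '%s\n' "$matches"
```

Because \K only hides what was matched before it, the surrounding quotes are part of the output. Only the first argument of each call is reported, since every match has to start at translate(.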

Here is a version without \K using look-arounds:


grep -oPhR "(?<=\btranslate\(')(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*(?=')" .
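The sketch below (again assuming GNU grep with PCRE support and a throwaway example.js) runs both versions side by side. The second sample line, with a space after the parenthesis, is added here to show one trade-off: PCRE requires a look-behind to have a fixed length, so the look-around version cannot allow the \s* that the \K version does.

```shell
#!/usr/bin/env bash
# Compares the \K pattern with the look-around pattern. Assumes GNU grep
# with PCRE support (-P); example.js is a throwaway sample file.
dir=$(mktemp -d)
cat > "$dir/example.js" <<'EOF'
translate('hello \'world\', you\'re great!');
translate( 'spaced');
EOF

# \K version: the quotes are kept, and \s* allows whitespace after "(".
with_k=$(grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" "$dir")

# Look-around version: the quotes are dropped, but the fixed-length
# look-behind cannot absorb the space in "translate( '".
lookaround=$(grep -oPhR "(?<=\btranslate\(')(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*(?=')" "$dir")

rm -r "$dir"
printf '%s\n' '--- \K' "$with_k" '--- look-around' "$lookaround"
```

The look-around version returns the string without its quotes but misses translate( 'spaced').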





I think the problem is the .*? part: the ? makes it a non-greedy pattern, meaning it'll take the shortest string that matches the pattern. In effect, you're saying, "give me the shortest string that's followed by quote+close-paren or quote+comma". In your example, "world\" is followed by a single quote and a comma, so it matches your pattern. In these cases, I like to use something like the following reasoning:


A string is a quote, zero or more characters, and a quote: '.*'


A character is anything that isn't a quote (because a quote terminates the string): '[^']*'


Except that you can put a quote in a string by escaping it with a backslash, so a character is either "backslash followed by a quote" or, failing that, "not a quote": '(\\'|[^'])*'
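To see each step of that reasoning in action, the sketch below runs the three patterns on a single call. It assumes GNU grep with PCRE support; the extra 'foobar' argument is added only to show where each pattern stops. Inside bash double quotes \\\\ reaches PCRE as \\, a literal backslash, so the last pattern is written with four backslashes.

```shell
#!/usr/bin/env bash
# Runs each stage of the reasoning on one sample call. Assumes GNU grep
# with PCRE support (-P); the 'foobar' argument is only for illustration.
read -r line <<'EOF'
translate('hello \'world\', you\'re great!', 'foobar');
EOF

# Stage 1: a quote, anything, a quote (greedy, so it runs to the last quote).
s1=$(printf '%s\n' "$line" | grep -oP "'.*'")
# Stage 2: characters are "not a quote", so every quote ends the string.
s2=$(printf '%s\n' "$line" | grep -oP "'[^']*'")
# Stage 3: a character is backslash+quote or, failing that, not a quote.
s3=$(printf '%s\n' "$line" | grep -oP "'(\\\\'|[^'])*'")

printf '%s\n' '--- stage 1' "$s1" '--- stage 2' "$s2" '--- stage 3' "$s3"
```

'.*' runs on into the next argument, '[^']*' stops at every quote including the escaped ones, and the last pattern returns each string whole.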


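Put it all together and you get the following. The backslash is written four times because bash turns \\\\ into \\ inside double quotes, and PCRE needs \\ to match one literal backslash:


grep -RoPh "(?<=translate\(')(\\\\'|[^'])*(?='\)|'\,)" .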
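A quick comparison of the escaped-quote alternative written with two and with four backslashes, as a sketch assuming GNU grep with PCRE support (example.js is a throwaway sample file). Inside bash double quotes \\ becomes \, so two backslashes hand PCRE just \', which is an ordinary quote.

```shell
#!/usr/bin/env bash
# Compares two spellings of the escaped-quote alternative. Assumes GNU grep
# with PCRE support (-P); example.js is a throwaway sample file.
dir=$(mktemp -d)
cat > "$dir/example.js" <<'EOF'
translate('hello \'world\', you\'re great!');
translate('hello', 'foobar');
EOF

# Two backslashes: PCRE receives \' (just a quote), so (\'|[^']) matches
# any character and the match can run past the end of the string.
two=$(grep -RoPh "(?<=translate\(')(\\'|[^'])*(?='\)|'\,)" "$dir")

# Four backslashes: PCRE receives \\' (a backslash followed by a quote).
four=$(grep -RoPh "(?<=translate\(')(\\\\'|[^'])*(?='\)|'\,)" "$dir")

rm -r "$dir"
printf '%s\n' '--- two backslashes' "$two" '--- four backslashes' "$four"
```

Both spellings agree on the first call only because the greedy match backs off to the final '); on translate('hello', 'foobar') the two-backslash form runs into the next argument.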


