Extracting part of a string with a regular expression

Date: 2022-09-13 08:35:00

I have strings of the following forms:

Foo
"Foo"
"Some Foo"
"Some Foo and more"

I need to extract the value Foo when it appears inside quotes, where it may be surrounded by any number of alphanumeric and whitespace characters. So, for the examples above, I would like the output to be:

<NoMatch>
Foo
Foo
Foo

I have been trying to get this to work, and this is the pattern I have so far, using lookahead/lookbehind for the quotes. It works for "Foo" but not for the others.

(?<=")Foo(?=")

Further expanding this to

(?<=")(?<=.*?)Timesheet(?=.*?)(?=")

does not work.

Any assistance will be appreciated!

4 solutions

#1


9  

If quotes are correctly balanced and quoted strings don't span multiple lines, then you can simply look ahead in the string to check whether an even number of quotes follows. If that's not true, we know that we're inside a quoted string:

Foo(?![^"\r\n]*(?:"[^"\r\n]*"[^"\r\n]*)*$)

Explanation:

Foo          # Match Foo
(?!          # only if the following can't be matched here:
 [^"\r\n]*   # Any number of characters except quotes or newlines
 (?:         # followed by
  "[^"\r\n]* # (a quote and any number of non-quotes/newlines
  "[^"\r\n]* # twice)
 )*          # any number of times.
 $           # End of the line
)            # End of lookahead assertion

See it live on regex101.com
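
As a rough check of the pattern above, here is a minimal Python sketch that runs it against the four sample strings from the question. The helper name `extract_quoted_foo` is made up for illustration, and the sketch assumes each input is a single line with balanced quotes, as the answer requires.

```python
import re

# Pattern from the answer: match Foo only when it is NOT followed by an
# even number of quotes before the end of the line, i.e. when Foo sits
# inside a quoted string.
PATTERN = re.compile(r'Foo(?![^"\r\n]*(?:"[^"\r\n]*"[^"\r\n]*)*$)')

def extract_quoted_foo(text):
    """Return 'Foo' if it occurs inside quotes in text, else None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

samples = ['Foo', '"Foo"', '"Some Foo"', '"Some Foo and more"']
results = [extract_quoted_foo(s) for s in samples]
print(results)  # [None, 'Foo', 'Foo', 'Foo']
```

The unquoted `Foo` is rejected because zero quotes follow it, and zero counts as an even number.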

#2


1  

Many regex engines do not allow variable-length patterns such as .* inside a lookbehind ((?<=something)); lookahead ((?=something)) usually has no such restriction. Try this instead:

(?<=")(.*?)(Foo)(.*?)(?=")

and then use the captured groups (depending on your language: $1, $2, ... or \1, \2, ... or the elements of a match array or similar).
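
For illustration, this is roughly how the groups could be read in Python, where they are exposed through the match object rather than `$1`-style variables. The function name `split_around_foo` is hypothetical, and the sketch assumes the input holds a single quoted string.

```python
import re

# Three capture groups: text before Foo, Foo itself, text after Foo.
# The lookarounds require a quote on each side without consuming it.
PATTERN = re.compile(r'(?<=")(.*?)(Foo)(.*?)(?=")')

def split_around_foo(text):
    """Return (before, 'Foo', after) for a quoted Foo, or None."""
    match = PATTERN.search(text)
    return match.groups() if match else None

print(split_around_foo('"Some Foo and more"'))  # ('Some ', 'Foo', ' and more')
print(split_around_foo('"Foo"'))                # ('', 'Foo', '')
print(split_around_foo('Foo'))                  # None
```

The second group is the value the question asks for; the first and third hold whatever surrounds it inside the quotes.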

#3


0  

Try to do something with this kind of pattern:

"[^"]*?Foo[^"]*?"
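
A small Python sketch of what this pattern returns: the match is the whole quoted string that contains Foo, quotes included, so it is better suited to finding such strings than to extracting Foo by itself. The name `find_quoted_with_foo` is an assumption made for this example.

```python
import re

# Matches an entire quoted string that contains Foo somewhere inside it.
PATTERN = re.compile(r'"[^"]*?Foo[^"]*?"')

def find_quoted_with_foo(text):
    """Return the first quoted string containing Foo, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

print(find_quoted_with_foo('"Some Foo and more"'))  # "Some Foo and more"
print(find_quoted_with_foo('say "Some Foo" now'))   # "Some Foo"
print(find_quoted_with_foo('Foo'))                  # None
```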

#4


0  

In Notepad++

search : ("[^"]*)Foo([^"]*")
replace : $1Bar$2
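
The same search and replace can be sketched outside Notepad++, for example in Python, where the `$1` and `$2` references are written `\g<1>` and `\g<2>`. The function name `replace_quoted_foo` and the default replacement `Bar` are illustrative only.

```python
import re

# Group 1: opening quote plus everything before Foo.
# Group 2: everything after Foo plus the closing quote.
PATTERN = re.compile(r'("[^"]*)Foo([^"]*")')

def replace_quoted_foo(text, replacement="Bar"):
    """Replace Foo with replacement, but only inside a quoted string."""
    return PATTERN.sub(lambda m: m.group(1) + replacement + m.group(2), text)

print(replace_quoted_foo('"Some Foo and more"'))  # "Some Bar and more"
print(replace_quoted_foo('"Foo"'))                # "Bar"
print(replace_quoted_foo('Foo'))                  # Foo
```

Because the first `[^"]*` is greedy, a quoted string containing Foo more than once has only its last occurrence replaced in a single pass.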
