Why doesn't this regular expression work?

Time: 2021-01-12 21:45:22

I have a regular expression meant to extract two kinds of tokens: the apostrophe delimiters ['] and the words between apostrophes, as in 'Stack Overflow'. The question is, why doesn't this regular expression work?

Regex:

(['])|'([^']*)'

Here is a link that explains it: Regular Expression

It only extracts the apostrophes, not the words between them.

NOTE: I need to extract the apostrophes and any words between them separately, for input like 'Stack Overflow'.

The result would look like:

  1. '
  2. Stack Overflow
  3. '

Greetings.

2 Solutions

#1


5  

Your regex says to match either a single quote or the content between quotes, but the alternation is exclusive as written: each match uses only one of the two alternatives. To get each piece as its own capture group, you could use the regex:

(')([^']*)(')

This captures the first quote, then everything that is not a quote, then the last quote.
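As a minimal sketch of the three-group pattern, here it is applied with Python's `re` module. The choice of Python and the variable names are assumptions for illustration; the original question does not say which regex engine is in use.

```python
import re

# Three capture groups: opening quote, quoted content, closing quote.
pattern = re.compile(r"(')([^']*)(')")

match = pattern.search("'Stack Overflow'")
if match:
    opening, content, closing = match.groups()
    print(opening)   # '
    print(content)   # Stack Overflow
    print(closing)   # '
```

All three pieces come from a single match, so there is no need to combine the results of separate alternatives.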

#2


4  

TL;DR: Because alternation short-circuits.

In an alternation, once the first alternative matches, the second one does not need to be evaluated, because True | anything is always True, right?

Consider your regex:

regex = (['])|'([^']*)'
text = 'Stack Overflow'

Run the regex against the text:

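([']) matches the opening ' and then, as a separate second match, the closing '. Each match captures its quote in the first group, and the group belonging to the second alternative stays empty.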

Done! The second alternative is skipped because you joined the two with |.

Another proof:

regex = (['])|'([^']*)'
text = 'Stack Overflow'

you get

match 1: $1 = `'`
match 2: $1 = `'`

but

regex = '([^']*)'|(['])
text = 'Stack Overflow'

you get

$1 = `Stack Overflow`

You will see that only the first alternative ever takes effect!
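The comparison above can be reproduced with a short Python sketch. Python's `re` module and the helper name `show_matches` are assumptions made for illustration; Python reports a group that did not take part in a match as `None`.

```python
import re

text = "'Stack Overflow'"

def show_matches(regex, text):
    """Return (matched_text, groups) for every non-overlapping match."""
    return [(m.group(0), m.groups()) for m in re.finditer(regex, text)]

# Quote alternative first: each apostrophe is matched on its own.
quote_first = show_matches(r"(['])|'([^']*)'", text)

# Quoted-content alternative first: the whole quoted string is one match.
content_first = show_matches(r"'([^']*)'|(['])", text)

print(quote_first)    # [("'", ("'", None)), ("'", ("'", None))]
print(content_first)  # [("'Stack Overflow'", ('Stack Overflow', None))]
```

In both runs the group that belongs to the second alternative is never filled, which is why reordering the alternatives changes what is captured.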

Thus, I suggest you use this regex instead:

(')(.*?)(')

where you get the captured texts in $1, $2, and $3 respectively.

Note that *? is a non-greedy quantifier. The simple explanation is that it matches as little as possible, so it stops at the next ' instead of consuming it.
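To illustrate the difference, here is a small Python sketch comparing the non-greedy pattern with a greedy variant. The input containing two quoted words and the greedy pattern `(')(.*)(')` are assumptions added for contrast; they do not appear in the original question.

```python
import re

# Hypothetical input with two quoted words, chosen to expose the difference.
text = "'Stack' and 'Overflow'"

# Non-greedy: .*? stops at the nearest closing quote.
lazy = re.findall(r"(')(.*?)(')", text)

# Greedy: .* runs to the last quote in the text.
greedy = re.findall(r"(')(.*)(')", text)

print(lazy)    # [("'", 'Stack', "'"), ("'", 'Overflow', "'")]
print(greedy)  # [("'", "Stack' and 'Overflow", "'")]
```

With a single quoted string both patterns give the same result; the non-greedy form only matters once the text contains more than one pair of quotes.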
