I have a regular expression meant to extract two kinds of tokens: the delimiters (') and the words between apostrophes, as in 'Stack Overflow'. Why doesn't this regular expression work?
Regex:
(['])|'([^']*)'
Here is a link to explain it: Regular Expression
It only extracts the apostrophes, not the words between them.
NOTE: I need to extract each apostrophe and the words between the apostrophes as separate tokens, for input like 'Stack Overflow'.
The expected result would be:
- '
- Stack Overflow
- '
Greetings.
2 Solutions
#1
5
Your regex says to match either a single quote or the content between quotes, but the way you have written it, that is an exclusive or: each match takes one alternative, never both. To get each of them as a capture group, you could use the regex:
(')([^']*)(')
This captures the opening quote, then everything that is not a quote, then the closing quote.
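As a minimal sketch of the suggestion above, here is that pattern run with Python's `re` module. The choice of Python and the variable names `pattern`, `match`, and `tokens` are assumptions for illustration; the original question does not say which regex engine is in use.

```python
import re

# Three capture groups: opening quote, content without quotes, closing quote.
pattern = re.compile(r"(')([^']*)(')")

match = pattern.search("'Stack Overflow'")

# groups() returns the three captures in order; fall back to an empty list
# when the text contains no quoted span at all.
tokens = list(match.groups()) if match else []

print(tokens)  # ["'", 'Stack Overflow', "'"]
```

Because all three groups belong to the same match, the quotes and the content come back together instead of competing as alternatives.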
#2
4
TL;DR: Because alternation short-circuits.
In an or condition, once the first alternative matches, the second one does not need to be evaluated, because `True | anything` is always `True`, right? A regex engine behaves the same way: at each position it tries the alternatives from left to right and takes the first one that succeeds.
Consider your regex
regex = (['])|'([^']*)'
text = 'Stack Overflow'
Run the regex against the text:
`(['])` matches the opening `'` and then the closing `'` as two separate matches, capturing each one into `$1`; `$2` stays empty because the second alternative never gets a chance to match.
Done! The second alternative is skipped because the two are joined with or (`|`).
Another proof:
regex = (['])|'([^']*)'
text = 'Stack Overflow'
gives
match 1: $1 = `'`
match 2: $1 = `'`
but
regex = '([^']*)'|(['])
text = 'Stack Overflow'
gives
$1 = `Stack Overflow`
You will see that only the first alternative ever takes effect!
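The two orderings can be compared directly. The following is a small sketch using Python's `re.finditer`; the helper name `captures` and the variable names are hypothetical, and Python is assumed only for illustration. In Python an unset group is reported as `None`.

```python
import re

text = "'Stack Overflow'"

def captures(pattern, text):
    """Return the (group 1, group 2) pair for every successive match."""
    return [m.groups() for m in re.finditer(pattern, text)]

# Quote alternative first: each apostrophe is its own match, and the
# second alternative (group 2) is never reached.
quote_first = captures(r"(['])|'([^']*)'", text)

# Content alternative first: the whole quoted span is consumed in one
# match, so the lone-quote alternative (group 2) is never reached.
content_first = captures(r"'([^']*)'|(['])", text)

print(quote_first)    # [("'", None), ("'", None)]
print(content_first)  # [('Stack Overflow', None)]
```

In both cases only one side of the `|` ever fills its group, which is why neither ordering yields the quotes and the content together.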
Thus, I suggest using this regex instead:
(')(.*?)(')
where you get your captured texts in `$1`, `$2`, and `$3` respectively.
Note that `*?` is a non-greedy quantifier. Put simply, it matches as few characters as possible, so it stops at the next `'` instead of consuming it.
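To illustrate the difference, here is a short sketch comparing the non-greedy `.*?` with the greedy `.*` on a text containing two quoted words. The sample text and the variable names `lazy` and `greedy` are made up for this example, and Python is again only an assumption.

```python
import re

# Hypothetical input with two separate quoted words.
text = "'Stack' and 'Overflow'"

# Non-greedy: each match stops at the nearest closing quote.
lazy = re.findall(r"(')(.*?)(')", text)

# Greedy: the middle group runs to the last quote in the text,
# swallowing the inner quotes along the way.
greedy = re.findall(r"(')(.*)(')", text)

print(lazy)    # [("'", 'Stack', "'"), ("'", 'Overflow', "'")]
print(greedy)  # [("'", "Stack' and 'Overflow", "'")]
```

With a single quoted span both forms give the same result; the non-greedy quantifier only matters once the text contains more than one pair of quotes.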