I have a string like this:
"Temporada 2015"
and I also get strings like this:
"Temporada 8"
I need to match these strings and extract only the numbers, 2015 and 8. How do I do that with a regex? I tried this:
doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*(\d+)/)[2]
But for the first string it returned only 5 instead of 2015. How do I match both strings and return only the numbers?
5 answers
#1
1
You should add a ? after the .* to make that part of the regex non-greedy:
doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2];
Here is a sample program for verification.
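A minimal sketch of such a check, assuming plain string literals stand in for the result of doc.text_at(...) (the page itself is not available here). It runs the original greedy pattern and the non-greedy one over both sample titles:

```ruby
# Compare the greedy and non-greedy patterns on both sample strings.
# The string literals are stand-ins for the text returned by doc.text_at(...).
greedy = /(Tempo).*(\d+)/
lazy   = /(Tempo).*?(\d+)/

results = ["Temporada 2015", "Temporada 8"].map do |title|
  [title, title.match(greedy)[2], title.match(lazy)[2]]
end

results.each do |title, greedy_digits, lazy_digits|
  puts "#{title}: greedy=#{greedy_digits.inspect} lazy=#{lazy_digits.inspect}"
end
```

For "Temporada 2015" the greedy pattern yields "5" and the non-greedy one yields "2015"; for "Temporada 8" both yield "8".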
#2
2
The .* is "greedy": it matches as many characters as it can, so it leaves just one digit for the \d+.
If your strings are known to contain no other numbers, you can just do
.scan(/\d+/).first
Otherwise, you can match only non-digits between Tempo and the number:
.match(/(Tempo)[^\d]*(\d+)/)[2]
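As a rough illustration of the difference between the two suggestions, the sketch below applies both to the sample title and to a made-up string that contains an earlier number. The string in `mixed` is hypothetical, and `\D` is used as shorthand for `[^\d]`:

```ruby
title = "Temporada 2015"

first_number = title.scan(/\d+/).first               # first run of digits anywhere
after_tempo  = title.match(/(Tempo)[^\d]*(\d+)/)[2]  # digits following "Tempo"
shorthand    = title.match(/Tempo\D*(\d+)/)[1]       # \D means the same as [^\d]

# Hypothetical input with another number before the one we want.
mixed       = "2 de 3: Temporada 2015"
mixed_scan  = mixed.scan(/\d+/).first                # picks up the earlier "2"
mixed_match = mixed.match(/Tempo\D*(\d+)/)[1]        # still finds "2015"

puts first_number, after_tempo, shorthand, mixed_scan, mixed_match
```

So scan(...).first is only safe when no other number can precede the one you want.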
#3
1
Because .* is greedy, it matches as many characters as possible and leaves only the last digit for \d+, which is why you get 5. Changing the greedy .* to the non-greedy .*? makes it match as few characters as possible, so \d+ captures the whole number:
doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2]
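One way to see this is to put the .* itself in a capture group and look at what it consumed. This is only a sketch; the extra group is added for illustration and is not part of the original pattern:

```ruby
# Capture the middle part as well, to show how much .* and .*? consume.
greedy_parts = "Temporada 2015".match(/(Tempo)(.*)(\d+)/).captures
lazy_parts   = "Temporada 2015".match(/(Tempo)(.*?)(\d+)/).captures

p greedy_parts  # ["Tempo", "rada 201", "5"]
p lazy_parts    # ["Tempo", "rada ", "2015"]
```

The greedy version swallows "201" along with the letters and gives back only the single digit that \d+ needs in order to match.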
#4
1
You can scan directly for digits:
"Temporada 2015".scan(/\d+/)
# => ["2015"]
"Temporada 8".scan(/\d+/)
# => ["8"]
If you want to include Temp in the regex:
"Temporada 2015".scan(/Temp.*?(\d+)/)
# => [["2015"]]
Non-regex way:
"Temporada 2015".split.detect{|e| e.to_i.to_s == e }
# => "2015"
"Temporada 8".split.detect{|e| e.to_i.to_s == e }
# => "8"
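One caveat with the to_i.to_s round trip, shown as a sketch: a word with leading zeros such as "08" does not survive the round trip, so it is skipped. If that matters, a digits-only check is one possible alternative. The helper name `first_numeric_word` and the "Temporada 08" input are made up for this example:

```ruby
# Hypothetical helper: returns the first whitespace-separated word made up
# entirely of the characters 0-9, without using a regex.
def first_numeric_word(text)
  text.split.detect { |word| word.delete("0-9").empty? }
end

round_trip  = "Temporada 08".split.detect { |e| e.to_i.to_s == e }
digits_only = first_numeric_word("Temporada 08")

p round_trip   # nil, because "08".to_i.to_s is "8"
p digits_only  # "08"
```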
#5
0
I'd write it thus:
r = /
  \b     # match a word break (possibly beginning of string)
  Tempo  # match these characters
  \D+    # match one or more characters other than digits
  \K     # forget everything matched so far
  \d+    # match one or more digits
/x
"Temporada 2015"[r] #=> "2015"
"Temporada 8"[r]    #=> "8"
"Temporary followed by something else 21 then more"[r]
  #=> "21"
If 'Tempo' must be at the beginning of the string, write r = /Tempo.... or r = /\s*Tempo... if it can be preceded by whitespace. I've written \D+ rather than \D* on the assumption that there should be at least one space.
I don't understand why 'Tempo' is in a capture group. Have I missed something?
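For comparison, here is a small sketch of the \K form next to an equivalent that uses a single capture group with String#[]. Neither needs 'Tempo' to be captured. The "Season 3" input is a made-up non-matching string:

```ruby
r = /\bTempo\D+\K\d+/

with_k     = "Temporada 2015"[r]                     # \K drops "Temporada " from the match
with_group = "Temporada 2015"[/\bTempo\D+(\d+)/, 1]  # second argument selects group 1
no_match   = "Season 3"[r]                           # nil when the pattern does not match

p with_k, with_group, no_match
```

Unlike .match(...)[2], which raises NoMethodError when match returns nil, String#[] simply returns nil on a miss.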