I am quite new to Python and regex, and I have the following simple string:
s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
I would like to extract only the trailing digits from the string above, i.e. 767980716, and I was wondering how I could achieve this with a Python regex.
I wanted to do something along the lines of:
re.compile(r"""-(.*?)""").search(str(s)).group(1)
My intent was for (.*?) to capture everything that starts after a "-" and runs to the end of the string, but this returns an empty string.
I was wondering if anyone could point me in the right direction. Thanks.
6 Answers
#1
28
You can use re.match to find only the digits:
>>> import re
>>> s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
>>> re.match('.*?([0-9]+)$', s).group(1)
'767980716'
Alternatively, re.finditer works just as well:
>>> next(re.finditer(r'\d+$', s)).group(0)
'767980716'
Explanation of all regexp components:
- .*? is a non-greedy match and consumes only as little as possible (a greedy match would consume everything except for the last digit).
- [0-9] and \d are two different ways of matching digits. Note that the latter also matches digits in other writing systems, like ୪ or ൨.
- Parentheses (()) make the content of the expression a group, which can be retrieved with group(1) (or 2 for the second group, 0 for the whole match).
- + means one or more repetitions (at least one digit at the end).
- $ matches only at the end of the input.
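To make the greedy versus non-greedy difference concrete, and to show why the pattern from the question comes back empty, here is a small sketch. The variable names are only illustrative and are not part of the original answer.

```python
import re

s = r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""

# The question's pattern: nothing after the lazy group forces it to grow,
# so it matches zero characters right after the first "-".
question_attempt = re.search(r"-(.*?)", s).group(1)

# Anchoring with $ forces the lazy prefix to expand until only the
# trailing digits are left for the group.
lazy_prefix = re.match(r".*?([0-9]+)$", s).group(1)

# A greedy prefix grabs as much as it can and leaves the group
# only the single digit it strictly needs.
greedy_prefix = re.match(r".*([0-9]+)$", s).group(1)

print(repr(question_attempt))  # ''
print(lazy_prefix)             # 767980716
print(greedy_prefix)           # 6
```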
#2
7
Nice and simple with findall:
import re
s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
print(re.findall(r'^.*-([0-9]+)$', s))
# prints ['767980716']
Regex Explanation:
^ # Match the start of the string
.* # Followed by anything
- # Up to the last hyphen
([0-9]+) # Capture the digits after the hyphen
$ # Up to the end of the string
Or, more simply, just match the digits at the end of the string: '([0-9]+)$'
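The two patterns are not quite interchangeable: the longer one insists on a hyphen before the digits, the shorter one does not. A rough Python 3 sketch of the difference follows; the second input string is made up for illustration.

```python
import re

with_hyphen = r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
digits_only = "767980716"  # hypothetical input with no hyphen at all

# Requires "-<digits>" at the end of the string.
strict = r"^.*-([0-9]+)$"
# Only requires digits at the end of the string.
loose = r"([0-9]+)$"

strict_hits = [re.findall(strict, text) for text in (with_hyphen, digits_only)]
loose_hits = [re.findall(loose, text) for text in (with_hyphen, digits_only)]

print(strict_hits)  # [['767980716'], []]
print(loose_hits)   # [['767980716'], ['767980716']]
```

Since findall returns a list, an empty list is the signal that nothing matched.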
#3
5
Your regex should be (\d+)$.
- \d+ matches one or more digits.
- $ matches at the end of the string.
So, your code should be:
>>> s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"
>>> import re
>>> re.compile(r'(\d+)$').search(s).group(1)
'767980716'
And you don't need to call str here, as s is already a string.
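One caveat when chaining .search(...).group(1): search returns None if the string does not end in digits, and calling .group on None raises AttributeError. A minimal sketch of a guard, where the helper name trailing_digits is hypothetical:

```python
import re

# Compiled once and reused; same pattern as above.
TRAILING_DIGITS = re.compile(r"(\d+)$")

def trailing_digits(text):
    """Return the digits at the end of text, or None if there are none."""
    match = TRAILING_DIGITS.search(text)
    return match.group(1) if match else None

print(trailing_digits("99-my-name-is-John-Smith-6376827-%^-1-2-767980716"))  # 767980716
print(trailing_digits("no-digits-at-the-end-"))                              # None
```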
#4
3
Use the regex below:
\d+$
$ denotes the end of the string.
\d is a digit.
+ matches the preceding element one or more times.
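Two details of this pattern are easy to miss in Python 3: $ also matches just before a trailing newline, and \d on str patterns accepts decimal digits from other scripts unless re.ASCII is set. The sketch below uses made-up inputs to show both.

```python
import re

line = "id-42\n"  # hypothetical input with a trailing newline

# $ matches at the very end, or just before a final newline.
dollar_match = re.search(r"\d+$", line)
# \Z matches only at the very end of the string.
strict_end_match = re.search(r"\d+\Z", line)

oriya = "abc-\u0b6a"  # ends with ORIYA DIGIT FOUR (୪)

# On str patterns \d is Unicode-aware by default.
unicode_match = re.search(r"\d+$", oriya)
# re.ASCII restricts \d to [0-9].
ascii_match = re.search(r"\d+$", oriya, flags=re.ASCII)

print(dollar_match.group(0))  # 42
print(strict_end_match)       # None
print(unicode_match is not None, ascii_match)  # True None
```

If only ASCII digits at the absolute end should count, [0-9]+\Z is the stricter spelling.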
#5
2
Try using \d+$ instead. That matches one or more numeric characters followed by the end of the string.
#6
2
Save the regular expressions for something that requires more heavy lifting.
>>> def parse_last_digits(line): return line.split('-')[-1]
>>> s = parse_last_digits(r"99-my-name-is-John-Smith-6376827-%^-1-2-767980716")
>>> s
'767980716'
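Note that the split version returns whatever follows the last hyphen, digits or not. If that matters, a slightly more defensive variant might look like the following; the isdigit check is an addition for illustration and not part of the original answer.

```python
def parse_last_digits(line):
    # rsplit with maxsplit=1 cuts only at the final hyphen.
    tail = line.rsplit("-", 1)[-1]
    # str.isdigit is False for an empty string and for non-digit text.
    return tail if tail.isdigit() else None

print(parse_last_digits(r"99-my-name-is-John-Smith-6376827-%^-1-2-767980716"))  # 767980716
print(parse_last_digits("99-my-name-is-John"))                                  # None
```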