Python regex: get the trailing digits from a string

Time: 2020-12-15 19:29:26

I am quite new to Python and regex, and I have the following simple string:

s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""

I would like to extract only the last run of digits in the above string, i.e. 767980716, and I was wondering how I could achieve this using Python regex.

I wanted to do something along the lines of:

re.compile(r"""-(.*?)""").search(str(s)).group(1)

indicating that I want to capture, with (.*?), the text that starts after a "-" and ends at the end of the string, but this returns nothing.

I was wondering if anyone could point me in the right direction. Thanks.

6 answers

#1


28  

You can use re.match to capture only the trailing digits:

>>> import re
>>> s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
>>> re.match('.*?([0-9]+)$', s).group(1)
'767980716'

Alternatively, re.finditer works just as well:

>>> next(re.finditer(r'\d+$', s)).group(0)
'767980716'

Explanation of all regexp components:

  • .*? is a non-greedy (lazy) match and consumes as little as possible (a greedy .* would consume everything except for the last digit).
  • [0-9] and \d are two different ways of matching digits. Note that the latter also matches digits in other writing systems, like ୪ or ൨.
  • Parentheses (()) make the content of the expression a group, which can be retrieved with group(1) (or 2 for the second group, 0 for the whole match).
  • + means one or more repetitions of the preceding element (here, at least one digit at the end).
  • $ matches only at the end of the input.
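
The greedy versus lazy point and the \d versus [0-9] point can be checked directly. The following is a minimal sketch, not part of the original answer; the sample string 'id-\u0b6a' is made up for illustration, and the \d behaviour shown assumes Python 3 with str patterns.

```python
import re

s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"

# Lazy prefix: '.*?' stops as early as it can, so the group gets the whole trailing run.
lazy = re.match(r'.*?([0-9]+)$', s).group(1)

# Greedy prefix: '.*' takes as much as it can, leaving a single digit for the group.
greedy = re.match(r'.*([0-9]+)$', s).group(1)

print(lazy)    # 767980716
print(greedy)  # 6

# Hypothetical sample ending in ORIYA DIGIT FOUR (U+0B6A), a non-ASCII decimal digit.
oriya = 'id-\u0b6a'

unicode_digits = re.search(r'\d+$', oriya)            # \d accepts Unicode decimal digits
ascii_class = re.search(r'[0-9]+$', oriya)            # [0-9] accepts ASCII digits only
ascii_flag = re.search(r'\d+$', oriya, re.ASCII)      # re.ASCII narrows \d to [0-9]

print(unicode_digits is not None)  # True
print(ascii_class is None)         # True
print(ascii_flag is None)          # True
```

If the input may contain non-ASCII digits and only 0-9 should count, [0-9] or the re.ASCII flag is probably the safer choice.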

#2


7  

Nice and simple with findall:

import re

s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""

print(re.findall(r'^.*-([0-9]+)$', s))

# Output: ['767980716']

Regex Explanation:

^         # Match the start of the string
.*        # Followed by anything
-         # Up to the last hyphen
([0-9]+)  # Capture the digits after the hyphen
$         # Up to the end of the string

Or, more simply, just match the digits at the end of the string: '([0-9]+)$'
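
To compare the two patterns, here is a small sketch (an illustration, not the answerer's code). The inputs "767980716" and "no digits here" are made-up samples. It relies on re.findall returning a list of the group contents when the pattern has exactly one group, and an empty list when nothing matches.

```python
import re

s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"

# Pattern that requires a hyphen before the trailing digits.
with_hyphen = re.findall(r'^.*-([0-9]+)$', s)

# Simpler pattern: only the trailing digits matter.
digits_only = re.findall(r'([0-9]+)$', s)

# The hyphen pattern finds nothing when the string has no hyphen at all.
no_hyphen = re.findall(r'^.*-([0-9]+)$', "767980716")
simple_no_hyphen = re.findall(r'([0-9]+)$', "767980716")

# Neither pattern raises when there is no match; findall returns [].
no_match = re.findall(r'([0-9]+)$', "no digits here")

print(with_hyphen)       # ['767980716']
print(digits_only)       # ['767980716']
print(no_hyphen)         # []
print(simple_no_hyphen)  # ['767980716']
print(no_match)          # []
```

The empty-list result means the caller can test the return value directly instead of guarding against None.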

#3


5  

Your regex should be (\d+)$.

  • \d+ matches one or more digits.
  • $ matches at the end of the string.

So your code should be:

>>> s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"
>>> import re
>>> re.compile(r'(\d+)$').search(s).group(1)
'767980716'

You also don't need to call str here, as s is already a string.
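
It may help to see why the attempt in the question came back empty. The sketch below is illustrative only: the helper name trailing_digits and the constant TRAILING_DIGITS are hypothetical, not from the original answer. A lazy group with nothing after it is satisfied by zero characters, and anchoring it with $ still starts at the first hyphen.

```python
import re

s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"

# The question's pattern: the lazy group matches the empty string right after the first '-'.
original = re.compile(r"-(.*?)").search(s).group(1)
print(repr(original))  # ''

# Adding '$' forces the group to grow, but the match still begins at the first '-'.
anchored = re.search(r"-(.*?)$", s).group(1)
print(anchored)  # my-name-is-John-Smith-6376827-%^-1-2-767980716

# Hypothetical helper: compile once, and return None instead of raising when nothing matches.
TRAILING_DIGITS = re.compile(r'(\d+)$')


def trailing_digits(text):
    """Return the run of digits at the end of text, or None if there is none."""
    match = TRAILING_DIGITS.search(text)
    return match.group(1) if match else None


print(trailing_digits(s))             # 767980716
print(trailing_digits("John-Smith"))  # None
```

Checking the match object before calling group avoids the AttributeError that search(...).group(1) raises on a string without trailing digits.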

#4


3  

Use the regex below:

\d+$

$ marks the end of the string.

\d is a digit.

+ matches the preceding element one or more times.

#5


2  

Try using \d+$ instead. That matches one or more digits followed by the end of the string.
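
One detail worth knowing about "end of the string": in Python's re module, $ also matches just before a single trailing newline, while \Z matches only at the very end. A minimal sketch, using a made-up input line that still carries its newline (as lines read from a file often do):

```python
import re

# Hypothetical input with a trailing newline.
line = "order-12345\n"

with_dollar = re.search(r'\d+$', line)   # '$' tolerates the final newline
with_z = re.search(r'\d+\Z', line)       # '\Z' insists on the true end of the string

print(with_dollar.group(0))  # 12345
print(with_z)                # None
```

Which behaviour is wanted depends on the data; for the string in the question, which has no trailing newline, both forms give the same result.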

#6


2  

Save regular expressions for something that requires heavier lifting.

>>> def parse_last_digits(line): return line.split('-')[-1]
>>> s = parse_last_digits(r"99-my-name-is-John-Smith-6376827-%^-1-2-767980716")
>>> s
'767980716'
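
Note that splitting returns whatever follows the last hyphen, digits or not. If that matters, a checked variant could look like the sketch below. The name parse_last_digits_checked is hypothetical, and str.isascii assumes Python 3.7 or later.

```python
def parse_last_digits_checked(line):
    """Return the text after the last hyphen if it is all ASCII digits, else None."""
    # rsplit with maxsplit=1 splits only once, from the right.
    tail = line.rsplit('-', 1)[-1]
    # isdigit() alone also accepts non-ASCII digits, so isascii() narrows it to 0-9.
    return tail if tail.isascii() and tail.isdigit() else None


s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"

print(parse_last_digits_checked(s))             # 767980716
print(parse_last_digits_checked("John-Smith"))  # None
print(parse_last_digits_checked("trailing-"))   # None
```

Unlike \d+$, this version rejects a last segment such as "abc123" entirely, so the two approaches are not interchangeable for every input.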
