I have the following Python code (I want the first match of a specific number in a text field):
import re
import numpy as np
import pandas
data = {'A': [1, 2, 3], 'B': ['bla 4044 bla', 'bla 5022 bla', 'bla 6045 bla']}
df = pandas.DataFrame(data)
def fun_subjectnr(column):
    column = str(column)
    subjectnr = re.search(r"(\b[4][0-1][0-9][0-9]\b)", column)
    subjectnr1 = re.search(r"(\b[2-3|6-8][0-9][0-9][0-5]\b)", column)
    subjectnr = np.where(subjectnr == "" and subjectnr1 != "", subjectnr1,
                         subjectnr)
    return subjectnr1
df['C'] = df['B'].apply(fun_subjectnr)
Wanted output:
A  B             C
1  bla 4044 bla  4044
2  bla 5022 bla  None
3  bla 6045 bla  6045
It doesn't seem to work. When I add [0] to the re.search call, it raises an error: subjectnr = re.search(r"(\b[4][0-1][0-9][0-9]\b)",column)[0]
Who knows what to do? Thanks in advance!
1 Answer
#1
You can do this with str.extract. You can also condense your pattern a bit, as shown below.
p = r'\b(4[0-1]\d{2}|(?:[2-3]|[6-8])\d{2}[0-5])\b'
df['C'] = df.B.str.extract(p, expand=False)
df
   A             B     C
0  1  bla 4044 bla  4044
1  2  bla 5022 bla   NaN
2  3  bla 6045 bla  6045
This should be much faster than calling apply.
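For comparison, the row-wise approach from the question can also be made to work. re.search returns None when nothing matches, which is why indexing the result with [0] raises a TypeError on rows such as 'bla 5022 bla'. A minimal sketch, assuming the combined pattern above is what is wanted (the helper name first_subject_nr is hypothetical, not from the original post):

```python
import re
import pandas as pd

# Same combined pattern as above, compiled once so every row reuses it.
PATTERN = re.compile(r'\b(4[0-1]\d{2}|(?:[2-3]|[6-8])\d{2}[0-5])\b')

def first_subject_nr(text):
    """Return the first matching number as a string, or None if nothing matches."""
    match = PATTERN.search(str(text))
    # re.search returns None on no match, so check before reading the group.
    return match.group(1) if match else None

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': ['bla 4044 bla', 'bla 5022 bla', 'bla 6045 bla']})
df['C'] = df['B'].apply(first_subject_nr)
```

This keeps the apply-based structure, but str.extract remains the shorter option.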
Details
\b              # word boundary
(               # first capture group
    4           # match digit 4
    [0-1]       # match 0 or 1
    \d{2}       # match any two digits
  |
    (?:         # non-capture group (prevent ambiguity during matching)
        [2-3]   # 2 or 3
      |         # regex OR metacharacter
        [6-8]   # 6, 7, or 8
    )
    \d{2}       # any two digits
    [0-5]       # any digit between 0 and 5
)
\b
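The commented breakdown above can be used as a pattern directly by compiling it with re.VERBOSE, which makes the regex engine ignore whitespace and # comments inside the pattern. A small sketch using only the standard re module (the names verbose_p, samples and results are illustrative):

```python
import re

# The annotated pattern, compiled in verbose mode so the comments are legal.
verbose_p = re.compile(r"""
    \b              # word boundary
    (               # first capture group
        4           # match digit 4
        [0-1]       # match 0 or 1
        \d{2}       # match any two digits
      |
        (?:         # non-capture group
            [2-3]   # 2 or 3
          |         # regex OR metacharacter
            [6-8]   # 6, 7, or 8
        )
        \d{2}       # any two digits
        [0-5]       # any digit between 0 and 5
    )
    \b
""", re.VERBOSE)

samples = ['bla 4044 bla', 'bla 5022 bla', 'bla 6045 bla']
results = []
for s in samples:
    m = verbose_p.search(s)
    # Store the captured number, or None when the row has no match.
    results.append(m.group(1) if m else None)

print(results)  # ['4044', None, '6045']
```

str.extract also takes a flags argument, so passing flags=re.VERBOSE with the multi-line pattern string should work the same way on the DataFrame column.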