I am trying to take the results I get from a regex search, for example:
['11J']
['4C']
['5,']
[]
['04 ', '05 ', '48T']
And store those values in a new column (i.e. Apt) of an existing pandas data frame.
Sample data (Excel file)
index  id         apt address         job description
0      122092476  207 EAST 74 STREET  blah blah 11J blah
1      122096043  2092 8TH AVENUE     blah 4C blah blah
Code
import pandas as pd
import re
df = pd.read_excel('/Users/abc/Desktop/Apartment.xlsx', sheetname=0)
df['Apt'] = 'None'
top5 = df.head()
t5jobs = top5['Job Description']
d = []
for index, job in enumerate(t5jobs):
    result = re.findall(r'\d\d\D', job) or re.findall(r'\d\D', job) or re.findall(r'PH\D', job)
    #print(str(result))
    d.append(str(result))
df2 = pd.DataFrame([[d]], columns=list('Apt'))
df.append(df2)
I am getting this error:
AssertionError: 3 columns passed, passed data had 1 columns
How can I get these values inserted in the Apt column (overwrite None)?
Desired Output:
index  id         apt address         job description     apt
0      122092476  207 EAST 74 STREET  blah blah 11J blah  11J
1      122096043  2092 8TH AVENUE     blah 4C blah blah   4C
1 Solution
#1
2
Try this (for pandas 0.18.0+):
In [11]: df['Apt'] = df['job description'].str.extract(r'\b(\d{1,2}\D)\b', expand=True)
In [12]: df
Out[12]:
              id         apt address     job description  Apt
index
0      122092476  207 EAST 74 STREET  blah blah 11J blah  11J
1      122096043     2092 8TH AVENUE   blah 4C blah blah   4C
For pandas versions < 0.18.0:
df['Apt'] = df['job description'].str.extract(r'\b(\d{1,2}\D)\b')
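For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch. The two rows are rebuilt by hand from the sample data in the question, so the DataFrame construction is an assumption and not the asker's real file. It uses `expand=False` so that a single capture group comes back as a Series, and it also shows the likely cause of the `AssertionError`: `list('Apt')` splits the string into three one-character column names.

```python
import pandas as pd

# Hand-built stand-in for the Excel sheet (assumed, based on the sample rows).
df = pd.DataFrame({
    "id": [122092476, 122096043],
    "apt address": ["207 EAST 74 STREET", "2092 8TH AVENUE"],
    "job description": ["blah blah 11J blah", "blah 4C blah blah"],
})

# One or two digits followed by a single non-digit, on word boundaries.
pattern = r"\b(\d{1,2}\D)\b"

# With one capture group and expand=False, str.extract returns a Series,
# which can be assigned directly as a new column (NaN where nothing matches).
df["Apt"] = df["job description"].str.extract(pattern, expand=False)
print(df)

# Why the original code failed: list() on a string yields its characters,
# so three column names were passed for one column of data.
cols = list("Apt")
print(cols)
```

If a single column named Apt is what you want when building a DataFrame by hand, pass `columns=['Apt']` instead of `columns=list('Apt')`.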