Insert values from a for loop into a pandas column

Time: 2020-11-29 21:24:05

I am trying to take the results I get from regex, i.e.

['11J']
['4C']
['5,']
[]
['04 ', '05 ', '48T']

And store those values in a new column (i.e. Apt) of an existing pandas data frame.

Sample data (Excel file)

index  id           apt     address           job description
0     122092476     207     EAST 74 STREET    blah blah 11J blah               
1     122096043     2092    8TH AVENUE        blah 4C blah blah

Code

import pandas as pd
import re

df = pd.read_excel('/Users/abc/Desktop/Apartment.xlsx', sheetname=0)
df['Apt'] = 'None'
top5 = df.head()
t5jobs = top5['Job Description']    

d = []

for index, job in enumerate(t5jobs):
    result = re.findall(r'\d\d\D', job) or re.findall(r'\d\D', job) or re.findall(r'PH\D', job)

#print(str(result))
d.append(str(result))

df2 = pd.DataFrame([[d]], columns=list('Apt'))
df.append(df2)

I am getting this error:

AssertionError: 3 columns passed, passed data had 1 columns

How can I get these values inserted in the Apt column (overwrite None)?

Desired Output:

index  id           apt     address           job description         Apt
0     122092476     207     EAST 74 STREET    blah blah 11J blah      11J
1     122096043     2092    8TH AVENUE        blah 4C blah blah       4C

1 Solution

#1


Try this (for pandas 0.18.0+):

In [11]: df['Apt'] = df['job description'].str.extract(r'\b(\d{1,2}\D)\b', expand=True)

In [12]: df
Out[12]:
              id   apt         address     job description  Apt
index
0      122092476   207  EAST 74 STREET  blah blah 11J blah  11J
1      122096043  2092      8TH AVENUE   blah 4C blah blah   4C
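
As a self-contained check of the call above, here is a minimal sketch that rebuilds the two sample rows in memory instead of reading the Excel file (that substitution is an assumption made so the snippet runs on its own). It passes `expand=False`, which returns a Series when the pattern has a single capture group; rows with no match should come back as NaN.

```python
import pandas as pd

# Two sample rows copied from the question; the real data comes from an Excel file.
df = pd.DataFrame({
    "id": [122092476, 122096043],
    "apt": [207, 2092],
    "address": ["EAST 74 STREET", "8TH AVENUE"],
    "job description": ["blah blah 11J blah", "blah 4C blah blah"],
})

# One capture group plus expand=False gives a Series,
# which can be assigned directly as a new column.
df["Apt"] = df["job description"].str.extract(r"\b(\d{1,2}\D)\b", expand=False)

print(df[["job description", "Apt"]])
```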

For pandas versions < 0.18.0:

df['Apt'] = df['job description'].str.extract(r'\b(\d{1,2}\D)\b')
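
On the error in the question: `list('Apt')` splits the string into three characters, so pandas is told to expect three columns, while `[[d]]` supplies a single row with a single cell. The sketch below shows that, and also a loop-style alternative that keeps the question's regex fallback order. The helper name `first_apt` and the third sample string are hypothetical, added only for illustration.

```python
import re

# list() on a string yields its characters, which is where the
# "3 columns passed" part of the error message comes from.
print(list("Apt"))   # ['A', 'p', 't']
print(["Apt"])       # ['Apt']


def first_apt(job):
    """Return the first apartment-like token in job, or None (hypothetical helper)."""
    # Same fallback order as the question: two digits + non-digit,
    # then one digit + non-digit, then 'PH' + non-digit.
    result = (re.findall(r"\d\d\D", job)
              or re.findall(r"\d\D", job)
              or re.findall(r"PH\D", job))
    return result[0] if result else None


jobs = ["blah blah 11J blah", "blah 4C blah blah", "no unit here"]
apts = [first_apt(job) for job in jobs]
print(apts)
```

With a DataFrame, the same helper could be applied in one step, for example `df['Apt'] = df['job description'].apply(first_apt)`, rather than building a second frame and calling `df.append`, which returns a new frame instead of modifying `df` (and was removed in pandas 2.0).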
