Python pandas: convert a list of comma-separated values into a DataFrame

Time: 2021-09-10 00:21:28

I have a list of strings which looks like this:

["Name: Alice, Department: HR, Salary: 60000", "Name: Bob, Department: Engineering, Salary: 45000"]

I would like to convert this list into a DataFrame that looks like this:

Name  | Department  | Salary
------|-------------|-------
Alice | HR          | 60000
Bob   | Engineering | 45000

What would be the easiest way to go about this? My gut says to dump the data into a CSV and split off the titles with the regex "^.*:", but there must be a simpler way.
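For comparison, a regex can be applied directly to the strings without writing a CSV at all. This is a minimal sketch, assuming every field has the form `Key: value` and no value contains a comma. Note that it uses a pair-matching pattern, `(\w+):\s*([^,]+)`, rather than the `^.*:` pattern from the question:

```python
import re

import pandas as pd

records = ["Name: Alice, Department: HR, Salary: 60000",
           "Name: Bob, Department: Engineering, Salary: 45000"]

# Each "Key: value" pair: the key is a run of word characters, the value
# runs up to the next comma (or the end of the string).
pair_re = re.compile(r"(\w+):\s*([^,]+)")

# findall returns (key, value) tuples; dict() turns them into one row per record.
rows = [dict(pair_re.findall(record)) for record in records]
df = pd.DataFrame(rows)
print(df)
```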

3 solutions

#1 (8 votes)

With some string processing you can get a list of dicts and pass that to the DataFrame constructor:

import pandas as pd

lst = ["Name: Alice, Department: HR, Salary: 60000",
       "Name: Bob, Department: Engineering, Salary: 45000"]
pd.DataFrame([dict([kv.split(': ') for kv in record.split(', ')]) for record in lst])
Out:
    Department   Name Salary
0           HR  Alice  60000
1  Engineering    Bob  45000
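The list-of-dicts approach leaves every value as a string, Salary included. A minimal sketch, assuming you want Salary as a number, converts it with `pd.to_numeric` afterwards. It also uses `split(': ', 1)` so only the first `: ` in each field is split on. With current pandas versions the columns follow the dict key order (Name, Department, Salary) rather than the alphabetical order in the output above.

```python
import pandas as pd

lst = ["Name: Alice, Department: HR, Salary: 60000",
       "Name: Bob, Department: Engineering, Salary: 45000"]

# split(": ", 1) splits only on the first ": ", so a value that itself
# contains ": " stays intact (a value containing ", " would still break).
df = pd.DataFrame([dict(kv.split(": ", 1) for kv in record.split(", "))
                   for record in lst])

# Every parsed value is a string; convert the Salary column to integers.
df["Salary"] = pd.to_numeric(df["Salary"])
print(df.dtypes)
```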

#2 (3 votes)

You can do it this way:

In [270]: import io, re
     ...: import pandas as pd

In [271]: s
Out[271]:
['Name: Alice, Department: HR, Salary: 60000',
 'Name: Bob, Department: Engineering, Salary: 45000']

In [272]: pd.read_csv(io.StringIO(re.sub(r'\s*(Name|Department|Salary):\s*', '', '~'.join(s))),
     ...:             names=['Name', 'Department', 'Salary'],
     ...:             header=None,
     ...:             lineterminator='~'
     ...: )
Out[272]:
    Name   Department  Salary
0  Alice           HR   60000
1    Bob  Engineering   45000
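The answer above hard-codes the field names in the regex and uses `~` as a custom line terminator. A minimal sketch of the same strip-the-keys-then-`read_csv` idea, assuming every record lists the same keys in the same order and no value contains a comma, takes the column names from the first record and joins the records with ordinary newlines:

```python
import io
import re

import pandas as pd

s = ["Name: Alice, Department: HR, Salary: 60000",
     "Name: Bob, Department: Engineering, Salary: 45000"]

# Take the column names from the first record instead of hard-coding them.
names = re.findall(r"(\w+):", s[0])

# Strip every "Key: " prefix, leaving plain comma-separated values,
# and put one record per line so read_csv needs no custom lineterminator.
csv_text = "\n".join(re.sub(r"\s*\w+:\s*", "", rec) for rec in s)

df = pd.read_csv(io.StringIO(csv_text), names=names, header=None)
print(df)
```

Because `read_csv` infers types, Salary comes back as an integer column here, unlike the list-of-dicts approach.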

#3 (3 votes)

A little creative:

setup

import pandas as pd

l = ["Name: Alice, Department: HR, Salary: 60000",
     "Name: Bob, Department: Engineering, Salary: 45000"]
s = pd.Series(l)

# \s* before the key and after the colon keeps spaces out of keys and values
s.str.extractall(r'\s*(?P<key>[^,:]+):\s*(?P<value>[^,]+)') \
    .reset_index('match', drop=True) \
    .set_index('key', append=True).value.unstack()
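Here is the same `extractall` pipeline broken into steps, as a sketch; the intermediate names `pairs` and `wide` are only for illustration. The column order after `unstack` follows the index levels rather than the input, so the last step selects the columns explicitly to get the layout from the question:

```python
import pandas as pd

s = pd.Series(["Name: Alice, Department: HR, Salary: 60000",
               "Name: Bob, Department: Engineering, Salary: 45000"])

# 1. One row per "key: value" match; the index is (original row, match number).
#    \s* before the key and after the colon keeps spaces out of both groups.
pairs = s.str.extractall(r"\s*(?P<key>[^,:]+):\s*(?P<value>[^,]+)")

# 2. Drop the "match" level so the index is just the original row number.
pairs = pairs.reset_index("match", drop=True)

# 3. Move "key" into the index, then pivot that level out into columns.
wide = pairs.set_index("key", append=True)["value"].unstack()

# Restore the field order and drop the leftover "key" name on the columns.
df = wide[["Name", "Department", "Salary"]].rename_axis(columns=None)
print(df)
```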
