I have a list of strings which looks like this:
["Name: Alice, Department: HR, Salary: 60000", "Name: Bob, Department: Engineering, Salary: 45000"]
I would like to convert this list into a DataFrame that looks like this:
Name | Department | Salary
--------------------------
Alice | HR | 60000
Bob | Engineering | 45000
What would be the easiest way to go about this? My gut says to dump the data into a CSV and strip the field labels with a regex such as "^.*:", but there must be a simpler way.
3 solutions
#1
8
With some string processing you can get a list of dicts and pass that to the DataFrame constructor:
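import pandas as pd

lst = ["Name: Alice, Department: HR, Salary: 60000",
"Name: Bob, Department: Engineering, Salary: 45000"]
# Split each record into "key: value" pairs, build one dict per record, then build the frame
pd.DataFrame([dict([kv.split(': ') for kv in record.split(', ')]) for record in lst])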
Out:
Department Name Salary
0 HR Alice 60000
1 Engineering Bob 45000
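Two details of the result above are worth handling explicitly: every value comes back as a string, and the column order may differ between pandas versions (the output above shows the columns sorted alphabetically). The following is a minimal sketch, assuming each record has the form `key: value, key: value, ...` and that no value contains `', '`. The helper name `records_to_frame` is hypothetical and not part of pandas.

```python
import pandas as pd

def records_to_frame(records, columns=None):
    """Parse 'key: value, key: value' strings into a DataFrame (hypothetical helper)."""
    rows = []
    for record in records:
        # maxsplit=1 keeps any later ': ' inside a value intact
        pairs = (kv.split(': ', 1) for kv in record.split(', '))
        rows.append({key.strip(): value.strip() for key, value in pairs})
    # Passing columns fixes the column order explicitly
    return pd.DataFrame(rows, columns=columns)

lst = ["Name: Alice, Department: HR, Salary: 60000",
       "Name: Bob, Department: Engineering, Salary: 45000"]

df = records_to_frame(lst, columns=['Name', 'Department', 'Salary'])
# All parsed values are strings; convert Salary to a numeric dtype
df['Salary'] = pd.to_numeric(df['Salary'])
```

Passing `columns` also makes the desired layout from the question (Name, Department, Salary) independent of how the dict keys happen to be ordered.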
#2
3
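You can also strip the field labels with a regex and parse what remains with read_csv (this assumes io, re, and pandas as pd are already imported):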
In [271]: s
Out[271]:
['Name: Alice, Department: HR, Salary: 60000',
'Name: Bob, Department: Engineering, Salary: 45000']
In [272]: pd.read_csv(io.StringIO(re.sub(r'\s*(Name|Department|Salary):\s*', r'', '~'.join(s))),
...: names=['Name','Department','Salary'],
...: header=None,
...: lineterminator=r'~'
...: )
...:
Out[272]:
Name Department Salary
0 Alice HR 60000
1 Bob Engineering 45000
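For reference, the same idea can be written as a standalone script. This is only a sketch under a few assumptions: the labels are stripped from each record separately and the records are joined with newlines, so no custom `lineterminator` is needed, and no value contains a comma. The names `fields`, `label`, and `csv_text` are illustrative.

```python
import io
import re

import pandas as pd

s = ['Name: Alice, Department: HR, Salary: 60000',
     'Name: Bob, Department: Engineering, Salary: 45000']

fields = ['Name', 'Department', 'Salary']

# Matches a field label, its colon, and any surrounding whitespace
label = re.compile(r'\s*(?:' + '|'.join(map(re.escape, fields)) + r'):\s*')

# Strip the labels record by record, then join the records with newlines
csv_text = '\n'.join(label.sub('', record) for record in s)

# No header row in the text, so supply the column names directly
df = pd.read_csv(io.StringIO(csv_text), names=fields, header=None)
```

Because `read_csv` infers dtypes, `Salary` is parsed as an integer column here rather than as strings.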