I have a data frame with one column and I'd like to split it into two columns, with one column header as 'fips' and the other 'row'.
My dataframe df looks like this:
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
I do not know how to use df.row.str[:] to achieve my goal of splitting the row cell. I can use df['fips'] = hello to add a new column and populate it with hello. Any ideas?
fips row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
8 Answers
#1
69
There might be a better way, but here's one approach:
In [34]: import pandas as pd
In [35]: df
Out[35]:
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
In [36]: df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                           columns=['fips', 'row'])
In [37]: df
Out[37]:
fips row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
#2
181
TL;DR version:
For the simple case of:
- I have a text column with a delimiter and I want two columns
The simplest solution is:
df['A'], df['B'] = df['AB'].str.split(' ', n=1).str
Or you can automatically create a DataFrame with one column for each entry of the split with:
df['AB'].str.split(' ', n=1, expand=True)
Notice how, in either case, the .tolist() method is not necessary. Neither is zip().
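As a hedged sketch of the second form applied to the question's data: passing n as a keyword and assigning the expanded result to two labels at once should behave the same on old and new pandas releases, whereas unpacking the .str accessor directly is, as far as I know, no longer possible on recent releases. The column label name is an assumption made for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"row": ["00000 UNITED STATES", "01000 ALABAMA",
                           "01001 Autauga County, AL"]})

# n=1 limits the split to the first space; expand=True returns a DataFrame
# whose two columns are assigned, in order, to the two new labels.
# "name" is a hypothetical label chosen for this sketch.
df[["fips", "name"]] = df["row"].str.split(" ", n=1, expand=True)
```

Keeping the original column and adding two new ones avoids overwriting data before the result has been checked.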
In detail:
Andy Hayden's solution is most excellent in demonstrating the power of the str.extract() method.
But for a simple split over a known separator (like splitting by dashes, or splitting by whitespace), the .str.split() method is enough [1]. It operates on a column (Series) of strings, and returns a column (Series) of lists:
>>> import pandas as pd
>>> df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2']})
>>> df
AB
0 A1-B1
1 A2-B2
>>> df['AB_split'] = df['AB'].str.split('-')
>>> df
AB AB_split
0 A1-B1 [A1, B1]
1 A2-B2 [A2, B2]
[1] If you're unsure what the first two parameters of .str.split() do, I recommend the docs for the plain Python version of the method.
But how do you go from:
- a column containing two-element lists
to:
- two columns, each containing the respective element of the lists?
Well, we need to take a closer look at the .str attribute of a column.
It's a magical object that is used to collect methods that treat each element in a column as a string, and then apply the respective method to each element as efficiently as possible:
>>> upper_lower_df = pd.DataFrame({"U": ["A", "B", "C"]})
>>> upper_lower_df
U
0 A
1 B
2 C
>>> upper_lower_df["L"] = upper_lower_df["U"].str.lower()
>>> upper_lower_df
U L
0 A a
1 B b
2 C c
But it also has an "indexing" interface for getting each element of a string by its index:
>>> df['AB'].str[0]
0 A
1 A
Name: AB, dtype: object
>>> df['AB'].str[1]
0 1
1 2
Name: AB, dtype: object
Of course, this indexing interface of .str doesn't really care if each element it's indexing is actually a string, as long as it can be indexed, so:
>>> df['AB'].str.split('-', n=1).str[0]
0 A1
1 A2
Name: AB, dtype: object
>>> df['AB'].str.split('-', n=1).str[1]
0 B1
1 B2
Name: AB, dtype: object
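A small sketch of the same indexing on lists of unequal length, to show what happens when a row has no second element. The sample values are made up for illustration; .str.get(i) is used here as the method form of .str[i].

```python
import pandas as pd

# Made-up values: the second string contains no '-' at all.
parts = pd.Series(["A1-B1", "A2"]).str.split("-")  # lists of length 2 and 1

first = parts.str[0]       # element 0 of each list
second = parts.str.get(1)  # same as parts.str[1]; a missing position gives NaN
```

Rows without a delimiter therefore do not raise; they simply end up with a missing value in the second column.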
Then, it's a simple matter of taking advantage of Python's tuple unpacking of iterables to do:
>>> df['A'], df['B'] = df['AB'].str.split('-', n=1).str
>>> df
AB AB_split A B
0 A1-B1 [A1, B1] A1 B1
1 A2-B2 [A2, B2] A2 B2
Of course, getting a DataFrame out of splitting a column of strings is so useful that the .str.split() method can do it for you with the expand=True parameter:
>>> df['AB'].str.split('-', n=1, expand=True)
0 1
0 A1 B1
1 A2 B2
So, another way of accomplishing what we wanted is to do:
>>> df = df[['AB']]
>>> df
AB
0 A1-B1
1 A2-B2
>>> df.join(df['AB'].str.split('-', n=1, expand=True).rename(columns={0:'A', 1:'B'}))
AB A B
0 A1-B1 A1 B1
1 A2-B2 A2 B2
#3
34
You can extract the different parts out quite neatly using a regex pattern:
In [11]: df.row.str.extract(r'(?P<fips>\d{5})((?P<state>[A-Z ]*$)|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))')
Out[11]:
fips 1 state county state_code
0 00000 UNITED STATES UNITED STATES NaN NaN
1 01000 ALABAMA ALABAMA NaN NaN
2 01001 Autauga County, AL NaN Autauga County AL
3 01003 Baldwin County, AL NaN Baldwin County AL
4 01005 Barbour County, AL NaN Barbour County AL
[5 rows x 5 columns]
To explain the somewhat long regex:
(?P<fips>\d{5})
- Matches the five digits (\d) and names them "fips".
The next part:
((?P<state>[A-Z ]*$)|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))
Does either (|) one of two things:
(?P<state>[A-Z ]*$)
- Matches any number (*) of capital letters or spaces ([A-Z ]) and names this "state" before the end of the string ($),
or
(?P<county>.*?), (?P<state_code>[A-Z]{2}$))
- matches anything else (.*) then
- a comma and a space then
- matches the two-letter state_code before the end of the string ($).
In the example:
Note that the first two rows hit the "state" (leaving NaN in the county and state_code columns), whilst the last three hit the county, state_code (leaving NaN in the state column).
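A self-contained variant of this extraction, offered as a sketch rather than as the answer's exact pattern. Two things differ and are my own assumptions: the outer group is made non-capturing with (?: ... ) so that no unnamed column appears, and a literal space is placed after the five digits so the captured names carry no leading space.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"row": ["00000 UNITED STATES", "01000 ALABAMA",
                           "01001 Autauga County, AL"]})

# Raw strings keep \d intact for the regex engine. Only named groups
# become columns because the outer alternation does not capture.
pattern = (r"(?P<fips>\d{5}) "
           r"(?:(?P<state>[A-Z ]*$)|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))")

parts = df["row"].str.extract(pattern)
```

Each row fills either state or the county/state_code pair; the other side is left as NaN, as in the output above.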
#4
18
If you don't want to create a new dataframe, or if your dataframe has more columns than just the ones you want to split, you could:
df["fips"], df["row_name"] = zip(*df["row"].str.split(n=1).tolist())
del df["row"]
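One detail worth checking with the zip() approach is how many pieces each row produces. The following plain-Python sketch, using the question's sample strings, shows that zip() silently stops at the shortest list, so limiting the split to one matters when the names contain spaces. The variable names are illustrative only.

```python
rows = ["00000 UNITED STATES", "01000 ALABAMA", "01001 Autauga County, AL"]

# Without a limit the token lists have lengths 3, 2 and 4; zip() stops at
# the shortest one, so every name is cut down to its first word.
codes, names = zip(*[r.split() for r in rows])

# With maxsplit=1 every list has exactly two items and nothing is lost.
codes_ok, names_ok = zip(*[r.split(maxsplit=1) for r in rows])
```

The truncated version raises no error, which makes the problem easy to miss.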
#5
7
If you want to split a string into more than two columns based on a delimiter, you can omit the 'maximum splits' parameter.
You can use:
df['column_name'].str.split('/', expand=True)
This will automatically create as many columns as the maximum number of fields included in any of your initial strings.
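A minimal sketch of that behaviour, with made-up strings of one, two and three fields:

```python
import pandas as pd

# Made-up sample values with a varying number of '/'-separated fields.
paths = pd.Series(["a/b/c", "d/e", "f"])

# No n given: split on every '/'; shorter rows are padded with missing values.
wide = paths.str.split("/", expand=True)
```

The result has three columns because the longest string has three fields; the unused cells in the shorter rows are missing values.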
#6
5
You can use str.split by whitespace (the default separator) with the parameter expand=True, which returns a DataFrame, and assign the result to new columns:
df = pd.DataFrame({'row': ['00000 UNITED STATES', '01000 ALABAMA',
'01001 Autauga County, AL', '01003 Baldwin County, AL',
'01005 Barbour County, AL']})
print (df)
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
df[['a','b']] = df['row'].str.split(n=1, expand=True)
print (df)
row a b
0 00000 UNITED STATES 00000 UNITED STATES
1 01000 ALABAMA 01000 ALABAMA
2 01001 Autauga County, AL 01001 Autauga County, AL
3 01003 Baldwin County, AL 01003 Baldwin County, AL
4 01005 Barbour County, AL 01005 Barbour County, AL
If you also need to remove the original column, use DataFrame.pop:
df[['a','b']] = df.pop('row').str.split(n=1, expand=True)
print (df)
a b
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
Which is the same as:
df[['a','b']] = df['row'].str.split(n=1, expand=True)
df = df.drop('row', axis=1)
print (df)
a b
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
If you get this error:
#remove n=1 for split by all whitespaces
df[['a','b']] = df['row'].str.split(expand=True)
ValueError: Columns must be same length as key
You can check that it returns a 4-column DataFrame, not only 2:
print (df['row'].str.split(expand=True))
0 1 2 3
0 00000 UNITED STATES None
1 01000 ALABAMA None None
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
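The mismatch can be reproduced in a few lines. This is a sketch on two of the question's rows; the failed flag is only there to record that the assignment was rejected.

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES", "01001 Autauga County, AL"]})

pieces = df["row"].str.split(expand=True)  # splits on every run of whitespace
n_cols = pieces.shape[1]                   # 4 here, because of the county row

try:
    df[["a", "b"]] = pieces                # 4 columns cannot fill 2 labels
    failed = False
except ValueError:
    failed = True
```

Checking the shape of the split result before assigning it is a cheap way to catch this.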
Then the solution is to append the new DataFrame with join:
df = pd.DataFrame({'row': ['00000 UNITED STATES', '01000 ALABAMA',
'01001 Autauga County, AL', '01003 Baldwin County, AL',
'01005 Barbour County, AL'],
'a':range(5)})
print (df)
a row
0 0 00000 UNITED STATES
1 1 01000 ALABAMA
2 2 01001 Autauga County, AL
3 3 01003 Baldwin County, AL
4 4 01005 Barbour County, AL
df = df.join(df['row'].str.split(expand=True))
print (df)
a row 0 1 2 3
0 0 00000 UNITED STATES 00000 UNITED STATES None
1 1 01000 ALABAMA 01000 ALABAMA None None
2 2 01001 Autauga County, AL 01001 Autauga County, AL
3 3 01003 Baldwin County, AL 01003 Baldwin County, AL
4 4 01005 Barbour County, AL 01005 Barbour County, AL
With removal of the original column (if there are other columns too):
df = df.join(df.pop('row').str.split(expand=True))
print (df)
a 0 1 2 3
0 0 00000 UNITED STATES None
1 1 01000 ALABAMA None None
2 2 01001 Autauga County, AL
3 3 01003 Baldwin County, AL
4 4 01005 Barbour County, AL
#7
2
df[['fips', 'row']] = df['row'].str.split(' ', n=1, expand=True)
#8
0
I prefer exporting the corresponding pandas Series (i.e. the columns I need), using the apply function to split the column content into multiple Series, and then joining the generated columns to the existing DataFrame. Of course, the source column should be removed.
e.g.
col1 = df["<col_name>"].apply(<function>)
col2 = ...
df = df.join(col1.to_frame(name="<name1>"))
df = df.join(col2.to_frame(name="<name2>"))
df = df.drop(["<col_name>"], axis=1)
To split two-word strings, the function should be something like this:
lambda x: x.split(" ")[0] # for the first element
lambda x: x.split(" ")[-1] # for the last element
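Filled in with concrete values, the template might look like the sketch below. The column names pair, n, left and right and the sample strings are all made up for illustration; the two lambdas are the ones shown above.

```python
import pandas as pd

# Made-up two-word strings plus an unrelated column that must survive.
df = pd.DataFrame({"pair": ["A1 B1", "A2 B2"], "n": [1, 2]})

first = df["pair"].apply(lambda x: x.split(" ")[0])   # first word
last = df["pair"].apply(lambda x: x.split(" ")[-1])   # last word

df = df.join(first.to_frame(name="left"))
df = df.join(last.to_frame(name="right"))
df = df.drop(["pair"], axis=1)                        # remove the source column
```

Note that [-1] keeps only the last word, so for strings with more than two words (such as the county names in the question) the second lambda would need to return everything after the first space instead.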