I have a pandas DataFrame with multiple columns:
Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
foo 11516 non-null values
bar 228381 non-null values
Time_UTC 239897 non-null values
dtstamp 239897 non-null values
dtypes: float64(4), object(1)
where foo and bar are columns that contain the same data but are named differently. Is there a way to move the rows that make up foo into bar, ideally while keeping the name bar?
In the end the DataFrame should appear as:
Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
bar 239897 non-null values
Time_UTC 239897 non-null values
dtstamp 239897 non-null values
dtypes: float64(4), object(1)
That is, the NaN values in bar are replaced by the values from foo.
5 answers
#1
21
Try this:
pandas.concat([df['foo'].dropna(), df['bar'].dropna()]).reindex_like(df)
If you want that data to become the new column bar, just assign the result to df['bar'].
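As a rough illustration of the concat approach, here is a minimal sketch on a hypothetical four-row frame (the timestamps and values are made up). It assumes each row has a value in exactly one of the two columns; if foo and bar were both non-null at the same timestamp, the concatenated index would contain duplicates and the reindex step would fail. The sketch uses reindex(df.index), which is intended to have the same effect as reindex_like(df) here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each row has a value in exactly one of foo/bar.
idx = pd.date_range("2012-05-11 15:20:00", periods=4, freq="min")
df = pd.DataFrame(
    {"foo": [1.0, np.nan, np.nan, 4.0], "bar": [np.nan, 2.0, 3.0, np.nan]},
    index=idx,
)

# Stack the non-null parts of both columns, then restore the original row order.
merged = pd.concat([df["foo"].dropna(), df["bar"].dropna()]).reindex(df.index)

# Assign the result to bar and drop the now redundant foo column.
df["bar"] = merged
del df["foo"]
print(df)
```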
#2
22
You can use fillna directly and assign the result to the column 'bar':
df['bar'] = df['bar'].fillna(df['foo'])
del df['foo']
General example:
import pandas as pd
# create a table with two missing values in column 'a'
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=[1, 2])
df2 = pd.DataFrame({'b': [5, 6]}, index=[3, 4])
dftot = pd.concat((df1, df2))
print(dftot)
# create the DataFrame used to fill the missing values; fillna aligns on
# index and column labels, so it is given the same index as dftot
filldf = pd.DataFrame({'a': [7, 7, 7, 7]}, index=dftot.index)
# fill
print(dftot.fillna(filldf))
#3
5
Another option is to use the .apply() method on the frame. You can reassign a column while deferring to the data it already holds:
import pandas as pd
# get your data into a DataFrame called df
# replace the content of "bar" with "foo" wherever "bar" is null
# (NaN never compares equal to anything, so test with pd.isnull rather than ==)
df["bar"] = df.apply(lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"], axis=1)
# note: if your missing values are marked differently (for example with an empty string), test for that marker instead
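One detail worth checking when writing a null test inside apply: NaN does not compare equal to anything, including itself, so an equality test against NaN never matches. A small sketch (the variable name value is just for illustration):

```python
import numpy as np
import pandas as pd

value = np.nan  # stand-in for a missing entry in a row

# An equality comparison with NaN is always False, even NaN against NaN.
equal_to_nan = value == np.nan

# pd.isnull is the reliable way to detect a missing value.
detected_as_null = pd.isnull(value)

print(equal_to_nan, detected_as_null)
```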
#4
4
More recent pandas versions (since at least 0.12) have the combine_first() and update() methods for DataFrame and Series objects. For example, if your DataFrame were called df, you would do:
df['bar'] = df.bar.combine_first(df.foo)
which fills only the NaN values of the bar column with the matching values from the foo column. combine_first() returns a new Series rather than modifying bar in place, so the result has to be assigned back. To overwrite the values in bar with the non-NaN values from foo, you would use the update() method, which modifies the object it is called on in place.
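The difference between the two methods is easier to see on a small example. The following is a sketch on a hypothetical toy frame (the values are made up), where the third row has data in both columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: foo and bar both have gaps, and row 2 has both.
df = pd.DataFrame({
    "foo": [1.0, np.nan, 3.0, np.nan],
    "bar": [np.nan, 20.0, 30.0, np.nan],
})

# combine_first returns a new Series: bar's values, with its NaNs filled from foo.
filled = df["bar"].combine_first(df["foo"])

# update works in place, so it is applied to a copy here:
# every non-NaN value in foo overwrites the value in bar.
overwritten = df["bar"].copy()
overwritten.update(df["foo"])

print(filled.tolist())
print(overwritten.tolist())
```

In row 2, combine_first keeps bar's own value (30.0) while update replaces it with foo's (3.0); the last row stays NaN in both cases because neither column has data there.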
#5
2
You can do this using numpy too.
df['bar'] = np.where(pd.isnull(df['bar']), df['foo'], df['bar'])