pandas: handling missing values when going from a data frame to a pivot table

Time: 2022-10-30 20:07:24

Given the following pandas data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': np.random.randn(4),
                   })

print(df.to_string())
"""
     A  B  C           D
0  foo  w  y  0.06075020
1  foo  x  z  0.21112476
2  foo  w  a  0.01652757
3  bar  x  a  0.17718772
"""

Notice that there is no (bar, w) combination. When I do the following:

# rows=/cols= and .ix come from older pandas; current versions use index=/columns= and .loc
pv0 = pd.pivot_table(df, index=['A', 'B'], columns=['C'], aggfunc='sum')

pv0.loc[('bar', 'x')]  # returns a result

pv0.loc[('bar', 'w')]  # KeyError, though I would like it to return all NaNs

pv0.index  # returns
[(bar, x), (foo, w), (foo, x)]

As long as an (A, B) combination has at least one entry in column 'C', as in the case of (foo, x), which only has a value for 'z' in column 'C', the pivot table returns NaN for the other values of 'C' that are not present for that combination (e.g. 'a', 'y').
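A small sketch of this behaviour, assuming current pandas (index=/columns= instead of rows=/cols=) and fixed values for D instead of random ones so the result is reproducible: pivot_table only creates rows for (A, B) pairs that actually occur in the data, and fills the unobserved 'C' columns of those rows with NaN.

```python
import pandas as pd

# Same layout as the question's frame; fixed D values are an assumption for reproducibility.
df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': [1.0, 2.0, 3.0, 4.0]})

pv0 = pd.pivot_table(df, index=['A', 'B'], columns=['C'], aggfunc='sum')

# Only (A, B) pairs present in df become rows, so ('bar', 'w') is absent.
print(list(pv0.index))            # [('bar', 'x'), ('foo', 'w'), ('foo', 'x')]
print(('bar', 'w') in pv0.index)  # False

# (foo, x) exists, so its missing 'C' values ('a', 'y') show up as NaN.
print(pv0.loc[('foo', 'x')].isna().tolist())  # [True, True, False]
```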

What I would like is to have all MultiIndex combinations, including those that have no data for any value of 'C'.

pv0.index  # I would like it to return
[(bar, w), (bar, x), (foo, w), (foo, x)]

I can wrap these lookups in try/except blocks, but is there a way to have pandas fill in the missing combinations automatically?
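For comparison, the lookup workaround could be sketched roughly as below. get_row_or_nan is a hypothetical helper, not a pandas function, and it uses a MultiIndex membership test in place of try/except so it does not depend on which exception a missing key raises. Fixed D values again stand in for the random data (an assumption).

```python
import numpy as np
import pandas as pd


def get_row_or_nan(pv: pd.DataFrame, key: tuple) -> pd.Series:
    """Hypothetical helper: return the pivot-table row for key,
    or an all-NaN row with the same columns if key is absent."""
    if key in pv.index:  # membership test on the full (A, B) tuple
        return pv.loc[key]
    # Missing combination: build a NaN row shaped like a real one.
    return pd.Series(np.nan, index=pv.columns, name=key)


# Fixed D values are an assumption for reproducibility.
df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': [1.0, 2.0, 3.0, 4.0]})
pv0 = pd.pivot_table(df, index=['A', 'B'], columns=['C'], aggfunc='sum')

present = get_row_or_nan(pv0, ('bar', 'x'))  # real row
missing = get_row_or_nan(pv0, ('bar', 'w'))  # all-NaN row
```

This works per lookup, but it still leaves the table itself without the missing rows.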

1 solution

#1

You can use the reindex() method:

>>> df1 = pd.pivot_table(df, index=['A', 'B'], columns='C', aggfunc='sum')
>>> df1
              D                   
C             a        y         z
A   B                             
bar x  0.161702      NaN       NaN
foo w  0.749007  0.85552       NaN
    x       NaN      NaN  0.458701

>>> import itertools
>>> index = list(itertools.product(df['A'].unique(), df['B'].unique()))
>>> df1.reindex(index)
              D                   
C             a        y         z
foo w  0.749007  0.85552       NaN
    x       NaN      NaN  0.458701
bar w       NaN      NaN       NaN
    x  0.161702      NaN       NaN
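A variant of the same reindex idea, sketched under the assumption of current pandas and fixed D values: pd.MultiIndex.from_product builds the Cartesian product of the A and B values directly and lets you keep the level names 'A' and 'B'. The names full_index and df2 are only illustrative, and the values are sorted so the row order matches the pivot table's sorted index.

```python
import pandas as pd

# Fixed D values are an assumption for reproducibility.
df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': [1.0, 2.0, 3.0, 4.0]})

df1 = pd.pivot_table(df, index=['A', 'B'], columns='C', aggfunc='sum')

# Every (A, B) pair, with level names kept on the new index.
full_index = pd.MultiIndex.from_product(
    [sorted(df['A'].unique()), sorted(df['B'].unique())],
    names=['A', 'B'],
)

# Pairs missing from df1, such as ('bar', 'w'), come back as all-NaN rows.
df2 = df1.reindex(full_index)
print(df2)
```

If zeros suit your use case better than NaN for the empty combinations, reindex also accepts a fill_value argument, e.g. df1.reindex(full_index, fill_value=0).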
