In-place row-wise operations on a pandas DataFrame

Suppose I have this:

>>> import pandas
>>> x = pandas.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])
>>> print(x)
     A    B    C
0  1.0  2.0  3.0
1  3.0  4.0  5.0

Now I want to normalize x by row, that is, divide each row by its sum. As described in this question, this can be achieved with x = x.div(x.sum(axis=1), axis=0). However, this creates a new DataFrame. If my DataFrame is large, creating this new DataFrame can consume a lot of memory, even though I immediately assign it back to the original name.
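
For reference, here is a small sketch of the out-of-place approach described above. The names normalized and shares are only illustrative; the np.shares_memory check is one way to confirm that the result does not reuse the original buffer.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])

# axis=1 sums across the columns of each row; axis=0 in div aligns
# those sums with the row index so each row is divided by its own sum.
normalized = x.div(x.sum(axis=1), axis=0)

# The result is a separate object backed by its own memory.
shares = np.shares_memory(x.to_numpy(), normalized.to_numpy())
row_sums = normalized.sum(axis=1)
```

After this runs, x still holds the original values, which is why both DataFrames exist in memory until the old one is released.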

Is there an efficient way to perform this operation in place? I want something like x.idiv() that provides the axis option of div but updates x in place. For this specific case I need the division, but sometimes it would also be nice to have similar in-place versions for all the basic operations.

(I can update it in place by iterating over it row-wise and assigning each normalized row back into the original, but this is slow, and I'm looking for a more efficient solution.)
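
The row-by-row fallback mentioned above might look like the following sketch. It is shown only to make the comparison concrete; the loop runs in Python rather than in vectorized code, which is why it is slow on large frames.

```python
import pandas as pd

x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])

# Normalize one row at a time and write each result back into x.
for i in range(len(x)):
    row = x.iloc[i]
    x.iloc[i] = row / row.sum()
```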

1 Solution

#1

You can do this directly in numpy (without creating a copy):

In [11]: x1 = x.values.T

In [12]: x1
Out[12]: 
array([[ 1.,  3.],
       [ 2.,  4.],
       [ 3.,  5.]])

In [13]: x1 /= x1.sum(0)

In [14]: x
Out[14]: 
          A         B         C
0  0.166667  0.333333  0.500000
1  0.250000  0.333333  0.416667

Perhaps there ought to be an inplace flag for div...?
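
One caveat with the approach above: it relies on x.values returning a writable view of the DataFrame's data, which is not guaranteed. With mixed column dtypes, .values builds a fresh array, and with pandas' Copy-on-Write mode enabled the returned array is read-only. The sketch below guards against both cases; normalize_rows is a hypothetical helper name, not a pandas function, and it falls back to the ordinary div call when an in-place update is not possible.

```python
import numpy as np
import pandas as pd


def normalize_rows(df):
    """Divide each row by its sum, in place when the data allows it.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = df.values
    can_update_in_place = (
        values.dtype.kind == "f"                  # in-place true division needs floats
        and values.flags.writeable                # read-only under Copy-on-Write
        and np.shares_memory(values, df.values)   # False when .values makes a copy
    )
    if can_update_in_place:
        # keepdims=True keeps shape (n_rows, 1) so the sums broadcast per row.
        values /= values.sum(axis=1, keepdims=True)
        return df
    # Fallback: the out-of-place version, which allocates a new DataFrame.
    return df.div(df.sum(axis=1), axis=0)


x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])
x = normalize_rows(x)
```

Assigning the return value back to x keeps the calling code the same whichever branch was taken.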
