Suppose I have this:
>>> x = pandas.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])
>>> print x
   A  B  C
0  1  2  3
1  3  4  5
Now I want to normalize x by row; that is, divide each row by its sum. As described in this question, this can be achieved with x = x.div(x.sum(axis=1), axis=0). However, this creates a new DataFrame. If my DataFrame is large, creating this new DataFrame can consume a lot of memory, even though I immediately assign it to the original name.
Is there an efficient way to perform this operation in place? I want something like x.idiv() that provides the axis option of div but updates x in place. For this specific case I need division, but it would also be nice to have similar in-place versions of all the basic operations.
(I can update it in place by iterating over it row-wise and assigning each normalized row back into the original, but this is slow, and I'm looking for a more efficient solution.)
1 Answer
#1
12
You can do this directly in numpy (without creating a copy):
In [11]: x1 = x.values.T
In [12]: x1
Out[12]:
array([[ 1.,  3.],
       [ 2.,  4.],
       [ 3.,  5.]])
In [13]: x1 /= x1.sum(0)
In [14]: x
Out[14]:
          A         B         C
0  0.166667  0.333333  0.500000
1  0.250000  0.333333  0.416667
Perhaps there ought to be an inplace flag for div...?
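Note that the trick above only works if x.values hands back a view of the frame's data rather than a copy. As far as I know that depends on the pandas version and on all columns sharing one dtype, so treat it as an assumption and check it (for example with np.shares_memory) before relying on it. The NumPy part on its own can be sketched as follows; normalize_rows_inplace is a hypothetical helper name, not a pandas or NumPy function, and it works on a plain float array:

```python
import numpy as np


def normalize_rows_inplace(arr):
    """Divide each row of a 2-D float array by its row sum, writing into arr."""
    # keepdims=True gives shape (n_rows, 1), so it broadcasts across columns
    # without needing the transpose used above. Only this small column of
    # sums is allocated, not a second full-size array.
    row_sums = arr.sum(axis=1, keepdims=True)
    # out=arr makes NumPy store the quotient in the existing buffer.
    np.divide(arr, row_sums, out=arr)
    return arr


data = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
alias = data  # second reference to the same buffer
normalize_rows_inplace(data)

# Each row of data now sums to 1, and alias sees the same values
# because no new array was created.
```

If the array does share memory with the DataFrame, the frame sees the normalized values too; if it does not, only the array changes and the frame is left untouched.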