I have a DataFrame with a time index and three columns containing the coordinates of a 3D vector:
                         x         y         z
ts
2014-05-15 10:38  0.120117  0.987305  0.116211
2014-05-15 10:39  0.117188  0.984375  0.122070
2014-05-15 10:40  0.119141  0.987305  0.119141
2014-05-15 10:41  0.116211  0.984375  0.120117
2014-05-15 10:42  0.119141  0.983398  0.118164
I would like to apply a transformation to each row that also returns a vector:
def myfunc(a, b, c):
    # do something
    return e, f, g
but if I do:
df.apply(myfunc, axis=1)
I end up with a pandas Series whose elements are tuples. This is because apply takes the result of myfunc without unpacking it. How can I change myfunc so that I obtain a new DataFrame with 3 columns?
Edit:
All solutions below work. The Series solution allows for column names; the list solution seems to execute faster.
def myfunc1(args):
    e = args[0] + 2*args[1]
    f = args[1]*args[2] + 1
    g = args[2] + args[0] * args[1]
    return pd.Series([e, f, g], index=['a', 'b', 'c'])

def myfunc2(args):
    e = args[0] + 2*args[1]
    f = args[1]*args[2] + 1
    g = args[2] + args[0] * args[1]
    return [e, f, g]

%timeit df.apply(myfunc1, axis=1)
100 loops, best of 3: 4.51 ms per loop

%timeit df.apply(myfunc2, axis=1)
100 loops, best of 3: 2.75 ms per loop
4 answers
#1
6
Just return a list instead of a tuple.
In [81]: df
Out[81]:
                            x         y         z
ts
2014-05-15 10:38:00  0.120117  0.987305  0.116211
2014-05-15 10:39:00  0.117188  0.984375  0.122070
2014-05-15 10:40:00  0.119141  0.987305  0.119141
2014-05-15 10:41:00  0.116211  0.984375  0.120117
2014-05-15 10:42:00  0.119141  0.983398  0.118164

[5 rows x 3 columns]

In [82]: def myfunc(args):
   ....:     e = args[0] + 2*args[1]
   ....:     f = args[1]*args[2] + 1
   ....:     g = args[2] + args[0] * args[1]
   ....:     return [e, f, g]
   ....:

In [83]: df.apply(myfunc, axis=1)
Out[83]:
                            x         y         z
ts
2014-05-15 10:38:00  2.094727  1.114736  0.234803
2014-05-15 10:39:00  2.085938  1.120163  0.237427
2014-05-15 10:40:00  2.093751  1.117629  0.236770
2014-05-15 10:41:00  2.084961  1.118240  0.234512
2014-05-15 10:42:00  2.085937  1.116202  0.235327
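A caveat for anyone reading this on a newer install: the session above is from an old pandas. As far as I know, since pandas 0.23 a list returned from a row-wise apply is kept as one object per row unless you pass result_type='expand'. Here is a minimal sketch of that, assuming a recent pandas. The two-row sample frame and the name `expanded` are mine, and I use .iloc for positional access instead of args[0]:

```python
import pandas as pd

# Two rows copied from the question's data, just for illustration.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188],
        "y": [0.987305, 0.984375],
        "z": [0.116211, 0.122070],
    },
    index=pd.to_datetime(["2014-05-15 10:38", "2014-05-15 10:39"]),
)
df.index.name = "ts"


def myfunc(row):
    # row is a Series holding one row of df; .iloc is positional access.
    e = row.iloc[0] + 2 * row.iloc[1]
    f = row.iloc[1] * row.iloc[2] + 1
    g = row.iloc[2] + row.iloc[0] * row.iloc[1]
    return [e, f, g]


# result_type="expand" asks apply to turn each list into columns.
expanded = df.apply(myfunc, axis=1, result_type="expand")

# Set the column labels explicitly rather than relying on the defaults.
expanded.columns = df.columns
print(expanded)
```

The explicit column assignment is there because the expanded frame does not carry meaningful labels by itself.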
#2
26
Return a Series and apply will put the results in a DataFrame.
def myfunc(a, b, c):
    # do something
    return pd.Series([e, f, g])
This has the bonus that you can give a label to each of the resulting columns. If you return a DataFrame instead, it just inserts multiple rows for the group.
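To make the labelling concrete, here is a small runnable sketch of the Series approach. Note that with axis=1, apply passes a single argument (the row as a Series), so the function takes one parameter rather than a, b, c. The toy data, the output labels 'e', 'f', 'g', and the name `result` are only assumptions for illustration; the formulas are the ones from the question's edit:

```python
import pandas as pd

# Toy frame with easy numbers so the result can be checked by hand.
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})


def myfunc(row):
    # Access the row's values by column label.
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    # The Series index becomes the column names of the resulting DataFrame.
    return pd.Series([e, f, g], index=["e", "f", "g"])


result = df.apply(myfunc, axis=1)
print(result)
```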
#3
9
Based on the excellent answer by @U2EF1, I've created a handy function that applies a tuple-returning function to a DataFrame field and expands the result into new columns alongside the original DataFrame.
def apply_and_concat(dataframe, field, func, column_names):
    return pd.concat((
        dataframe,
        dataframe[field].apply(
            lambda cell: pd.Series(func(cell), index=column_names))), axis=1)
Usage:
df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['A'])
print(df)

   A
a  1
b  2
c  3

def func(x):
    return x*x, x*x*x

print(apply_and_concat(df, 'A', func, ['x^2', 'x^3']))

   A  x^2  x^3
a  1    1    1
b  2    4    8
c  3    9   27
Hope it helps someone.
#4
2
I found a possible solution: change myfunc to return an np.array, like this:
import numpy as np

def myfunc(a, b, c):
    # do something
    return np.array((e, f, g))
Is there a better solution?
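One possibility, if the transformation is plain arithmetic on the columns (as the example formulas in the question's edit are), is to skip apply and operate on whole columns at once. This is only a sketch under that assumption and will not fit every myfunc. The toy data, the labels 'e', 'f', 'g', and the name `vectorized` are mine:

```python
import pandas as pd

# Toy frame with easy numbers so the result can be checked by hand.
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})

x, y, z = df["x"], df["y"], df["z"]

# Each expression works on entire columns, so no Python-level loop over rows.
vectorized = pd.DataFrame(
    {
        "e": x + 2 * y,
        "f": y * z + 1,
        "g": z + x * y,
    }
)
print(vectorized)
```

The new frame keeps the index of df, because each column is built from Series that share that index.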