My situation: I have a pandas DataFrame and, for each row, I need to compute the following:
1) Get the first non-NaN value: df.apply(lambda x: x.dropna().iloc[0], axis=1)
2) Get the last non-NaN value: df.apply(lambda x: x.dropna().iloc[-1], axis=1)
3) Count the non-NaN values: df.apply(lambda x: len(x.dropna()), axis=1)
Sample case and expected output:
x = np.array([[1,2,np.nan], [4,5,6], [np.nan, 8,9]])
1) [1, 4, 8]
2) [2, 6, 9]
3) [2, 3, 2]
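For reference, the three per-row computations can be reproduced on the sample with plain pandas. This is only a sketch of the slow baseline: the names df, first, last and count are illustrative, and axis=1 is passed so that the lambda receives rows rather than columns.

```python
import numpy as np
import pandas as pd

x = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])
df = pd.DataFrame(x)

# axis=1 hands each row to the lambda as a Series
first = df.apply(lambda r: r.dropna().iloc[0], axis=1)
last = df.apply(lambda r: r.dropna().iloc[-1], axis=1)
count = df.apply(lambda r: len(r.dropna()), axis=1)

print(first.tolist())  # [1.0, 4.0, 8.0]
print(last.tolist())   # [2.0, 6.0, 9.0]
print(count.tolist())  # [2, 3, 2]
```

Each apply call runs a Python-level function once per row, which is why this gets slow on large frames.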
I also need to keep it optimized, so I turned to NumPy and looked for a way to apply y = x[~numpy.isnan(x)] to an NxK array as a first step. Then I would use what was shown here (Vectorized way of accessing row specific elements in a numpy array) for 1) and 2), but I am still empty-handed for 3).
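One likely reason the mask does not carry over directly to an NxK array is that boolean indexing with a full-shape mask returns a flat 1-D array, so the row boundaries are lost. A small sketch on the sample array (the names y and valid are illustrative):

```python
import numpy as np

x = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])

# Indexing with a 2-D boolean mask flattens the result
y = x[~np.isnan(x)]
print(y.shape)  # (7,)
print(y)        # [1. 2. 4. 5. 6. 8. 9.]

# The mask itself is still 3x3, so per-row information lives there
valid = ~np.isnan(x)
print(valid.shape)  # (3, 3)
```

Working on the 2-D mask instead of the filtered values keeps one entry per row available for further reductions.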
1 Answer
Here's one way:
In [756]: x
Out[756]:
array([[  1.,   2.,  nan],
       [  4.,   5.,   6.],
       [ nan,   8.,   9.]])

In [768]: m = ~np.isnan(x)

In [769]: first_idx = m.argmax(1)

In [770]: last_idx = m.shape[1] - m[:,::-1].argmax(1) - 1

In [771]: x[np.arange(len(first_idx)), first_idx]
Out[771]: array([ 1.,  4.,  8.])

In [772]: x[np.arange(len(last_idx)), last_idx]
Out[772]: array([ 2.,  6.,  9.])

In [773]: m.sum(1)
Out[773]: array([2, 3, 2])
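The session above can be collected into one function. This is a sketch rather than part of the original answer: the name first_last_count is hypothetical, and it assumes x is a 2-D float array with at least one column. Note that argmax returns 0 for a row with no valid entry, so the indices are not meaningful for all-NaN rows; the looked-up values are still NaN there because every entry in such a row is NaN, and a count of 0 flags the case.

```python
import numpy as np

def first_last_count(x):
    """Return per-row first valid value, last valid value, and valid count."""
    m = ~np.isnan(x)                       # True where the entry is valid
    rows = np.arange(x.shape[0])
    first_idx = m.argmax(1)                # first True in each row
    # First True in the reversed row, mapped back to the original column index
    last_idx = m.shape[1] - m[:, ::-1].argmax(1) - 1
    return x[rows, first_idx], x[rows, last_idx], m.sum(1)

x = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])
first, last, count = first_last_count(x)
print(first)  # [1. 4. 8.]
print(last)   # [2. 6. 9.]
print(count)  # [2 3 2]
```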
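Alternatively, we could use a cumulative sum of the mask to get those indices. The first valid entry is where the running count first equals 1, and the last valid entry is where the running count first reaches its row maximum: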
In [787]: c = m.cumsum(1)
In [788]: first_idx = (c==1).argmax(1)
In [789]: last_idx = c.argmax(1)
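To see why the cumulative-sum indices work, it can help to print the intermediate array on the sample. The sketch below rebuilds x and m so it runs on its own; the rows helper is illustrative.

```python
import numpy as np

x = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])
m = ~np.isnan(x)

c = m.cumsum(1)                  # running count of valid entries per row
print(c)
# [[1 2 2]
#  [1 2 3]
#  [0 1 2]]

first_idx = (c == 1).argmax(1)   # first column where the count reaches 1
last_idx = c.argmax(1)           # argmax picks the first column holding the row maximum
rows = np.arange(x.shape[0])

print(x[rows, first_idx])  # [1. 4. 8.]
print(x[rows, last_idx])   # [2. 6. 9.]
print(c[:, -1])            # [2 3 2]
```

The last column of c is the total number of valid entries in each row, so the count for 3) comes for free with this variant.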