How to drop NaNs and count values in an NxK numpy array in a vectorized way

Time: 2022-09-14 12:06:30

My situation: I have a pandas DataFrame and, for each row, I have to compute the following.

1) Get the first value, NA excluded (df.apply(lambda x: x.dropna().iloc[0], axis=1))

2) Get the last value, NA excluded (df.apply(lambda x: x.dropna().iloc[-1], axis=1))

3) Count the non-NA values (df.apply(lambda x: len(x.dropna()), axis=1))

Sample case and expected output :


x = np.array([[1,2,np.nan], [4,5,6], [np.nan, 8,9]])
1) [1, 4, 8]
2) [2, 6, 9]
3) [2, 3, 2]
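
For reference, the row-wise pandas baseline described above can be sketched as follows. This is only an illustration built from the sample array: the names `df`, `first`, `last` and `count` are chosen here, and `axis=1` is passed on the assumption that the computation is meant per row, as the expected output suggests.

```python
import numpy as np
import pandas as pd

x = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])
df = pd.DataFrame(x)

# axis=1 applies the function to each row (the default axis=0 works per column).
first = df.apply(lambda row: row.dropna().iloc[0], axis=1)   # first non-NA value
last = df.apply(lambda row: row.dropna().iloc[-1], axis=1)   # last non-NA value
count = df.apply(lambda row: len(row.dropna()), axis=1)      # number of non-NA values

print(first.tolist(), last.tolist(), count.tolist())
```

Note that `.iloc[0]` raises an `IndexError` on a row that contains only NA values, so this baseline assumes every row has at least one valid entry.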

I also need to keep it fast, so I turned to numpy and looked for a way to apply y = x[~numpy.isnan(x)] to an NxK array as a first step. Then I would use what was shown here (Vectorized way of accessing row specific elements in a numpy array) for 1) and 2), but I am still empty-handed for 3).

1 solution

#1



Here's one way -


In [756]: x
Out[756]: 
array([[  1.,   2.,  nan],
       [  4.,   5.,   6.],
       [ nan,   8.,   9.]])

In [768]: m = ~np.isnan(x)

In [769]: first_idx = m.argmax(1)

In [770]: last_idx = m.shape[1] - m[:,::-1].argmax(1) - 1

In [771]: x[np.arange(len(first_idx)), first_idx]
Out[771]: array([ 1.,  4.,  8.])

In [772]: x[np.arange(len(last_idx)), last_idx]
Out[772]: array([ 2.,  6.,  9.])

In [773]: m.sum(1)
Out[773]: array([2, 3, 2])
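
The session above can be collected into a single helper. The following is a minimal sketch, not part of the original answer: the function name `first_last_count` is hypothetical, and it assumes the input can be converted to a 2D float array.

```python
import numpy as np

def first_last_count(x):
    """Return (first non-NaN, last non-NaN, non-NaN count) for each row of x."""
    x = np.asarray(x, dtype=float)
    m = ~np.isnan(x)                      # True where a value is present
    rows = np.arange(x.shape[0])
    # argmax on a boolean row returns the position of the first True.
    first_idx = m.argmax(axis=1)
    # Reverse the columns to find the last True, then map back to the original index.
    last_idx = x.shape[1] - 1 - m[:, ::-1].argmax(axis=1)
    return x[rows, first_idx], x[rows, last_idx], m.sum(axis=1)

x = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])
first, last, count = first_last_count(x)
print(first, last, count)
```

For a row that is entirely NaN, `argmax` finds no True and returns 0, so both indices point at NaN entries; the helper then reports NaN for first and last and 0 for the count instead of raising.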

Alternatively, we could make use of cumulative-summation to get those indices, like so -


In [787]: c = m.cumsum(1)

In [788]: first_idx = (c==1).argmax(1)

In [789]: last_idx = c.argmax(1)
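
Put together as a runnable script, the cumulative-sum variant might look like the sketch below. The variable names `rows`, `first`, `last` and `count` are added here for illustration; reading the count off the last column of `c` is an observation about the running total rather than something stated in the answer.

```python
import numpy as np

x = np.array([[1, 2, np.nan], [4, 5, 6], [np.nan, 8, 9]])

m = ~np.isnan(x)               # True where a value is present
c = m.cumsum(axis=1)           # running count of valid values along each row
rows = np.arange(x.shape[0])

# The running count first equals 1 at the first valid entry.
first_idx = (c == 1).argmax(axis=1)
# The running count first reaches its row maximum at the last valid entry.
last_idx = c.argmax(axis=1)

first = x[rows, first_idx]
last = x[rows, last_idx]
count = c[:, -1]               # final running total is the non-NaN count

print(first, last, count)
```

This form computes one cumulative sum and reuses it for all three results, at the cost of an extra integer array of the same shape as `x`.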
