Keep the top N values of each row of a DataFrame within groups of column indices

I'm having trouble finding an elegant solution to this problem (there might not be one).

I have the following example DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(10,10)).abs()

          0         1         2         3         4         5         6  \
0  1.764052  0.400157  0.978738  2.240893  1.867558  0.977278  0.950088   
1  0.144044  1.454274  0.761038  0.121675  0.443863  0.333674  1.494079   
2  2.552990  0.653619  0.864436  0.742165  2.269755  1.454366  0.045759   
3  0.154947  0.378163  0.887786  1.980796  0.347912  0.156349  1.230291   
4  1.048553  1.420018  1.706270  1.950775  0.509652  0.438074  1.252795   
5  0.895467  0.386902  0.510805  1.180632  0.028182  0.428332  0.066517   
6  0.672460  0.359553  0.813146  1.726283  0.177426  0.401781  1.630198   
7  0.729091  0.128983  1.139401  1.234826  0.402342  0.684810  0.870797   
8  1.165150  0.900826  0.465662  1.536244  1.488252  1.895889  1.178780   
9  0.403177  1.222445  0.208275  0.976639  0.356366  0.706573  0.010500   

          7         8         9  
0  0.151357  0.103219  0.410599  
1  0.205158  0.313068  0.854096  
2  0.187184  1.532779  1.469359  
3  1.202380  0.387327  0.302303  
4  0.777490  1.613898  0.212740  
5  0.302472  0.634322  0.362741  
6  0.462782  0.907298  0.051945  
7  0.578850  0.311553  0.056165  
8  0.179925  1.070753  1.054452  
9  1.785870  0.126912  0.401989

I have the following zones map:

zones = {"A": [0,1,2], "B": [3,4], "C": [5,6,7,8], "D": [9]}

The zones are groups of columns that I should examine together. For each row of the df[columns] DataFrame, I want to keep the top N items and set the rest to zero (NB: the top N items are taken across the row, i.e. cross-sectionally; see below). For example, for zone "A" with N=2, I would examine the following DataFrame:

          0         1         2
0  1.764052  0.400157  0.978738
1  0.144044  1.454274  0.761038
2  2.552990  0.653619  0.864436
3  0.154947  0.378163  0.887786
4  1.048553  1.420018  1.706270
5  0.895467  0.386902  0.510805
6  0.672460  0.359553  0.813146
7  0.729091  0.128983  1.139401
8  1.165150  0.900826  0.465662
9  0.403177  1.222445  0.208275

and because N=2, I will keep the top 2 items in each row:

          0         1         2
0  1.764052  0.        0.978738
1  0.        1.454274  0.761038
2  2.552990  0.        0.864436
3  0.        0.378163  0.887786
4  0.        1.420018  1.706270
5  0.895467  0.        0.510805
6  0.672460  0.        0.813146
7  0.729091  0.        1.139401
8  1.165150  0.900826  0.
9  0.403177  1.222445  0.

The entire output with the zone map above and N=2 will look like:

          0         1         2         3         4         5         6  \
0  1.764052  0.        0.978738  2.240893  1.867558  0.977278  0.950088   
1  0.        1.454274  0.761038  0.121675  0.443863  0.333674  1.494079   
2  2.552990  0.        0.864436  0.742165  2.269755  1.454366  0.         
3  0.        0.378163  0.887786  1.980796  0.347912  0.        1.230291   
4  0.        1.420018  1.706270  1.950775  0.509652  0.        1.252795   
5  0.895467  0.        0.510805  1.180632  0.028182  0.428332  0.         
6  0.672460  0.        0.813146  1.726283  0.177426  0.        1.630198   
7  0.729091  0.        1.139401  1.234826  0.402342  0.684810  0.870797   
8  1.165150  0.900826  0.        1.536244  1.488252  1.895889  1.178780   
9  0.403177  1.222445  0.        0.976639  0.356366  0.706573  0.         

          7         8         9  
0  0.        0.        0.410599  
1  0.        0.        0.854096  
2  0.        1.532779  1.469359  
3  1.202380  0.        0.302303  
4  0.        1.613898  0.212740  
5  0.        0.634322  0.362741  
6  0.        0.907298  0.051945  
7  0.        0.        0.056165  
8  0.        0.        1.054452  
9  1.785870  0.        0.401989

The way I am attempting to solve this feels a bit slow. I loop over the zones, build a zone_df for each, then loop over its rows, sorting each row and calling row.head(len(row) - N) to get the index and columns that need to be set to 0. I then use these values (stored in a dict) to set cells in zone_df to zero, and finally combine the zone_dfs.

4 solutions

#1

Here's one way -

def keeptopN_perkey(df, zones, N=2):
    a = df.values
    indx = zones.values()
    r = np.arange(a.shape[0])[:,None]
    for i in indx:
        b = a[:,i]
        L = np.maximum(len(i)-N,0)
        if L>0:
            idx = np.argpartition(b, L, axis=1)[:,:L] 
            # or np.argsort(b,axis=1)[:,:L]
            b[r, idx] = 0
        a[:,i] = b
    return df

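The benefit is that the result is written back into the input dataframe through its underlying array data, so no separate output dataframe needs to be created. Note that this relies on df.values returning a writable view of the data rather than a copy; that holds for the single-dtype frames used here, but may not for mixed dtypes or under pandas' copy-on-write mode.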

Sample run -

In [303]: np.random.seed(0)
     ...: N = 2
     ...: df = pd.DataFrame(np.random.randint(11,99,(4,10)))
     ...: zones = {"A": [0,1,2], "B": [3,4], "C": [5, 6,7,8], "D": [9]}
     ...: 

In [304]: df
Out[304]: 
    0   1   2   3   4   5   6   7   8   9
0  55  58  75  78  78  20  94  32  47  98
1  81  23  69  76  50  98  57  92  48  36
2  88  83  20  31  91  80  90  58  75  93
3  60  40  30  30  25  50  43  76  20  68

In [305]: keeptopN_perkey(df, zones, N=2)
Out[305]: 
    0   1   2   3   4   5   6   7   8   9
0   0  58  75  78  78   0  94   0  47  98
1  81   0  69  76  50  98   0  92   0  36
2  88  83   0  31  91  80  90   0   0  93
3  60  40   0  30  25  50   0  76   0  68
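
Since the function above modifies df in place, a variant that leaves the input untouched may sometimes be preferable. The following is only a minimal sketch of that idea and not part of the original answer: the name keep_top_n_copy is hypothetical, and it assumes the zone lists hold positional column indices, that all columns share one numeric dtype, and that n >= 1.

```python
import numpy as np
import pandas as pd

def keep_top_n_copy(df, zones, n=2):
    """Return a new frame with all but the n largest values per row zeroed in each zone."""
    a = df.to_numpy(copy=True)              # explicit copy, so df itself is never modified
    rows = np.arange(a.shape[0])[:, None]   # column vector of row positions for broadcasting
    for cols in zones.values():
        cols = np.asarray(cols)
        drop = len(cols) - n                # number of values to zero per row in this zone
        if drop > 0:
            block = a[:, cols]
            # positions (within the zone) of the `drop` smallest values in each row
            idx = np.argpartition(block, drop, axis=1)[:, :drop]
            a[rows, cols[idx]] = 0          # map zone positions back to frame columns
    return pd.DataFrame(a, index=df.index, columns=df.columns)

# Same data as the sample run above, written out literally
zones = {"A": [0, 1, 2], "B": [3, 4], "C": [5, 6, 7, 8], "D": [9]}
df = pd.DataFrame([[55, 58, 75, 78, 78, 20, 94, 32, 47, 98],
                   [81, 23, 69, 76, 50, 98, 57, 92, 48, 36],
                   [88, 83, 20, 31, 91, 80, 90, 58, 75, 93],
                   [60, 40, 30, 30, 25, 50, 43, 76, 20, 68]])
out = keep_top_n_copy(df, zones, n=2)
print(out)
```

The trade-off is one extra copy of the data per call, in exchange for not having to df.copy() before every call.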

Benchmarking

Approaches from other posts -

def mask_n(df, n): # @piRSquared's helper func
    v = np.zeros(df.shape, dtype=bool)
    n = min(n, v.shape[1])
    if v.shape[1] > n:
        j = np.argpartition(-df.values, n, 1)[:, :n].ravel()
        i = np.arange(v.shape[0]).repeat(n)
        v[i, j] = True
        return df.where(v, 0)
    else:
        return df

def piRSquared1(df, zones): # @piRSquared's soln1
    zinv = {v: k for k in zones for v in zones[k]}
    return df.groupby(zinv, 1).apply(mask_n, n=2)

def piRSquared2(df, zones): # @piRSquared's soln2
    zinv = {v: k for k in zones for v in zones[k]}
    return df.mask(df.groupby(zinv, 1).rank(axis=1, method='first', 
                   ascending=False) > 2, 0)

def COLDSPEED1(df, zones): # @COLDSPEED's soln
    for z in zones:                   
        df2 = df.iloc[:, zones[z]]
        df.iloc[:, zones[z]] = \
                np.where(((-df2).rank(axis=1) - 1) >= 2, 0, df2.values)
    return df

def s5s1(df, zones, N=2): # @s5s's soln
    final = []
    for zone_id, cols in zones.items():  # .iteritems() is Python 2 only
        values = {}
        d = df[cols]  # columns of this zone
        for i, row in d.iterrows():
            if len(row) > N:
                row = row.sort_values()  # Series.sort() was removed from pandas
                row[row.head(len(row) - N).index] = 0
            values[i] = row
        d = pd.DataFrame(values).T
        final.append(d)

    return pd.concat(final, axis=1)[df.columns]

Timings on a bigger dataset -

In [458]: # Setup
     ...: ncols = 1000
     ...: cuts = np.sort(np.random.choice(ncols, ncols//3, replace=0))
     ...: indx_split = np.split(np.arange(ncols),cuts)
     ...: zones = {i:p_i for i,p_i in enumerate(list(map(list,indx_split)))}
     ...: df = pd.DataFrame(np.random.randint(11,99,(10,ncols)))
     ...: N = 2
     ...: 
     ...: df1 = df.copy()
     ...: df2 = df.copy()
     ...: df3 = df.copy()
     ...: df4 = df.copy()
     ...: df5 = df.copy()
     ...: 

In [459]: %timeit COLDSPEED1(df1, zones)
     ...: %timeit piRSquared1(df2, zones)
     ...: %timeit piRSquared2(df3, zones)
     ...: %timeit s5s1(df4, zones)
     ...: %timeit keeptopN_perkey(df5, zones)
     ...: 
1 loop, best of 3: 324 ms per loop
10 loops, best of 3: 116 ms per loop
10 loops, best of 3: 81.6 ms per loop
1 loop, best of 3: 1.47 s per loop
100 loops, best of 3: 2.99 ms per loop

#2

Option 1
Using np.argpartition

zinv = {v: k for k in zones for v in zones[k]}

def mask_n(df, n):
    v = np.zeros(df.shape, dtype=bool)
    n = min(n, v.shape[1])
    if v.shape[1] > n:
        j = np.argpartition(-df.values, n, 1)[:, :n].ravel()
        i = np.arange(v.shape[0]).repeat(n)
        v[i, j] = True
        return df.where(v, 0)
    else:
        return df

df.groupby(zinv, 1).apply(mask_n, n=2)

Option 2
Using rank

zinv = {v: k for k in zones for v in zones[k]}

df.mask(df.groupby(zinv, 1).rank(axis=1, method='first', ascending=False) > 2, 0)
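
Grouping along axis=1 is, to my knowledge, deprecated in recent pandas releases, so Option 2 may need adapting there. One possible workaround, sketched below under that assumption, is to transpose, group the rows, rank, and transpose back. The function name keep_top_n_rank is hypothetical, and the sketch assumes every column label appears in exactly one zone.

```python
import pandas as pd

def keep_top_n_rank(df, zones, n=2):
    # Invert the zone map: column label -> zone name
    zinv = {col: zone for zone, cols in zones.items() for col in cols}
    # After transposing, the original columns are rows, so an ordinary
    # row-wise groupby can rank each value within its zone.
    ranks = df.T.groupby(zinv).rank(method="first", ascending=False).T
    # Anything ranked below the top n within its zone is replaced by 0
    return df.mask(ranks > n, 0)

zones = {"A": [0, 1, 2], "B": [3, 4], "C": [5, 6, 7, 8], "D": [9]}
df = pd.DataFrame([[55, 58, 75, 78, 78, 20, 94, 32, 47, 98],
                   [81, 23, 69, 76, 50, 98, 57, 92, 48, 36]])
print(keep_top_n_rank(df, zones))
```

With method='first', ties are broken by column position, so exactly n values survive per row and zone.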

#3

Given a dataframe subslice:

df
          0         1         2
0  1.764052  0.400157  0.978738
1  0.144044  1.454274  0.761038
2  2.552990  0.653619  0.864436
3  0.154947  0.378163  0.887786
4  1.048553  1.420018  1.706270
5  0.895467  0.386902  0.510805
6  0.672460  0.359553  0.813146
7  0.729091  0.128983  1.139401
8  1.165150  0.900826  0.465662
9  0.403177  1.222445  0.208275

Rank each row in descending order with df.rank (subtracting 1 so that ranks start at 0) and set every value whose rank is >= N to 0:

N = 2
v = np.where(((-df).rank(axis=1) - 1) >= N, 0, df.values)

v
array([[ 1.764052,  0.      ,  0.978738],
       [ 0.      ,  1.454274,  0.761038],
       [ 2.55299 ,  0.      ,  0.864436],
       [ 0.      ,  0.378163,  0.887786],
       [ 0.      ,  1.420018,  1.70627 ],
       [ 0.895467,  0.      ,  0.510805],
       [ 0.67246 ,  0.      ,  0.813146],
       [ 0.729091,  0.      ,  1.139401],
       [ 1.16515 ,  0.900826,  0.      ],
       [ 0.403177,  1.222445,  0.      ]])

Generalising to your dataframe, you have:

for z in zones:                   
    df2 = df.iloc[:, zones[z]]
    df.iloc[:, zones[z]] = \
            np.where(((-df2).rank(axis=1) - 1) >= 2, 0, df2.values)

df

          0         1         2         3          4         5         6  \
0   1.76405         0  0.978738   2.24089    1.86756  0.977278  0.950088   
1         0   1.45427  0.761038  0.121675   0.443863  0.333674   1.49408   
2   2.55299         0  0.864436  0.742165    2.26975   1.45437         0   
3         0  0.378163  0.887786    1.9808   0.347912         0   1.23029   
4         0   1.42002   1.70627   1.95078   0.509652         0    1.2528   
5  0.895467         0  0.510805   1.18063  0.0281822  0.428332         0   
6   0.67246         0  0.813146   1.72628   0.177426         0    1.6302   
7  0.729091         0    1.1394   1.23483   0.402342   0.68481  0.870797   
8   1.16515  0.900826         0   1.53624    1.48825   1.89589   1.17878   
9  0.403177   1.22245         0  0.976639   0.356366  0.706573         0   

         7         8          9  
0        0         0   0.410599  
1        0         0   0.854096  
2        0   1.53278    1.46936  
3  1.20238         0   0.302303  
4        0    1.6139    0.21274  
5        0  0.634322   0.362741  
6        0  0.907298  0.0519454  
7        0         0  0.0561653  
8        0         0    1.05445  
9  1.78587         0   0.401989
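
One detail worth knowing about the rank-based approach: df.rank defaults to method='average', so tied values share a rank and more than N values may survive in a row that contains ties. The sketch below contrasts that with method='first', which breaks ties by position. The helper name top_n_by_rank and the toy frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def top_n_by_rank(block, n=2, method="average"):
    # 0-based descending rank of every value within its row
    ranks = (-block).rank(axis=1, method=method) - 1
    kept = np.where((ranks >= n).to_numpy(), 0, block.to_numpy())
    return pd.DataFrame(kept, index=block.index, columns=block.columns)

tied = pd.DataFrame([[3, 3, 3], [5, 1, 4]])

avg = top_n_by_rank(tied, n=2)                    # default tie handling
first = top_n_by_rank(tied, n=2, method="first")  # ties broken by position

print(avg.values.tolist())    # [[3, 3, 3], [5, 0, 4]]  (all three tied values kept)
print(first.values.tolist())  # [[3, 3, 0], [5, 0, 4]]  (exactly two values kept)
```

With continuous random data such as the randn example, ties are unlikely, so both methods would normally agree there.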

#4

OK, I originally wrote this solution so I'm adding it here as another version.

np.random.seed(0)
df = pd.DataFrame(np.random.randn(10,10)).abs()
N = 2
zones = {"A": [0,1,2], "B": [3,4], "C": [5,6,7,8], "D": [9]}

final = []
for zone_id, cols in zones.items():  # .iteritems() is Python 2 only
    values = {}
    d = df[cols]  # columns of this zone
    for i, row in d.iterrows():
        if len(row) > N:
            row = row.sort_values()  # Series.sort() was removed from pandas
            row[row.head(len(row) - N).index] = 0
        values[i] = row
    d = pd.DataFrame(values).T
    final.append(d)

result = pd.concat(final, axis=1)[df.columns]

Check that the result matches the expected output:

expected = pd.DataFrame({0: [1.764052, 0., 0.978738, 2.240893, 1.867558, 0.977278, 0.950088, 0., 0., 0.410599],
                             1: [0., 1.454274, 0.761038, 0.121675, 0.443863, 0.333674, 1.494079, 0., 0., 0.854096],
                             2: [2.552990, 0., 0.864436, 0.742165, 2.269755, 1.454366, 0., 0., 1.532779, 1.469359],
                             3: [0., 0.378163, 0.887786, 1.980796, 0.347912, 0., 1.230291, 1.202380, 0., 0.302303],
                             4: [0., 1.420018, 1.706270, 1.950775, 0.509652, 0., 1.252795, 0., 1.613898, 0.212740],
                             5: [0.895467, 0., 0.510805, 1.180632, 0.028182, 0.428332, 0., 0., 0.634322, 0.362741],
                             6: [0.672460, 0., 0.813146, 1.726283, 0.177426, 0., 1.630198, 0., 0.907298, 0.051945],
                             7: [0.729091, 0., 1.139401, 1.234826, 0.402342, 0.684810, 0.870797, 0., 0., 0.056165],
                             8: [1.165150, 0.900826, 0., 1.536244, 1.488252, 1.895889, 1.178780, 0., 0., 1.054452],
                             9: [0.403177, 1.222445, 0., 0.976639, 0.356366, 0.706573, 0., 1.785870, 0., 0.401989],
                             }).T

assert (expected - result).abs().sum().sum() < 0.001
