How to remove rows from a numpy array based on multiple conditions?

Time: 2022-07-29 14:14:28

I have a file with 4 columns and thousands of rows. I want to remove the rows whose first-column value falls within certain ranges. For example, suppose the file contains the following data:

18  6.215   0.025
19  6.203   0.025
20  6.200   0.025
21  6.205   0.025
22  6.201   0.026
23  6.197   0.026
24  6.188   0.024
25  6.187   0.023
26  6.189   0.021
27  6.188   0.020
28  6.192   0.019
29  6.185   0.020
30  6.189   0.019
31  6.191   0.018
32  6.188   0.019
33  6.187   0.019
34  6.194   0.021
35  6.192   0.024
36  6.193   0.024
37  6.187   0.026
38  6.184   0.026
39  6.183   0.027
40  6.189   0.027

I want to remove the rows whose first item is between 20 and 25 or between 30 and 35, bounds included. That means the output I expect is:

18  6.215   0.025
19  6.203   0.025
26  6.189   0.021
27  6.188   0.020
28  6.192   0.019
29  6.185   0.020
36  6.193   0.024
37  6.187   0.026
38  6.184   0.026
39  6.183   0.027
40  6.189   0.027

How could I do this?

4 solutions

#1


7  

If you want to keep using numpy, the solution isn't hard.

# Inclusive bounds, so that rows 20 to 25 and 30 to 35 are removed as in the expected output
data = data[np.logical_not(np.logical_and(data[:,0] >= 20, data[:,0] <= 25))]
data = data[np.logical_not(np.logical_and(data[:,0] >= 30, data[:,0] <= 35))]

Or if you want to combine it all into one statement,

data = data[
    np.logical_not(np.logical_or(
        np.logical_and(data[:,0] >= 20, data[:,0] <= 25),
        np.logical_and(data[:,0] >= 30, data[:,0] <= 35)
    ))
]

To explain, comparison expressions like data[:,0] < 25 create boolean arrays that record, element by element, whether the condition is true or false. In this case, the array tells you where the first column of data is less than 25.

You can also index numpy arrays with these boolean arrays. A statement like data[data[:,0] > 30] extracts all the rows where data[:,0] > 30 is true, that is, all the rows whose first element is greater than 30. This kind of conditional indexing is how you extract the rows (or columns, or elements) that you want.

Finally, we need logical tools that combine boolean arrays element by element. Python's regular and, or, and not keywords don't work here because they try to evaluate each boolean array as a single truth value, which raises a ValueError for arrays with more than one element. Fortunately, numpy provides np.logical_and, np.logical_or, and np.logical_not for this purpose. With these, we can combine boolean arrays element-wise to find rows that satisfy more complicated conditions.
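
As a rough illustration of the points above, the sketch below builds the same mask twice, once with the operators &, | and ~ (numpy's element-wise counterparts of the np.logical_* functions for boolean arrays) and once with the functions themselves, and also shows that plain `and` fails. The sample rows are taken from the question; the variable names (first, in_range, kept, same_mask, python_and_failed) are made up for this example, and the bounds are assumed to be inclusive.

```python
import numpy as np

# A few rows in the same layout as the question's data.
data = np.array([
    [19, 6.203, 0.025],
    [20, 6.200, 0.025],
    [25, 6.187, 0.023],
    [26, 6.189, 0.021],
    [30, 6.189, 0.019],
    [35, 6.192, 0.024],
    [36, 6.193, 0.024],
])

first = data[:, 0]

# Element-wise operators: & (and), | (or), ~ (not). The parentheses are
# required because these operators bind more tightly than comparisons.
in_range = ((first >= 20) & (first <= 25)) | ((first >= 30) & (first <= 35))
kept = data[~in_range]

# The np.logical_* functions build the same mask.
same_mask = np.logical_or(
    np.logical_and(first >= 20, first <= 25),
    np.logical_and(first >= 30, first <= 35),
)

# Plain `and` asks for the truth value of a whole array, which is ambiguous.
try:
    (first >= 20) and (first <= 25)
    python_and_failed = False
except ValueError:
    python_and_failed = True

print(kept[:, 0])
```

Which spelling to use is mostly a matter of taste; the operator form is shorter, while the function form avoids the extra parentheses.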

#2


2  

In the special but frequent case where the selection criterion is whether a value falls within an interval, I use the abs() of the difference from the midpoint of the interval, especially if midInterval has a physical meaning:

data = data[abs(data[:,0] - midInterval) < deviation] # '<' for keeping the interval

If the data type is integer and the midpoint is not (as in Jun's request), you can double the values instead of converting to float (for huge integers, float rounding errors can exceed 1):

data = data[abs(2*data[:,0] - sumOfLimits) > deltaOfLimits]

Repeat for each interval to remove. With the limits in Jun's question (20 to 25 and 30 to 35, bounds included):

data = data[abs(2*data[:,0] - 45) > 5]  # 45 = 20 + 25, 5 = 25 - 20
data = data[abs(2*data[:,0] - 65) > 5]  # 65 = 30 + 35, 5 = 35 - 30
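
To make the arithmetic behind the doubling trick easier to check, here is a minimal sketch that wraps it in a helper and compares it against direct comparisons. The helper name outside_interval and its parameters low and high are assumptions for this example, not part of the original answer, and the intervals are treated as closed (bounds removed).

```python
import numpy as np

def outside_interval(values, low, high):
    """Boolean mask: True where a value lies outside the closed interval [low, high]."""
    # Doubling avoids a fractional midpoint:
    # |2x - (low + high)| > (high - low)  is equivalent to  x < low or x > high.
    return np.abs(2 * values - (low + high)) > (high - low)

first = np.arange(18, 41)  # integer first column, 18..40 as in the question

keep = outside_interval(first, 20, 25) & outside_interval(first, 30, 35)

# The same selection written with direct comparisons, for cross-checking.
direct = ~(((first >= 20) & (first <= 25)) | ((first >= 30) & (first <= 35)))

print(first[keep])
```

Using >= instead of > in the helper would keep the interval bounds themselves, which corresponds to reading "between" as exclusive.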

#3


0  

Below is my solution to the problem of deleting specific rows from a numpy array. The solution is a one-liner of the form:

#  Remove the rows whose first item is between 20 and 25
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0)

and is based on pure numpy functions (np.bitwise_and, np.where, np.delete).

import numpy as np

A = np.array( [   [ 18, 6.215, 0.025 ],
    [ 19, 6.203, 0.025 ],
    [ 20, 6.200, 0.025 ],
    [ 21, 6.205, 0.025 ],
    [ 22, 6.201, 0.026 ],
    [ 23, 6.197, 0.026 ],
    [ 24, 6.188, 0.024 ],
    [ 25, 6.187, 0.023 ],
    [ 26, 6.189, 0.021 ],
    [ 27, 6.188, 0.020 ],
    [ 28, 6.192, 0.019 ],
    [ 29, 6.185, 0.020 ],
    [ 30, 6.189, 0.019 ],
    [ 31, 6.191, 0.018 ],
    [ 32, 6.188, 0.019 ],
    [ 33, 6.187, 0.019 ],
    [ 34, 6.194, 0.021 ],
    [ 35, 6.192, 0.024 ],
    [ 36, 6.193, 0.024 ],
    [ 37, 6.187, 0.026 ],
    [ 38, 6.184, 0.026 ],
    [ 39, 6.183, 0.027 ],
    [ 40, 6.189, 0.027 ] ] )

#  Remove the rows whose first item is between 20 and 25
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0)

# Remove the rows whose first item is between 30 and 35
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=30), (A[:,0]<=35) ) )[0], 0)

>>> A
array([[  1.80000000e+01,   6.21500000e+00,   2.50000000e-02],
       [  1.90000000e+01,   6.20300000e+00,   2.50000000e-02],
       [  2.60000000e+01,   6.18900000e+00,   2.10000000e-02],
       [  2.70000000e+01,   6.18800000e+00,   2.00000000e-02],
       [  2.80000000e+01,   6.19200000e+00,   1.90000000e-02],
       [  2.90000000e+01,   6.18500000e+00,   2.00000000e-02],
       [  3.60000000e+01,   6.19300000e+00,   2.40000000e-02],
       [  3.70000000e+01,   6.18700000e+00,   2.60000000e-02],
       [  3.80000000e+01,   6.18400000e+00,   2.60000000e-02],
       [  3.90000000e+01,   6.18300000e+00,   2.70000000e-02],
       [  4.00000000e+01,   6.18900000e+00,   2.70000000e-02]])
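
If there are more than two ranges, the np.delete approach can be wrapped in a small function. This is only a sketch under stated assumptions: the function name delete_rows_in_intervals and its parameters are hypothetical, the intervals are treated as closed, and the second column holds placeholder zeros rather than the question's values.

```python
import numpy as np

def delete_rows_in_intervals(a, intervals, column=0):
    """Return a copy of `a` without rows whose `column` value lies in any closed interval."""
    values = a[:, column]
    drop = np.zeros(len(a), dtype=bool)
    for low, high in intervals:
        # Accumulate the rows to drop, one interval at a time.
        drop |= (values >= low) & (values <= high)
    # np.flatnonzero turns the boolean mask into row indices for np.delete.
    return np.delete(a, np.flatnonzero(drop), axis=0)

# First column 18..40 as in the question; second column is a placeholder.
A = np.column_stack([np.arange(18, 41), np.zeros(23)])

result = delete_rows_in_intervals(A, [(20, 25), (30, 35)])
print(result[:, 0])
```

np.delete returns a new array and leaves its input unchanged, which is why the result has to be assigned back, as in the answer above.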

#4


-1  

You don't need to add complexity with numpy for this. I'm guessing you're reading your file into a list of lists (with each row being a list within the overall data list, like this: [[18, 6.215, 0.025], [19, 6.203, 0.025], ...]). In that case, use the rule below:

# Build a new list instead of calling data.remove() while iterating, which skips rows.
data = [row for row in data
        if not (20 <= row[0] <= 25 or 30 <= row[0] <= 35)]
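
When working with a plain list of rows, it is worth knowing that removing items from a list inside a loop over that same list can skip elements. The sketch below contrasts that pitfall with building a new list; the names rows, in_removed_range, buggy and kept are made up for this example, and the bounds are assumed to be inclusive.

```python
# A few rows from the question's data, stored as a plain list of lists.
rows = [[19, 6.203, 0.025], [20, 6.200, 0.025], [21, 6.205, 0.025],
        [22, 6.201, 0.026], [26, 6.189, 0.021]]

def in_removed_range(x):
    """True if x falls in one of the closed ranges to remove."""
    return 20 <= x <= 25 or 30 <= x <= 35

# Pitfall: after a removal the remaining items shift left, so the loop
# never looks at the element that followed the removed one.
buggy = [row[:] for row in rows]
for row in buggy:
    if in_removed_range(row[0]):
        buggy.remove(row)

# Safer: build a new list containing only the rows to keep.
kept = [row for row in rows if not in_removed_range(row[0])]

print([row[0] for row in buggy])  # → [19, 21, 26]  (21 was skipped)
print([row[0] for row in kept])   # → [19, 26]
```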
