Efficiently finding range of indices for positive values in 2D numpy array

Time: 2022-05-04 21:24:05

I have a large numpy array (typically of order 500,000x1024, but it can be larger) and I'm trying to perform a couple of processes that depend on where the positive values in the array are. A very small example array might be:

  [[ 0., 0., 0., 0., 0.,-1.,-1., 0., 0.],
   [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   [ 0., 1., 1., 0., 0., 1., 5., 0., 0.],
   [ 0., 1., 1., 0., 0., 0., 1., 0., 0.],
   [ 0., 3., 1., 0., 0., 2., 1., 0., 0.],
   [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   [ 0., 1., 0., 0., 0., 1., 1., 0., 0.],
   [ 0., 0., 0., 0., 0., 0., 0., 0., 0.]]

The first is to replace any zeros that lie between positive values in the same row when fewer than three zeros separate those values. So if I replace these zeros with 50, my example output would be:

 [[ 0., 0., 0., 0., 0.,-1.,-1., 0., 0.],
  [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
  [ 0., 1., 1.,50.,50., 1., 5., 0., 0.],
  [ 0., 1., 1., 0., 0., 0., 1., 0., 0.],
  [ 0., 3., 1.,50.,50., 2., 1., 0., 0.],
  [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
  [ 0., 1., 0., 0., 0., 1., 1., 0., 0.],
  [ 0., 0., 0., 0., 0., 0., 0., 0., 0.]]

The second thing I need to do is to write out some information for each row based on where the ranges of positive values are. For example using my altered array, I need to be able to write out one statement for the third row declaring positive integers for col[1:7] and two statements for the fourth row declaring positive integers in col[1:3] and col[6].

I've managed to utilise the numpy vectorised methods to a point to combat the first task, but have still ended up resorting to looping through both columns and rows (albeit on a subset of the whole array). Otherwise I end up replacing all of the zeros in a given row instead of just those between positive values.

But for the second task, I can't seem to find a way to do it without cycling through the whole array using:

for row in arr:
    for value in row:
        ...  # per-element work

I guess my overall question would be, is there a way to make use of the vectorised methods in numpy to define column index ranges that will differ for each row and depend on the values in the following column?

Any help would be much appreciated.

4 answers

#1


1  

Numpy unfortunately can't do much processing without generating more arrays, so I fear any solution will require either some form of manual loop like you've been using, or creating one or more additional big arrays. You may be able to come up with a solution that's quite fast and memory efficient using numexpr.

Here's a stab at doing this in a way that isn't necessarily memory efficient, but at least all the looping will be done by Numpy, so should be a lot faster than what you've been doing as long as it fits in your memory. (Memory efficiency might be improved by rewriting some of this as in-place operations but I won't worry about that.)

Here's your step 1:

import numpy

positive = x > 0   # a boolean array marking the positive values in x
zeros = (x == 0)   # indicates everywhere x is 0

# Pad two False columns onto each side so that the shifted views below never
# run off the edges; the columns near the edges then need no special treatment.
padded = numpy.pad(positive, ((0, 0), (2, 2)), mode='constant')

# Each view lines up with x: entry [i, j] describes a neighbour of x[i, j].
has_pos_1_to_left  = padded[:, 1:-3]   # x[i, j-1] is positive
has_pos_2_to_left  = padded[:, 0:-4]   # x[i, j-2] is positive
has_pos_1_to_right = padded[:, 3:-1]   # x[i, j+1] is positive
has_pos_2_to_right = padded[:, 4:  ]   # x[i, j+2] is positive

# A position is flanked if it has a positive 1 to the left and a positive
# 1 or 2 to the right, or a positive 2 to the left and a positive 1 to the right.
flanked_by_positives = ((has_pos_1_to_left & (has_pos_1_to_right | has_pos_2_to_right))
                        | (has_pos_2_to_left & has_pos_1_to_right))

x[zeros & flanked_by_positives] = 50 # fill in zeros that were flanked - overwrites x!
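
If the gap width needs to vary, a different technique may be easier to generalize than shifted slices: track, for every cell, the column of the nearest positive value on each side using running maxima and minima. The sketch below is only an illustration under stated assumptions. The names fill_short_gaps, max_gap and fill are hypothetical, x is assumed to be a 2D float array, and the two extra integer arrays of the same shape as x are assumed to fit in memory.

```python
import numpy as np

def fill_short_gaps(x, max_gap=2, fill=50.0):
    """Return a copy of x in which zeros lying in a gap of at most max_gap
    columns between two positive values of the same row are set to fill."""
    positive = x > 0
    n_cols = x.shape[1]
    cols = np.arange(n_cols)
    # Column of the nearest positive at or left of each cell (-1 if there is none)
    left = np.maximum.accumulate(np.where(positive, cols, -1), axis=1)
    # Column of the nearest positive at or right of each cell (n_cols if there is none)
    right = np.minimum.accumulate(
        np.where(positive, cols, n_cols)[:, ::-1], axis=1)[:, ::-1]
    gap = right - left - 1                      # cells strictly between the two positives
    between = (left >= 0) & (right < n_cols)    # a positive exists on both sides
    out = x.copy()
    out[(x == 0) & between & (gap <= max_gap)] = fill
    return out

# The example array from the question
x = np.array([[0., 0., 0., 0., 0., -1., -1., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 1., 1., 0., 0., 1., 5., 0., 0.],
              [0., 1., 1., 0., 0., 0., 1., 0., 0.],
              [0., 3., 1., 0., 0., 2., 1., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 1., 0., 0., 0., 1., 1., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.]])
filled = fill_short_gaps(x)
```

Unlike the code above, this sketch returns a copy and leaves x untouched. Non-zero, non-positive cells inside a gap count toward the gap width but are never overwritten.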

And here's your step 2:

import numpy

filled_positives = x>0 # assuming we just filled in x
# diff of a boolean array gives a boolean result, so convert to integers first
diffs = numpy.diff(filled_positives.astype(numpy.int8)) # will be 1 at first positive in any sequence,
                                                        # -1 after last positive, zero elsewhere

endings = numpy.where(diffs==-1) # tuple specifying coords where positive sequences end 
                                 # omits final column!!!
beginnings = numpy.where(diffs==1) # tuple specifying coords where pos seqs about to start
                                   # omits column #0!!!

It should be straightforward to use these beginning and ending coordinates to extract the information you need about each row. Remember, though, that this difference-detecting method only catches transitions from non-positive to positive or vice versa, so it won't report positive sequences that begin in the zeroth column or end in the last column. You'll need to look for those non-transitions separately if you want them.
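
One way to avoid treating the edge columns separately is to pad a non-positive column onto each side before differencing, so every positive sequence produces both a start and an end transition. The sketch below is a minimal illustration, not part of the answer. The function name positive_runs is hypothetical, and the padded copy is assumed to fit in memory.

```python
import numpy as np

def positive_runs(x):
    """Return a list of (row, start, stop) for every run of positive values,
    where stop is exclusive, matching the col[start:stop] notation."""
    positive = x > 0
    # One False column on each side makes runs touching an edge visible to diff
    padded = np.pad(positive, ((0, 0), (1, 1)), mode="constant").astype(np.int8)
    diffs = np.diff(padded, axis=1)             # shape (rows, cols + 1)
    start_rows, starts = np.nonzero(diffs == 1)
    _, stops = np.nonzero(diffs == -1)
    # nonzero reports hits in row-major order and every run has exactly one
    # start and one stop, so the k-th start pairs with the k-th stop.
    return list(zip(start_rows.tolist(), starts.tolist(), stops.tolist()))

# The filled example array from the question
x = np.array([[0., 0., 0., 0., 0., -1., -1., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 1., 1., 50., 50., 1., 5., 0., 0.],
              [0., 1., 1., 0., 0., 0., 1., 0., 0.],
              [0., 3., 1., 50., 50., 2., 1., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 1., 0., 0., 0., 1., 1., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.]])
runs = positive_runs(x)
for row, start, stop in runs:
    print("row:%d ; col[%d:%d]" % (row, start, stop))
```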

#2


0  

You can use efficient numpy iterators like flatiter or nditer.

For example, for your second task:

In [1]: from numpy import array
   ...: x = array([[ 0., 0., 0., 0., 0.,-1.,-1., 0., 0.],
   ...:            [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   ...:            [ 0., 1., 1.,50.,50., 1., 5., 0., 0.],
   ...:            [ 0., 1., 1., 0., 0., 0., 1., 0., 0.],
   ...:            [ 0., 3., 1.,50.,50., 2., 1., 0., 0.],
   ...:            [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   ...:            [ 0., 1., 0., 0., 0., 1., 1., 0., 0.],
   ...:            [ 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [2]: islands = []
   ...: fl = x.flat
   ...: while fl.index < x.size:
   ...:     coord = fl.coords
   ...:     if next(fl) > 0:
   ...:         length = 1
   ...:         # the default of 0 ends a run that reaches the very last element;
   ...:         # a run that continues into the next row is counted as one island
   ...:         while next(fl, 0) > 0:
   ...:             length +=1
   ...:         islands.append([coord, length])

In [3]: for (row, col), length in islands:
   ...:     print('row:%d ; col[%d:%d]' % (row, col, col+length))
row:2 ; col[1:7]
row:3 ; col[1:3]
row:3 ; col[6:7]
row:4 ; col[1:7]
row:6 ; col[1:2]
row:6 ; col[5:7]
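
The answer names nditer but only shows flatiter. A rough nditer equivalent might look like the sketch below, which uses the multi_index flag to read the current row and column and closes a run whenever the row changes. The function name islands_nditer is hypothetical. This is still a Python-level loop over every element, so on an array of 500,000x1024 it is likely to be far slower than a vectorised approach.

```python
import numpy as np

def islands_nditer(x):
    """Return (row, start, stop) for each run of positive values, stop exclusive."""
    islands = []
    run_row = run_start = run_stop = None       # the run currently being extended
    # order="C" forces row-by-row traversal regardless of memory layout
    it = np.nditer(x, flags=["multi_index"], order="C")
    for value in it:
        row, col = it.multi_index
        # Close the open run at a non-positive value or at the start of a new row
        if run_row is not None and (row != run_row or value <= 0):
            islands.append((run_row, run_start, run_stop))
            run_row = None
        if value > 0:
            if run_row is None:
                run_row, run_start = row, col
            run_stop = col + 1
    if run_row is not None:                     # run reaching the last element
        islands.append((run_row, run_start, run_stop))
    return islands

# A run ending one row and another starting the next row stay separate
demo = islands_nditer(np.array([[0., 1.], [1., 0.]]))
```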

#3


-1  

For your first problem: create a variable that holds the index of the first positive number you come across, and have an if statement that resets the position if the next value is positive and count (a variable that counts positions away from the first positive number) is less than 3.

For your second problem: Create an array and add the indices of the locations of positive values.

 indices = []  # one "[row:col]" string per positive value
 for row_idx, row in enumerate(arr):
     for col_idx, value in enumerate(row):
         if value > 0:
             indices.append("[%d:%d]" % (row_idx, col_idx))
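
For the first problem, the idea of remembering the last positive index and measuring the distance to the next one could be sketched for a single row as below. This is an interpretation, not the answerer's code. The names fill_row_gaps, max_gap and fill are hypothetical, and the gap is assumed to mean the number of cells strictly between two positive values.

```python
def fill_row_gaps(row, max_gap=2, fill=50.0):
    """Return a copy of row (as a list) where runs of at most max_gap zeros
    sitting directly between two positive values are replaced by fill."""
    out = list(row)
    last_pos = None                      # index of the most recent positive value
    for i, value in enumerate(row):
        if value > 0:
            if last_pos is not None:
                between = range(last_pos + 1, i)
                # Fill only short gaps that consist entirely of zeros
                if 0 < len(between) <= max_gap and all(row[j] == 0 for j in between):
                    for j in between:
                        out[j] = fill
            last_pos = i                 # reset the position to this positive
    return out

filled_row = fill_row_gaps([0., 1., 1., 0., 0., 1., 5., 0., 0.])
```

Applied row by row, this still loops over every element in Python, so it is mainly useful for checking a vectorised version on small inputs.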

#4


-1  

The second method would have the data create objects, so let's say you have a class:

public class Matrix{
   int indicex;
   int indicey;
   double val;
   boolean positiveInt;

   //constructor
   public Matrix(int indicex, int indicey, double val, boolean positiveInt){
      this.indicex = indicex;
      this.indicey = indicey;
      this.val = val;
      this.positiveInt = positiveInt;
   }

   //getter
   public boolean isPositive(){
      return positiveInt;
   }
}

and then in your driver class you would have your data being read, creating an object new Matrix(indexx, indexy, val, true/false) for each value. Each object would be put into an ArrayList that you could search for positive numbers.

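import java.util.ArrayList;
import java.util.List;

public class Driver {
   static List<Matrix> storeObjects = new ArrayList<Matrix>();

   // call once per value as the data is read
   static void store(int indexx, int indexy, double val) {
      storeObjects.add(new Matrix(indexx, indexy, val, val > 0));
   }

   // put the positive objects in a separate list
   static List<Matrix> positives() {
      List<Matrix> result = new ArrayList<Matrix>();
      for (Matrix object : storeObjects) {
         if (object.isPositive()) {
            result.add(object);
         }
      }
      return result;
   }
}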
