NumPy: compare the shapes of two arrays and, if they differ, append zeros to match the shape

I am comparing two NumPy arrays and want to add them together. Before doing so, I need to make sure they are the same size. If the sizes differ, I want to take the smaller one and pad it with rows of zeros at the end so that the shapes match. Both arrays have 16 columns and N rows. I assume this should be pretty straightforward, but I can't get my head around it. So far I am able to compare the shapes of the two arrays.

import csv
import numpy as np
import sys
data = np.genfromtxt('./test1.csv', dtype=float, delimiter=',')
data_sys = np.genfromtxt('./test2.csv', dtype=float, delimiter=',')   
print(data.shape)
print(data_sys.shape)
if data.shape != data_sys.shape:
    print("we have an error")

This is the output I got:

=============New file.csv============
(603, 16)
(604, 16)
we have an error

I want to append a row of zeros to the "data" array so that I can add the two arrays. Thanks for your help.

6 answers

#1

You can use np.vstack((array1, array2)) from NumPy, which takes a tuple of arrays and stacks them vertically. For example:

import numpy as np

A = np.random.randint(2, size=(2, 16))
B = np.random.randint(2, size=(5, 16))

print(A.shape)
print(B.shape)
# Append zero rows to whichever array has fewer rows.
if A.shape[0] < B.shape[0]:
    A = np.vstack((A, np.zeros((B.shape[0] - A.shape[0], 16))))
elif A.shape[0] > B.shape[0]:
    B = np.vstack((B, np.zeros((A.shape[0] - B.shape[0], 16))))

print(A.shape)
print(A)

In your case:

if data.shape[0] < data_sys.shape[0]:
    data = np.vstack((data, np.zeros((data_sys.shape[0] - data.shape[0], 16))))
elif data.shape[0] > data_sys.shape[0]:
    data_sys = np.vstack((data_sys, np.zeros((data.shape[0] - data_sys.shape[0], 16))))

I assume that your matrices always have the same number of columns; if not, you can similarly use hstack to pad them horizontally.
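
As a minimal sketch of the same idea without the explicit branches, np.pad can append the missing zero rows and columns in one call. The helper name pad_to_shape is hypothetical (it is not part of NumPy), the example arrays are made up, and 2-D inputs are assumed:

```python
import numpy as np

def pad_to_shape(arr, shape):
    """Zero-pad a 2-D array at the bottom and right up to `shape` (hypothetical helper)."""
    extra_rows = shape[0] - arr.shape[0]
    extra_cols = shape[1] - arr.shape[1]
    if extra_rows < 0 or extra_cols < 0:
        raise ValueError("target shape is smaller than the array")
    # pad_width is ((before, after) for rows, (before, after) for columns)
    return np.pad(arr, ((0, extra_rows), (0, extra_cols)),
                  mode="constant", constant_values=0)

# Made-up arrays that differ in both rows and columns.
A = np.ones((2, 14))
B = np.ones((5, 16))

# The common shape is the element-wise maximum of both shapes.
target = tuple(max(a, b) for a, b in zip(A.shape, B.shape))
A_padded = pad_to_shape(A, target)
B_padded = pad_to_shape(B, target)

total = A_padded + B_padded
print(total.shape)
```

Because the padding amounts are computed per axis, the same call covers the rows-only case from the question (the column padding is then simply zero).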

#2

If you have only two files, and their shapes differ in just the 0th dimension, a simple check and copy is probably easiest, though it lacks generality:

import numpy as np

data = np.genfromtxt('./test1.csv', dtype=float, delimiter=',')
data_sys = np.genfromtxt('./test2.csv', dtype=float, delimiter=',')   

fill_value = 0  # could be np.nan or something else instead

# Copy the smaller array into the top rows of a pre-filled array of the larger shape.
if data.shape[0] > data_sys.shape[0]:
    temp = data_sys
    data_sys = np.full(data.shape, fill_value, dtype=float)
    data_sys[:temp.shape[0], :] = temp
elif data.shape[0] < data_sys.shape[0]:
    temp = data
    data = np.full(data_sys.shape, fill_value, dtype=float)
    data[:temp.shape[0], :] = temp

print('Using conditional:')
print(data.shape)
print(data_sys.shape)
if data.shape != data_sys.shape:
    print("we have an error")

A much more general solution is a custom class--overkill for your two files but much easier if you have lots of files to handle. The basic idea is that static class variables sx and sy keep track of the largest widths and heights, and are used when get_data is called, to output a standard shape array. This is pre-filled with your desired fill value, and the actual data from the corresponding file are copied into the upper left corner of the standard shape array:

import numpy as np

class IsomorphicArray:

    sy = 0 # static class variable
    sx = 0 # static class variable
    fill_value = 0.0

    def __init__(self,csv_filename):
        self.data = np.genfromtxt(csv_filename,dtype=float,delimiter=',')
        self.instance_sy,self.instance_sx = self.data.shape
        if self.instance_sy>IsomorphicArray.sy:
            IsomorphicArray.sy = self.instance_sy
        if self.instance_sx>IsomorphicArray.sx:
            IsomorphicArray.sx = self.instance_sx

    def get_data(self):
        out = np.ones((IsomorphicArray.sy,IsomorphicArray.sx))*self.fill_value
        out[:self.instance_sy,:self.instance_sx] = self.data
        return out

isomorphic_array_list = []

for filename in ['./test1.csv','./test2.csv']:
    isomorphic_array_list.append(IsomorphicArray(filename))

numpy_array_list = []

for isomorphic_array in isomorphic_array_list:
    numpy_array_list.append(isomorphic_array.get_data())


print('Using custom class:')
for numpy_array in numpy_array_list:
    print(numpy_array.shape)
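
If a class feels heavy, roughly the same behaviour can be sketched as a plain function that works on arrays that are already loaded. The name pad_to_common_shape is made up for illustration, the arrays below are stand-ins for the CSV data, and 2-D inputs are assumed:

```python
import numpy as np

def pad_to_common_shape(arrays, fill_value=0.0):
    """Copy each 2-D array into the upper-left corner of a common-shape array."""
    rows = max(a.shape[0] for a in arrays)
    cols = max(a.shape[1] for a in arrays)
    padded = []
    for a in arrays:
        out = np.full((rows, cols), fill_value, dtype=float)
        out[:a.shape[0], :a.shape[1]] = a
        padded.append(out)
    return padded

# Stand-ins for the arrays read from test1.csv and test2.csv.
first = np.ones((603, 16))
second = np.ones((604, 16))

padded = pad_to_common_shape([first, second])
total = sum(padded)  # element-wise sum now works because the shapes match
print([p.shape for p in padded])
```

Unlike the class, this version keeps no shared state, so the common shape is recomputed from whatever list is passed in.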

#3

Assuming both arrays have 16 columns:

len1 = len(data)
len2 = len(data_sys)
if len1 < len2:
    data = np.append(data, np.zeros((len2 - len1, 16)), axis=0)
elif len2 < len1:
    data_sys = np.append(data_sys, np.zeros((len1 - len2, 16)), axis=0)
print(data.shape)
print(data_sys.shape)
if data.shape != data_sys.shape:
    print("we have an error")
else:
    print("we r good")

#4

NumPy provides an append function (numpy.append) to add values to an array; see its documentation for details. For multi-dimensional arrays, the axis argument defines along which dimension the values are added. Since you already know which of your arrays is the smaller one, create a zero-filled array of the missing size with numpy.zeros and then append it to that array.

If you omit the axis argument, numpy.append flattens both arrays, so you would then have to reshape the result.
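
A minimal sketch of this approach, using small made-up arrays in place of the CSV data:

```python
import numpy as np

# Made-up stand-ins for the two arrays from the question.
data = np.ones((3, 16))
data_sys = np.ones((4, 16))

# Build a block of zeros with as many rows as are missing.
missing = data_sys.shape[0] - data.shape[0]
zeros = np.zeros((missing, data.shape[1]))

# With axis=0 the zero rows are appended below the existing rows.
padded = np.append(data, zeros, axis=0)
print(padded.shape)  # (4, 16)

# Without axis, np.append flattens both inputs into a 1-D array,
# which then has to be reshaped back.
flat = np.append(data, zeros)
print(flat.shape)  # (64,)
reshaped = flat.reshape(data_sys.shape)
```

Passing axis=0 is usually the simpler route, since it avoids the flatten and reshape round trip.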

#5

I had a similar situation: two arrays, mask_in of shape (n1, m1) and mask_ot of shape (n2, m2), that were generated by masking a 2D image of size (N, M), where mask_ot is larger than mask_in and both share a common center (X0, Y0). I followed the approach suggested by @AniaG using vstack and hstack. I simply obtained the shapes of both arrays, computed the size difference, and finally accounted for the number of missing elements at both ends. Here is what I got:

import numpy as np

mask_in = np.random.randint(2, size=(2, 8))
mask_ot = np.random.randint(2, size=(6, 16))
mask_in_amp = mask_in

dif_row = mask_ot.shape[0] - mask_in_amp.shape[0]
dif_col = mask_ot.shape[1] - mask_in_amp.shape[1]

# Integer division: half of the missing rows/columns goes on each side.
complete_row = dif_row // 2
complete_col = dif_col // 2

# Pad rows below and above.
mask_in_amp = np.vstack((mask_in_amp, np.zeros((complete_row, mask_in_amp.shape[1]))))
mask_in_amp = np.vstack((np.zeros((complete_row, mask_in_amp.shape[1])), mask_in_amp))

# Pad columns to the right and to the left.
mask_in_amp = np.hstack((mask_in_amp, np.zeros((mask_in_amp.shape[0], complete_col))))
mask_in_amp = np.hstack((np.zeros((mask_in_amp.shape[0], complete_col)), mask_in_amp))
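
The same centered padding can probably be written more compactly with np.pad. The sketch below uses made-up masks and also handles an odd size difference by putting the extra row or column at the end, which is an assumption on my part rather than something the approach above prescribes:

```python
import numpy as np

# Made-up masks; the row difference (3) is deliberately odd.
mask_in = np.ones((2, 8), dtype=int)
mask_ot = np.ones((5, 16), dtype=int)

dif_row = mask_ot.shape[0] - mask_in.shape[0]
dif_col = mask_ot.shape[1] - mask_in.shape[1]

# Split each difference into a (before, after) pair; when the difference
# is odd, the extra row/column goes after the data (an assumption).
pad_rows = (dif_row // 2, dif_row - dif_row // 2)
pad_cols = (dif_col // 2, dif_col - dif_col // 2)

mask_in_amp = np.pad(mask_in, (pad_rows, pad_cols),
                     mode="constant", constant_values=0)
print(mask_in_amp.shape)  # (5, 16)
```

With an even difference on both axes this gives the same layout as the vstack/hstack version, except that np.pad keeps the integer dtype of the input.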

#6

If you don't care about the exact shapes of two arrays you can also do the following:

if data.size == data_sys.size:
    print('arrays have the same number of elements, and possibly shape')
else:
    print('arrays do not have the same shape for sure')
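
To illustrate why equal size only means "possibly" the same shape, here is a small example with made-up arrays:

```python
import numpy as np

a = np.zeros((2, 8))
b = np.zeros((4, 4))

print(a.size == b.size)    # True: both hold 16 elements
print(a.shape == b.shape)  # False: (2, 8) vs (4, 4)
```

So before adding two arrays, comparing shape is the safer check; comparing size can only rule a match out.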
