Adding a column of data to a numpy rec array with only one row

Date: 2022-11-27 15:43:55

I need to add a column of data to a numpy rec array. I have seen many answers floating around here, but they do not seem to work for a rec array that only contains one row...

Let's say I have a rec array x:


>>> import numpy as np
>>> x = np.rec.array([1, 2, 3])
>>> x
rec.array((1, 2, 3), 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])

and I want to append the value 4 as a new column with its own field name and data type, such as


 rec.array((1, 2, 3, 4), 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])

If I try to add a column using the normal append_fields approach:


>>> import numpy.lib.recfunctions
>>> np.lib.recfunctions.append_fields(x, 'f3', 4, dtypes='<i8',
...     usemask=False, asrecarray=True)

then I ultimately end up with


TypeError: len() of unsized object

It turns out that for a rec array with only one row, len(x) does not work, while x.size does. If I use np.hstack() instead, I get TypeError: invalid type promotion, and if I try np.c_, I get an undesired result:


>>> np.c_[x, 4]
array([[(1, 2, 3), (4, 4, 4)]], 
  dtype=(numpy.record, [('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]))

2 answers



Create the initial array so that it has shape (1,); note the extra brackets:


In [17]: x = np.rec.array([[1, 2, 3]])

(If x is an input that you can't control that way, you could use x = np.atleast_1d(x) before using it in append_fields().)
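
To see why the shape matters, here is a small sketch. It builds the record with an explicit dtype (an assumption, made so the field types are the same on every platform) instead of going through np.rec.array, and then compares the 0-d array with the result of np.atleast_1d. The names fields, x0, x1 and len_works are only for illustration.

```python
import numpy as np

fields = [('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]

# A single record stored as a 0-d rec array, like x in the question.
x0 = np.array((1, 2, 3), dtype=fields).view(np.recarray)

# len() is undefined for 0-d arrays; this is the error the question runs into.
try:
    len(x0)
    len_works = True
except TypeError:
    len_works = False

# atleast_1d turns the 0-d array into a 1-d array holding one record.
x1 = np.atleast_1d(x0)

print(x0.shape, x0.size, len_works)   # 0-d: empty shape, one element, no len()
print(x1.shape, x1.size, len(x1))     # 1-d: one row, so len() is defined
```

The data is the same in both cases; only the number of dimensions changes, and that is what decides whether len() is defined.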

Then make sure the value given in append_fields is also a sequence of length 1:


In [18]: np.lib.recfunctions.append_fields(x, 'f3', [4], dtypes='<i8', 
    ...: usemask=False, asrecarray=True)
rec.array([(1, 2, 3, 4)], 
          dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])
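
Putting the two steps together, a minimal sketch might look like the following. append_scalar_field is a hypothetical helper name, not part of numpy; it assumes the same scalar value should go into every row and that the input may be either 0-d or 1-d.

```python
import numpy as np
import numpy.lib.recfunctions as rfn


def append_scalar_field(rec, name, value, dtype):
    """Hypothetical helper: add one field holding the same value in every row."""
    # Promote a 0-d record to shape (1,) so that len() is defined.
    rec1d = np.atleast_1d(rec)
    # Build a data sequence with one entry per row, as append_fields expects.
    data = np.full(rec1d.shape, value, dtype=dtype)
    return rfn.append_fields(rec1d, name, data, dtypes=dtype,
                             usemask=False, asrecarray=True)


# A single record with an explicit dtype, similar to x in the question.
x = np.array((1, 2, 3),
             dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]).view(np.recarray)

out = append_scalar_field(x, 'f3', 4, '<i8')
print(out.dtype.names)
print(out.tolist())
```

Note that the result has shape (1,), not (); if a 0-d record is needed afterwards, indexing with out[0] gives a single record.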



Here's a way of doing the job without recfunctions:


In [64]: x = np.rec.array((1, 2, 3))
In [65]: y=np.zeros(x.shape, dtype=x.dtype.descr+[('f3','<i4')])
In [66]: y
array((0, 0, 0, 0), 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])
In [67]: for name in x.dtype.names: y[name] = x[name]
In [68]: y['f3']=4
In [69]: y
array((1, 2, 3, 4), 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])

From what I've seen in the recfunctions code, I think this is just as fast. Of course, for a single row speed isn't an issue. In general, those functions create a new 'blank' array with the target dtype and copy fields by name (possibly recursively) from the sources to the target. An array usually has many more records than fields, so iterating over the fields is, relatively speaking, not slow.
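
The same copy-by-field idea can be wrapped in a small function. This is only a sketch: add_field is a hypothetical name, and it assumes a simple, unaligned dtype (for aligned or padded dtypes, dtype.descr can contain extra padding entries that would need handling).

```python
import numpy as np


def add_field(arr, name, value, dtype):
    """Hypothetical helper: return a copy of arr with one extra field."""
    # Target dtype: the existing fields plus the new one.
    new_dtype = np.dtype(arr.dtype.descr + [(name, dtype)])
    # Blank array with the same shape; this works for 0-d and n-d inputs.
    out = np.zeros(arr.shape, dtype=new_dtype)
    # Copy the existing fields by name, then fill in the new field.
    for field in arr.dtype.names:
        out[field] = arr[field]
    out[name] = value
    return out


fields = [('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]

# Works on a single 0-d record ...
y0 = add_field(np.array((1, 2, 3), dtype=fields), 'f3', 4, '<i8')

# ... and on an ordinary 1-d array with several rows.
y2 = add_field(np.array([(1, 2, 3), (5, 6, 7)], dtype=fields), 'f3', [4, 8], '<i8')

print(y0.shape, y0['f3'])
print(y2.tolist())
```

Because the loop runs over field names and not over rows, its cost grows with the number of fields, which is usually small.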




