Adding a column of data to a numpy rec array with only one row

I need to add a column of data to a numpy rec array. I have seen many answers floating around here, but they do not seem to work for a rec array that only contains one row...

Let's say I have a rec array x:

>>> import numpy as np
>>> x = np.rec.array([1, 2, 3])
>>> x
rec.array((1, 2, 3), 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])

and I want to append the value 4 as a new column with its own field name and data type, such as

 rec.array((1, 2, 3, 4), 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])

If I try to add a column using the normal append_fields approach:

>>> import numpy.lib.recfunctions
>>> np.lib.recfunctions.append_fields(x, 'f3', 4, dtypes='<i8',
...     usemask=False, asrecarray=True)

then I ultimately end up with

TypeError: len() of unsized object

It turns out that for a rec array with only one row, len(x) does not work, while x.size does. If I instead use np.hstack(), I get TypeError: invalid type promotion, and if I try np.c_, I get an undesired result:

>>> np.c_[x, 4]
array([[(1, 2, 3), (4, 4, 4)]], 
  dtype=(numpy.record, [('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]))
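For reference, the len() versus size behavior described above can be reproduced in a few lines. This is only a small sketch of the symptom, assuming the array is built exactly as in the question; the flag name len_failed is made up for the illustration.

```python
import numpy as np

# Built from a flat list, the rec array is zero-dimensional:
# one record, but no axis to take a length of.
x = np.rec.array([1, 2, 3])

print(x.shape)  # ()
print(x.size)   # 1

# len() needs at least one dimension, so it raises TypeError here.
len_failed = False  # hypothetical flag, only used to record the outcome
try:
    len(x)
except TypeError as exc:
    len_failed = True
    print("len() failed:", exc)
```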

2 solutions

#1

Create the initial array so that it has shape (1,); note the extra brackets:

In [17]: x = np.rec.array([[1, 2, 3]])

(If x is an input that you can't control that way, you could use x = np.atleast_1d(x) before using it in append_fields().)

Then make sure the value given in append_fields is also a sequence of length 1:

In [18]: np.lib.recfunctions.append_fields(x, 'f3', [4], dtypes='<i8', 
    ...: usemask=False, asrecarray=True)
Out[18]: 
rec.array([(1, 2, 3, 4)], 
          dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])
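For the case where x arrives as a zero-dimensional rec array that you can't rebuild, here is a minimal sketch that combines the np.atleast_1d() suggestion with the length-1 value. The names x1 and y are just for illustration, and the rfn alias is my own shorthand for numpy.lib.recfunctions.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Zero-dimensional rec array, as in the question.
x = np.rec.array([1, 2, 3])

# Promote it to shape (1,) so that len() works inside append_fields.
x1 = np.atleast_1d(x)

# The new column is passed as a length-1 sequence to match the single row.
y = rfn.append_fields(x1, 'f3', [4], dtypes='<i8',
                      usemask=False, asrecarray=True)

print(y.shape)
print(y.dtype.names)
```

Note that the result has shape (1,), not the zero-dimensional shape of the original x, so index it with y[0] if you need the single record back.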

#2

Here's a way of doing the job without recfunctions:

In [64]: x = np.rec.array((1, 2, 3))
In [65]: y=np.zeros(x.shape, dtype=x.dtype.descr+[('f3','<i4')])
In [66]: y
Out[66]: 
array((0, 0, 0, 0), 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])
In [67]: for name in x.dtype.names: y[name] = x[name]
In [68]: y['f3']=4
In [69]: y
Out[69]: 
array((1, 2, 3, 4), 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])

From what I've seen in the recfunctions code, I think this is just as fast. Of course, for a single row speed isn't an issue. In general, those functions create a new 'blank' array with the target dtype and copy fields by name (possibly recursively) from the sources to the target. An array usually has many more records than fields, so iterating over the fields is, relatively speaking, not slow.
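The same create-blank-and-copy idea can be wrapped in a small helper. This is a rough sketch rather than what recfunctions actually does: add_field is a made-up name, and it assumes a simple, unpadded dtype (with aligned or padded dtypes, dtype.descr contains extra filler entries that would need handling).

```python
import numpy as np

def add_field(arr, name, values, dtype):
    """Return a copy of structured array `arr` with one extra field.

    Hypothetical helper; assumes `arr` has a simple, unpadded dtype.
    """
    # Target dtype: the existing fields followed by the new one.
    new_dtype = np.dtype(arr.dtype.descr + [(name, dtype)])
    # 'Blank' array with the same shape, so zero-dimensional input works too.
    out = np.zeros(arr.shape, dtype=new_dtype)
    # Copy the existing fields by name, then fill in the new field.
    for field in arr.dtype.names:
        out[field] = arr[field]
    out[name] = values
    return out

x = np.rec.array((1, 2, 3))
y = add_field(x, 'f3', 4, '<i8')
print(y)
```

The result is a plain structured ndarray; use y.view(np.recarray) if you want attribute access to the fields.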
