NumPy array/matrix with mixed types

I'm trying to create an Nx3 NumPy array/matrix with mixed data types (string, integer, integer). But when I append data to it, I get an error: TypeError: invalid type promotion. Can anybody help me solve this problem?

When I create an array from the sample data, NumPy casts all columns to a single 'S' data type. I also can't specify the data type for the array, because when I write res = np.array(["TEXT", 1, 1], dtype='S, i4, i4') I get an error: TypeError: expected a readable buffer object

templates.py

import numpy as np
from pprint import pprint

test_array = np.zeros((0, 3), dtype='S, i4, i4')
pprint(test_array)

test_array = np.append(test_array, [["TEXT", 1, 1]], axis=0)
pprint(test_array)

print("Array example:")
res = np.array(["TEXT", 1, 1])
pprint(res)

Output:

array([], shape=(0L, 3L), 
  dtype=[('f0', 'S'), ('f1', '<i4'), ('f2', '<i4')])

 Array example:
 array(['TEXT', '1', '1'], dtype='|S4')

Error:

Traceback (most recent call last):

File "templates.py", line 5, in <module>
test_array = np.append(test_array, [["TEXT", 1, 1]], axis=0)

File "lib\site-packages\numpy\lib\function_base.py", line 3543, in append
return concatenate((arr, values), axis=axis)

TypeError: invalid type promotion

5 solutions

#1

Your problem is in the data. Try this:

res = np.array(("TEXT", 1, 1), dtype='|S4, i4, i4')

res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='|S4, i4, i4')

The data has to be a tuple or a list of tuples. Not quite evident from the error message, is it?

Also, please note that the length of the text field has to be specified for the text data to really be saved. If you want to store the text as objects (only references in the array), then:

res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='object, i4, i4')

This is often quite useful, as well.
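
As a rough illustration of the two variants above, here is a minimal sketch. The field names (name, a, b) are made up for this example, and 'U10' is used instead of 'S' on the assumption that you are on Python 3 and want str values rather than bytes.

```python
import numpy as np

# Hypothetical field names; 'U10' holds up to 10 unicode characters per record.
dt_fixed = np.dtype([("name", "U10"), ("a", "i4"), ("b", "i4")])

# Each record is a tuple. A list of lists would be read as a 2-D array instead.
fixed = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype=dt_fixed)

# Same layout, but the text field stores references to Python objects,
# so strings of any length fit.
dt_obj = np.dtype([("name", "O"), ("a", "i4"), ("b", "i4")])
by_ref = np.array([("TEXT", 1, 1), ("a much longer string", 2, 2)], dtype=dt_obj)

print(fixed.shape)         # one entry per record, not (N, 3)
print(fixed["name"])       # select a whole column by field name
print(by_ref["name"][1])   # the object field keeps the full string
```

The fixed-width version keeps everything in one contiguous buffer, while the object version trades that for flexible string lengths.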

#2

If you're not married to numpy, a pandas DataFrame is perfect for this. Alternatively, you can specify the string field in the array as a Python object (dtype='O, i4, i4', for example). Also, append seems to like lists of tuples, not lists of lists. I think it has something to do with the mutability of lists, but I'm not sure.

#3

First, numpy stores array elements using fixed physical record sizes. So, record objects need to all be the same physical size. For this reason, you need to tell numpy the size of the string or save a pointer to a string stored somewhere else. In a record array, 'S' translates into a zero-length string, and that's probably not what you intended.
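
To make the fixed record size concrete, here is a small sketch that inspects the dtype from this thread. It assumes the default packed (unaligned) layout, which is what a comma-separated dtype string gives you.

```python
import numpy as np

# 10 bytes of text plus two 4-byte integers per record.
dt = np.dtype("S10, i4, i4")
print(dt.itemsize)      # bytes used by a single record

# Every record occupies the same number of bytes, so the buffer is itemsize * N.
records = np.zeros(3, dtype=dt)
print(records.nbytes)
```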

np.append actually copies the entire array into a larger block of memory to accommodate the new elements. Try, for example:

import numpy as np
mtype = 'S10, i4, i4'
ta = np.zeros((0,), dtype=mtype)  # empty 1-D structured array
print(id(ta))
ta = np.append(ta, np.array([('first', 10, 11)], dtype=mtype))
print(id(ta))  # np.append returned a new array object
ta = np.append(ta, np.array([('second', 20, 21)], dtype=mtype))
print(id(ta))

Each time you append this way, the copy gets slower because you need to allocate and copy more memory each time it grows. That's why the id returns a different value every time you append. If you want any significant number of records in your array, you are much better off either allocating enough space from the start, or else accumulating the data in lists and then collecting the lists into a numpy structured array when you're done. That also gives you the opportunity to make the string length in mtype as short as possible, while still long enough to hold your longest string.
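
A minimal sketch of both suggestions, under some assumptions: the rows and field names below are invented for illustration, and 'U' is used so the text comes back as str on Python 3.

```python
import numpy as np

# Hypothetical data: accumulate plain tuples in a list first (cheap appends).
rows = []
for i in range(5):
    rows.append(("item%d" % i, i, i * 10))

# Pick the shortest string width that still fits the longest value.
width = max(len(r[0]) for r in rows)
dt = np.dtype([("name", "U%d" % width), ("a", "i4"), ("b", "i4")])

# Option 1: build the structured array once, at the end.
table = np.array(rows, dtype=dt)

# Option 2: allocate all the space up front and fill it in place.
pre = np.zeros(len(rows), dtype=dt)
for i, row in enumerate(rows):
    pre[i] = row

print(table.dtype)
print(table["b"])
```

Either way the array is allocated only once, instead of being reallocated and copied on every append.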

#4

I think this is what you are trying to accomplish - create an empty array of the desired dtype, and then add one or more data sets to it. The result will have shape (N,), not (N,3).

As I noted in a comment, np.append uses np.concatenate, so I am using that too. Also I have to make both test_array and x 1d arrays (shape (0,) and (1,) respectively). And the dtype field is S10, large enough to contain 'TEXT'.

In [56]: test_array = np.zeros((0,), dtype='S10, i4, i4')

In [57]: x = np.array([("TEST",1,1)], dtype='S10, i4, i4')

In [58]: test_array = np.concatenate((test_array, x))

In [59]: test_array = np.concatenate((test_array, x))

In [60]: test_array
Out[60]: 
array([('TEST', 1, 1), ('TEST', 1, 1)], 
      dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<i4')])

Here's an example of building the array from a list of tuples:

In [75]: xl=('test',1,1)

In [76]: np.array([xl]*3,dtype='S10,i4,i4')
Out[76]: 
array([('test', 1, 1), ('test', 1, 1), ('test', 1, 1)], 
      dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<i4')])
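
For reference, the same steps as a plain script, so the (N,) shape and the default field names can be checked directly. This is only a sketch: it swaps 'S10' for 'U10', assuming Python 3 and str values.

```python
import numpy as np

# Unnamed fields in a comma-separated dtype string default to f0, f1, f2.
dt = np.dtype("U10, i4, i4")

empty = np.zeros((0,), dtype=dt)            # 1-D, zero records
x = np.array([("TEST", 1, 1)], dtype=dt)    # 1-D, one record

# All inputs are 1-D with the same dtype, so they concatenate cleanly.
out = np.concatenate((empty, x, x))

print(out.shape)        # one axis: the three "columns" live inside each record
print(out.dtype.names)
print(out["f1"])        # integer column as an ordinary int32 array
```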

#5

I don't believe you can make an array out of more than one data type. You can, however, make a list with more than one data type.

mixed = ["TEXT", 1, 1]  # avoid shadowing the built-in name list
print(mixed)

gives

['TEXT', 1, 1]
