How do I create a numpy structured array with multiple fields of different shapes?

Time: 2021-11-19 13:41:40

I'm new to working with numpy arrays and I'm having trouble creating a structured array. I'd like to create something similar to a Matlab structure where the fields can be arrays of different shapes.

import numpy

a = numpy.array([1, 2, 3, 4, 5, 6])
b = numpy.array([7, 8, 9])
c = numpy.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

# Doesn't do what I want
data = numpy.array([a, b, c], dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])

I'd like data['a'] to return matrix a, data['b'] to return matrix b, etc. When reading in a Matlab structure, the data is saved in this format so I know it must be possible.

2 answers

#1


6  

In Python, a dictionary is roughly analogous to a structure in Matlab. You could try the following to see if it works for you:

>>> data = {'a':a, 'b':b, 'c':c}
>>> data['a'] is a
True
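
A minimal sketch of the dictionary approach, using the arrays from the question. The `SimpleNamespace` part is an optional extra I'm assuming might be handy for Matlab-style attribute access; it is not part of the suggestion above.

```python
import numpy as np
from types import SimpleNamespace

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([7, 8, 9])
c = np.arange(10, 21)  # 10, 11, ..., 20

# A plain dict can hold arrays of any shape; it stores references, not copies.
data = {'a': a, 'b': b, 'c': c}
shapes = {name: arr.shape for name, arr in data.items()}

# Optional: attribute access (struct.a), closer to Matlab's data.a syntax.
struct = SimpleNamespace(**data)
```

Because the dict only keeps references, `data['a']` and `struct.a` are the very same array object as `a`.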

#2


13  

I'm afraid it's not possible without twisting NumPy's arm a lot.

See, the idea behind NumPy is to provide homogeneous arrays, that is, arrays of elements that all have the same type. This type can be simple (int, float...) or more complicated ([('',int),('',float),('',"|S10")]), but in any case, all the elements have the same type. That permits a very efficient memory layout.
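
To make the "same type for every element" point concrete, here is a small sketch. The field names `id`, `value` and `label` are made up for illustration; the dtype above uses empty names.

```python
import numpy as np

# One compound type shared by every element: an int, a float, a 10-byte string.
dt = np.dtype([('id', int), ('value', float), ('label', 'S10')])
records = np.zeros(4, dtype=dt)

# Fixed-size records: total bytes = number of elements * bytes per element.
total_bytes = records.nbytes
bytes_per_record = dt.itemsize
```

Since every record occupies the same number of bytes, NumPy can find element `i` by simple offset arithmetic, which is where the efficient layout comes from.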

So, inherently, a structured array requires the fields (the individual subblocks) to have the same size no matter the position. Examine the following:

>>> np.zeros(3,dtype=[('a',(int,3)),('b',(float,5))])

It defines an array with three elements; each element is composed of two sub-blocks, a and b; a is a block of three ints, b a block of five floats. But once you define the initial size of the blocks in the dtype, you're stuck with that (well, you can always switch, but that's another story).
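
A quick sketch of what "stuck with that" looks like in practice, under the assumption that you try to store a block of the wrong length in one of the fixed-size fields:

```python
import numpy as np

x = np.zeros(3, dtype=[('a', (int, 3)), ('b', (float, 5))])

# Each field view has the array's length plus the fixed sub-block shape.
shape_a = x['a'].shape  # three elements, each a block of three ints
shape_b = x['b'].shape  # three elements, each a block of five floats

# A sub-block of the wrong length cannot be stored in the field.
try:
    x['a'][0] = [1, 2, 3, 4]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```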

There's a workaround: using dtype=object. That way, you're constructing an array of heterogeneous items, like an array of lists of different sizes. But you lose a lot of NumPy power that way. Still, an example:

>>> x=np.zeros(3, dtype=[('a',object), ('b',object)])
>>> x['a'][0] = [1,2,3,4]
>>> x['b'][-1] = "ABCDEF"
>>> print(x)
[(list([1, 2, 3, 4]), 0) (0, 0) (0, 'ABCDEF')]

So, we just constructed an array of... objects. I put a list somewhere, a string elsewhere, and it works. You could follow the same example to build an array like the one you want:

blob = np.array([(a,b,c)],dtype=[('a',object),('b',object),('c',object)])

But then, you should think twice about whether this is really the right means to your end; another structure would probably be more efficient.

A side note: please pay attention to the [(a,b,c)] part of the expression above: notice the ()? You're basically telling NumPy to construct an array of 1 element, composed of three sub-elements (one for each of your a,b,c), each sub-element being an object. If you don't put the (), NumPy will whine a lot.

And a last comment: if you access your fields like blob['a'], you'll get an array of shape (1,) and dtype=object: just use blob['a'].item() to get back your original (6,) int array.
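
Putting the last few remarks together, a minimal end-to-end sketch with the arrays from the question (the helper names `field` and `original` are only for illustration):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([7, 8, 9])
c = np.arange(10, 21)  # 10, 11, ..., 20

# One record (note the tuple) whose three fields each hold a Python object.
blob = np.array([(a, b, c)],
                dtype=[('a', object), ('b', object), ('c', object)])

field = blob['a']        # a one-element object array wrapping the stored array
original = field.item()  # unwrap it to get the stored array back
```

Here `field` is only a thin wrapper; arithmetic and slicing should be done on `original`, which is a regular integer array again.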
