I'm new to working with numpy arrays and I'm having trouble creating a structured array. I'd like to create something similar to a Matlab structure where the fields can be arrays of different shapes.
import numpy

a = numpy.array([1, 2, 3, 4, 5, 6])
b = numpy.array([7, 8, 9])
c = numpy.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
# Doesn't do what I want
data = numpy.array([a, b, c], dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
I'd like data['a'] to return matrix a, data['b'] to return matrix b, etc. When reading in a Matlab structure, the data is saved in this format so I know it must be possible.
2 solutions
#1
6
In Python, a dictionary is roughly analogous to a structure in Matlab. You could try the following to see if it works for you:
>>> data = {'a':a, 'b':b, 'c':c}
>>> data['a'] is a
True
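A minimal sketch of the dictionary approach, assuming the three arrays from the question. Each value keeps its own shape, and a lookup hands back the original array object rather than a copy:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([7, 8, 9])
c = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

data = {'a': a, 'b': b, 'c': c}

# Every field keeps its own shape.
shapes = {name: arr.shape for name, arr in data.items()}
print(shapes)  # {'a': (6,), 'b': (3,), 'c': (11,)}

# The dictionary stores references, so a change made through
# data['b'] is visible through b as well.
data['b'][0] = 70
print(b[0])  # 70
```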
#2
13
I'm afraid it's not possible without twisting NumPy's arm a lot.
See, the idea behind NumPy is to provide homogeneous arrays, that is, arrays of elements that all have the same type. This type can be simple (int, float...) or more complicated ([('',int),('',float),('',"|S10")]), but in any case, all the elements have the same type. That permits some very efficient memory layout.
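To make the "same type for every element" point concrete, here is a small sketch. The field names id, value and label are hypothetical (the example above leaves them empty); the point is only that every element occupies the same number of bytes:

```python
import numpy as np

# Hypothetical field names, chosen for illustration only.
record = np.dtype([('id', int), ('value', float), ('label', 'S10')])
arr = np.zeros(4, dtype=record)

print(arr.dtype.names)  # ('id', 'value', 'label')

# Each element has the fixed size given by the dtype, so the whole
# buffer is just size * itemsize bytes.
print(arr.itemsize == record.itemsize)
print(arr.nbytes == arr.size * record.itemsize)
```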
So, inherently, a structured array requires the fields (the individual subblocks) to have the same size no matter the position. Examine the following:
>>> np.zeros(3,dtype=[('a',(int,3)),('b',(float,5))])
It defines an array with three elements; each element is composed of two sub-blocks, a and b; a is a block of three ints, b a block of five floats. But once you define the initial size of the blocks in the dtype, you're stuck with that (well, you can always switch, but that's another story).
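A short sketch of what that fixed block size means in practice, using the same dtype as above. Accessing a field returns one regular array covering all elements, and assigning a block of the wrong length fails:

```python
import numpy as np

x = np.zeros(3, dtype=[('a', (int, 3)), ('b', (float, 5))])

# Each field is a regular 2-D array: one row per element.
print(x['a'].shape)  # (3, 3)
print(x['b'].shape)  # (3, 5)

# A block of three ints cannot hold four values.
try:
    x['a'][0] = [1, 2, 3, 4]
    fits = True
except ValueError:
    fits = False
print(fits)  # False
```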
There's a workaround: using dtype=object. That way, you're constructing an array of heterogeneous items, like an array of lists of different sizes. But you lose a lot of NumPy power that way. Still, an example:
>>> import numpy as np
>>> x = np.zeros(3, dtype=[('a', object), ('b', object)])
>>> x['a'][0] = [1, 2, 3, 4]
>>> x['b'][-1] = "ABCDEF"
>>> print(x)
[([1, 2, 3, 4], 0) (0, 0) (0, 'ABCDEF')]
So, we just constructed an array of... objects. I put a list somewhere, a string elsewhere, and it works. You could follow the same example to build an array like you want:
blob = np.array([(a,b,c)],dtype=[('a',object),('b',object),('c',object)])
but then, you should think twice about whether it's really a means to your end; another structure would probably be more efficient.
A side note: please pay attention to the [(a,b,c)] part of the expression above: notice the ()? You're basically telling NumPy to construct an array of 1 element, composed of three sub-elements (one for each of your a,b,c), each sub-element being an object. If you don't put the (), NumPy will whine a lot.
And a last comment: if you access your fields like blob['a'], you'll get an array of shape (1,) and dtype=object: just use blob['a'].item() to get back your original (6,) int array.
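Putting the pieces together, a sketch of the object-dtype approach with the arrays from the question (the names field and recovered are just for illustration):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([7, 8, 9])
c = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

# One record (note the tuple) with three object fields.
blob = np.array([(a, b, c)],
                dtype=[('a', object), ('b', object), ('c', object)])

# Field access gives a 1-element object array, not the original array.
field = blob['a']
print(field.shape, field.dtype)  # (1,) object

# .item() unwraps the single stored object.
recovered = field.item()
print(recovered.shape)  # (6,)
```

Indexing with blob['a'][0] should give the same result as .item() here, since the array holds exactly one record.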