I'm trying to convert a two-dimensional array into a structured array with named fields. I want each row in the 2D array to be a new record in the structured array. Unfortunately, nothing I've tried is working the way I expect.
I'm starting with:
>>> myarray = numpy.array([("Hello",2.5,3),("World",3.6,2)])
>>> print myarray
[['Hello' '2.5' '3']
['World' '3.6' '2']]
I want to convert to something that looks like this:
>>> newarray = numpy.array([("Hello",2.5,3),("World",3.6,2)], dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[('Hello', 2.5, 3L) ('World', 3.6000000000000001, 2L)]
What I've tried:
>>> newarray = myarray.astype([("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)]
[('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]]
>>> newarray = numpy.array(myarray, dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)]
[('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]]
Both of these approaches attempt to convert each entry in myarray into a record with the given dtype, so the extra zeros are inserted. I can't figure out how to get it to convert each row into a record.
Another attempt:
>>> newarray = myarray.copy()
>>> newarray.dtype = [("Col1","S8"),("Col2","f8"),("Col3","i8")]
>>> print newarray
[[('Hello', 1.7219343871178711e-317, 51L)]
[('World', 1.7543139673493688e-317, 50L)]]
This time no actual conversion is performed. The existing data in memory is just re-interpreted as the new data type.
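The difference between reinterpreting and converting can be seen on a much smaller case. This is a minimal sketch with a made-up integer array (`ints` is not part of the session above): `view` reuses the existing bytes under a new dtype, much like assigning to `.dtype`, while `astype` converts each value.

```python
import numpy as np

# Hypothetical data, chosen only to illustrate the difference.
ints = np.array([1, 2, 3], dtype=np.int32)

# Reinterpret the same bytes as float32: no conversion takes place,
# so the results are not 1.0, 2.0, 3.0.
reinterpreted = ints.view(np.float32)

# Convert each value to float32.
converted = ints.astype(np.float32)

print(reinterpreted)
print(converted)
```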
The array that I'm starting with is being read in from a text file. The data types are not known ahead of time, so I can't set the dtype at the time of creation. I need a high-performance and elegant solution that will work well for general cases since I will be doing this type of conversion many, many times for a large variety of applications.
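If the file can be parsed directly, one option may be to let NumPy infer a type per column while reading. A minimal sketch, assuming comma-separated text; the in-memory `io.StringIO` stands in for the real file, and the column names are hypothetical:

```python
import io
import numpy as np

# Stand-in for the text file; the delimiter and contents are assumptions.
text = io.StringIO("Hello,2.5,3\nWorld,3.6,2\n")

# dtype=None asks genfromtxt to determine each column's type from its contents.
records = np.genfromtxt(
    text,
    delimiter=",",
    dtype=None,
    names=["Col1", "Col2", "Col3"],
    encoding="utf-8",
)

print(records.dtype)
print(records["Col2"])
```

The result is a 1-D structured array with one record per line of text, so no separate conversion step is needed in that case.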
Thanks!
4 Solutions
#1
28
You can "create a record array from a (flat) list of arrays" using numpy.core.records.fromarrays as follows:
>>> import numpy as np
>>> myarray = np.array([("Hello",2.5,3),("World",3.6,2)])
>>> print myarray
[['Hello' '2.5' '3']
['World' '3.6' '2']]
>>> newrecarray = np.core.records.fromarrays(myarray.transpose(),
...                                          names='col1, col2, col3',
...                                          formats='S8, f8, i8')
>>> print newrecarray
[('Hello', 2.5, 3) ('World', 3.5999999046325684, 2)]
I was trying to do something similar. I found that when numpy creates a structured array from an existing 2D array (using np.core.records.fromarrays), it treats each row of the 2D array as one field (a column of the result) rather than as one record. So you have to transpose the array first. This behavior of numpy does not seem very intuitive, but perhaps there is a good reason for it.
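Put differently, `fromarrays` expects one array per field, and iterating over a 2D array yields its rows, which is why the transpose is needed. A minimal sketch using the shorter `np.rec` alias; the `U8` format (instead of `S8`) is an assumption made here to keep the text column as `str` on Python 3:

```python
import numpy as np

myarray = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])  # 2D array of strings

# After the transpose, each item handed to fromarrays is one column,
# and each column becomes one field of the record array.
rec = np.rec.fromarrays(
    myarray.T,
    names="col1, col2, col3",
    formats="U8, f8, i8",
)

print(rec[0])    # first row of myarray as a single record
print(rec.col2)  # second column as a float field
```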
#2
8
I guess
new_array = np.core.records.fromrecords([("Hello",2.5,3),("World",3.6,2)],
                                        names='Col1,Col2,Col3',
                                        formats='S8,f8,i8')
is what you want.
#3
2
If the data starts as a list of tuples, then creating a structured array is straightforward:
In [228]: alist = [("Hello",2.5,3),("World",3.6,2)]
In [229]: dt = [("Col1","S8"),("Col2","f8"),("Col3","i8")]
In [230]: np.array(alist, dtype=dt)
Out[230]:
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)],
dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
The complication here is that the list of tuples has been turned into a 2d string array:
In [231]: arr = np.array(alist)
In [232]: arr
Out[232]:
array([['Hello', '2.5', '3'],
['World', '3.6', '2']],
dtype='<U5')
We could use the well-known zip* approach to 'transposing' this array; actually, we want a double transpose:
In [234]: list(zip(*arr.T))
Out[234]: [('Hello', '2.5', '3'), ('World', '3.6', '2')]
zip has conveniently given us a list of tuples. Now we can recreate the array with the desired dtype:
In [235]: np.array(_, dtype=dt)
Out[235]:
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)],
dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
The accepted answer uses fromarrays:
In [236]: np.rec.fromarrays(arr.T, dtype=dt)
Out[236]:
rec.array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)],
dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
Internally, fromarrays takes a common recfunctions approach: create the target array, then copy values by field name. Effectively it does:
In [237]: newarr = np.empty(arr.shape[0], dtype=dt)
In [238]: for n, v in zip(newarr.dtype.names, arr.T):
     ...:     newarr[n] = v
     ...:
In [239]: newarr
Out[239]:
array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)],
dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
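When the field types are not known in advance, as in the question, the same column-wise idea could be extended by guessing a type for each column first. This is only a sketch: `infer_column` and `to_records` are hypothetical helpers, not NumPy functions, and the int-then-float-then-string order is an assumption about what the data may contain.

```python
import numpy as np

def infer_column(col):
    """Return col converted to int64 or float64 if every entry parses, else unchanged."""
    for target in (np.int64, np.float64):
        try:
            return col.astype(target)
        except ValueError:
            continue  # at least one entry is not a valid number of this type
    return col

def to_records(arr2d, names):
    """Build a record array with one record per row of a 2D string array."""
    columns = [infer_column(col) for col in arr2d.T]
    # With no formats given, fromarrays takes each field's dtype from its array.
    return np.rec.fromarrays(columns, names=names)

arr = np.array([["Hello", "2.5", "3"], ["World", "3.6", "2"]])
rec = to_records(arr, names="Col1,Col2,Col3")
print(rec.dtype)
print(rec)
```

Columns that mix numbers and text simply stay strings under this scheme.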
#4
1
Okay, I have been struggling with this for a while now but I have found a way to do this that doesn't take too much effort. I apologise if this code is "dirty"....
Let's start with a 2D array:
mydata = numpy.array([['text1', 1, 'longertext1', 0.1111],
['text2', 2, 'longertext2', 0.2222],
['text3', 3, 'longertext3', 0.3333],
['text4', 4, 'longertext4', 0.4444],
['text5', 5, 'longertext5', 0.5555]])
So we end up with a 2D array with 4 columns and 5 rows:
mydata.shape
Out[30]: (5L, 4L)
To use numpy.core.records.array, we need to supply the input argument as a list of arrays, so:
tuple(mydata)
Out[31]:
(array(['text1', '1', 'longertext1', '0.1111'],
dtype='|S11'),
array(['text2', '2', 'longertext2', '0.2222'],
dtype='|S11'),
array(['text3', '3', 'longertext3', '0.3333'],
dtype='|S11'),
array(['text4', '4', 'longertext4', '0.4444'],
dtype='|S11'),
array(['text5', '5', 'longertext5', '0.5555'],
dtype='|S11'))
This produces a separate array per row of data, but we need one input array per column, so what we need is:
tuple(mydata.transpose())
Out[32]:
(array(['text1', 'text2', 'text3', 'text4', 'text5'],
dtype='|S11'),
array(['1', '2', '3', '4', '5'],
dtype='|S11'),
array(['longertext1', 'longertext2', 'longertext3', 'longertext4',
'longertext5'],
dtype='|S11'),
array(['0.1111', '0.2222', '0.3333', '0.4444', '0.5555'],
dtype='|S11'))
Finally it needs to be a list of arrays, not a tuple, so we wrap the above in list() as below:
list(tuple(mydata.transpose()))
That is our data input argument sorted. Next is the dtype:
mydtype = numpy.dtype([('My short text Column', 'S5'),
('My integer Column', numpy.int16),
('My long text Column', 'S11'),
('My float Column', numpy.float32)])
mydtype
Out[37]: dtype([('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])
Okay, so now we can pass that to the numpy.core.records.array():
myRecord = numpy.core.records.array(list(tuple(mydata.transpose())), dtype=mydtype)
... and fingers crossed:
myRecord
Out[36]:
rec.array([('text1', 1, 'longertext1', 0.11110000312328339),
('text2', 2, 'longertext2', 0.22220000624656677),
('text3', 3, 'longertext3', 0.33329999446868896),
('text4', 4, 'longertext4', 0.44440001249313354),
('text5', 5, 'longertext5', 0.5554999709129333)],
dtype=[('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])
Voila! You can index by column name as in:
myRecord['My float Column']
Out[39]: array([ 0.1111 , 0.22220001, 0.33329999, 0.44440001, 0.55549997], dtype=float32)
I hope this helps as I wasted so much time with numpy.asarray and mydata.astype etc trying to get this to work before finally working out this method.