How to convert a NumPy 2D array with object dtype to a regular 2D array of floats

Date: 2021-01-18 21:41:21

As part of a broader program I am working on, I ended up with object arrays in which strings, 3D coordinates, etc. are all mixed. I know object arrays are not very popular compared to structured arrays, but I am hoping to get around this without changing a lot of code.

Let's assume every row of my array obj_array (with N rows) has the format:

Single entry/object of obj_array:  ['NAME',[10.0,20.0,30.0],....] 

Now I am trying to load this object array and slice out the 3D coordinate chunk. Up to here everything works fine by simply asking for, say:

obj_array[:,[1,2,3]]

However, the result is also an object array, which is a problem because I want to form a 2D array of floats with:

shape [N,3]: N rows, each with 3 entries for the X, Y, Z coordinates

For now, I am looping over the rows and assigning each one to a row of a destination 2D float array to get around the problem. I am wondering if there is a better way using NumPy's array conversion tools? I tried a few things and could not get around it.

Centers   = np.zeros([N,3])

for row in range(obj_array.shape[0]):
    Centers[row,:] = obj_array[row,1]

Thanks

6 answers

#1


8  

Nasty little problem... I have been fooling around with this toy example:

>>> arr = np.array([['one', [1, 2, 3]],['two', [4, 5, 6]]], dtype=object)
>>> arr
array([['one', [1, 2, 3]],
       ['two', [4, 5, 6]]], dtype=object)

My first guess was:

>>> np.array(arr[:, 1])
array([[1, 2, 3], [4, 5, 6]], dtype=object)

But that keeps the object dtype, so perhaps then:

>>> np.array(arr[:, 1], dtype=float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.

You can normally work around this by doing the following:

>>> np.array(arr[:, 1], dtype=[('', float)]*3).view(float).reshape(-1, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected a readable buffer object

Not here though, which was kind of puzzling. Apparently it is the fact that the objects in your array are lists that throws this off, as replacing the lists with tuples works:

>>> np.array([tuple(j) for j in arr[:, 1]],
...          dtype=[('', float)]*3).view(float).reshape(-1, 3)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

Since there doesn't seem to be any entirely satisfactory solution, the easiest is probably to go with:

>>> np.array(list(arr[:, 1]), dtype=float)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

That will not be very efficient, though, so it is probably better to go with something like:

>>> np.fromiter((tuple(j) for j in arr[:, 1]), dtype=[('', float)]*3,
...             count=len(arr)).view(float).reshape(-1, 3)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
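
For reference, the two conversions above can be collected into one self-contained script. This is only a minimal sketch: it assumes a recent NumPy, where the built-in `object` and `float` are used in place of the `np.object` and `np.float` aliases, and the names `via_list` and `via_fromiter` are introduced here purely for illustration.

```python
import numpy as np

# Build the toy object array element by element so each list is stored as-is.
arr = np.empty((2, 2), dtype=object)
arr[0, 0], arr[0, 1] = 'one', [1, 2, 3]
arr[1, 0], arr[1, 1] = 'two', [4, 5, 6]

# Simple route: materialize the column as a list of lists, then convert.
via_list = np.array(list(arr[:, 1]), dtype=float)

# Structured-dtype route: three float fields per row, viewed back as plain floats.
via_fromiter = np.fromiter(
    (tuple(j) for j in arr[:, 1]),
    dtype=[('', float)] * 3,
    count=len(arr),
).view(float).reshape(-1, 3)

print(via_list.shape, via_list.dtype)
```

Both routes should give the same (N, 3) float array; which one is faster likely depends on N and on the NumPy version, so it is worth timing on real data.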

#2


3  

Based on Jaime's toy example I think you can do this very simply using np.vstack():

import numpy as np

arr = np.array([['one', [1, 2, 3]],['two', [4, 5, 6]]], dtype=object)
float_arr = np.vstack(arr[:, 1]).astype(float)

This will work regardless of whether the 'numeric' elements in your object array are 1D numpy arrays, lists or tuples.
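
To illustrate that claim, here is a small sketch in which the object column holds a list, a tuple and a 1D array at the same time. The variable names `rows` and `col` are made up for this example, and the column is filled element by element so that NumPy keeps each container as a single object.

```python
import numpy as np

# Three rows stored in three different container types.
rows = [[1, 2, 3], (4, 5, 6), np.array([7, 8, 9])]

# 1D object array whose elements are the containers themselves.
col = np.empty(len(rows), dtype=object)
for i, r in enumerate(rows):
    col[i] = r

# vstack treats each element as one row, whatever sequence type it is.
float_arr = np.vstack(col).astype(float)
print(float_arr.shape, float_arr.dtype)
```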

#3


1  

You may want to use a structured array, so that you can easily access the names and the values independently when you need to. In this example, there are two data points:

import numpy as np

# one record per data point: a 10-byte name and a 3-element float32 vector
x = np.zeros(2, dtype=[('name','S10'), ('value','f4',(3,))])
x[0][0]='item1'
x[1][0]='item2'
y1=x['name']
y2=x['value']

The result:

>>> y1
array(['item1', 'item2'], 
      dtype='|S10')
>>> y2
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]], dtype=float32)

See more details: http://docs.scipy.org/doc/numpy/user/basics.rec.html
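
As a sketch of how this could look with the coordinates actually filled in, the records can be passed as (name, values) tuples when the array is created. The field types `'U10'` and `'f8'` and the sample values are assumptions chosen for this illustration, not taken from the question.

```python
import numpy as np

# Each record is a (name, [x, y, z]) tuple; the values here are made up.
records = [('item1', [10.0, 20.0, 30.0]),
           ('item2', [40.0, 50.0, 60.0])]

# 'value' is a sub-array field holding three float64 numbers per record.
x = np.array(records, dtype=[('name', 'U10'), ('value', 'f8', (3,))])

names = x['name']     # 1D array of strings
coords = x['value']   # plain (N, 3) float array, no object dtype involved
print(coords.shape, coords.dtype)
```

With this layout, the coordinate block is already a regular float array, so no conversion step is needed afterwards.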

#4


1  

This works great on your array arr for converting from an object array to an array of floats. Number processing is extremely easy afterwards. Thanks for that last post! I just modified it to handle any DataFrame size:

float_arr = np.vstack(arr[:, :]).astype(float)

#5


0  

It is way faster to just convert your object array to a NumPy float array: arr = np.array(arr, dtype=[('O', float)]).astype(float). From there, no looping is needed; index it just as you normally would a NumPy array. You'd have to do it in chunks, though, for your different data types: arr[:, 1], arr[:, 2], etc. I had the same issue with a NumPy tuple object returned from a C++ DLL function; conversion for 17M elements takes <2s.

#6


0  

This problem usually happens when you have a dataset with different types, typically dates in the first column or so.

What I usually do is store the date column in a separate variable and put the rest of the "X matrix of features" into X. So I have dates and X, for instance.

Then I apply the conversion to the X matrix as:

X = np.array(list(X[:,:]), dtype=float)
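
Put together, the split-then-convert idea might look like the sketch below. The array `data` and its contents are hypothetical sample values; only the pattern of separating the date column before converting comes from the description above.

```python
import numpy as np

# Hypothetical mixed-type dataset: a date string followed by numeric features.
data = np.array([['2021-01-01', 1.5, 2],
                 ['2021-01-02', 3.0, 4]], dtype=object)

dates = data[:, 0]                             # keep the dates as objects
X = np.array(list(data[:, 1:]), dtype=float)   # numeric columns as floats
print(X.shape, X.dtype)
```

When every remaining cell is a plain number, `data[:, 1:].astype(float)` should give the same result in one call.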

Hope this helps!
