Remove dtype at the end of a numpy array

Time: 2022-09-30 21:21:48

I'm writing a method to create an array from a data file. The method looks like this:

import numpy

def readDataFile(fileName):
    try:
        with open(fileName, 'r') as inputs:
            data = None
            for line in inputs:
                line = line.strip()
                items = line.split('\t')
                if data is None:
                    data = numpy.array(items[0:len(items)])
                else:
                    data = numpy.vstack((data, items[0:len(items)]))
            # Return only after every line has been read.
            return numpy.array(data)
    except IOError as ioerr:
        print('IOError: %s' % ioerr)
        return None

My data file contains lines of numbers, with the numbers on each line separated by tabs, e.g.:

1 2 3
4 5 6
7 8 9

And I expect to receive an array as follows:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

However, the result contains dtype at the end of it:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype='|S9')

Because of it, I cannot perform some operations on the result, e.g. if I try to find the max value for each line using result.max(0), I'll receive an error:

TypeError: cannot perform reduce with flexible type.

So, can anyone tell me what's wrong with my code and how to fix it? Thanks a lot.

4 answers

#1


8  

The easiest fix is to use numpy's loadtxt:

data = numpy.loadtxt(fileName, dtype='float')
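To make the loadtxt suggestion concrete, here is a minimal sketch. It assumes Python 3 and uses an in-memory `io.StringIO` buffer as a stand-in for the real data file (the buffer and the variable names are illustrative, not from the question). Passing `delimiter='\t'` is optional for this data, since loadtxt splits on any whitespace by default, but it makes the tab format explicit.

```python
import io
import numpy as np

# Stand-in for the tab-separated data file described in the question.
sample = io.StringIO("1\t2\t3\n4\t5\t6\n7\t8\t9\n")

# loadtxt parses every field as a number, so the result has a numeric dtype.
data = np.loadtxt(sample, delimiter='\t', dtype=float)

column_max = data.max(0)  # maximum along axis 0, i.e. one value per column
row_max = data.max(1)     # maximum along axis 1, i.e. one value per row
```

With a numeric dtype, reductions such as `max` work; note that `max(0)` reduces over rows and so gives per-column maxima, while `max(1)` gives the maximum of each line.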

Just FYI, using numpy.vstack inside a loop is a bad idea. If you decide not to use loadtxt, you can replace your loop with the following, which fixes the dtype issue and eliminates numpy.vstack:

# inputs is the open file object from the question's code
data = [row.strip().split('\t') for row in inputs]
data = numpy.array(data, dtype='float')

Update

Every time vstack is called, it creates a new array and copies the contents of the old arrays into it. Each copy is roughly O(n), where n is the size of the array, so a loop that runs n times costs O(n**2) overall, which gets slow for large inputs. If you know the final size of the array ahead of time, it's better to create the array outside the loop and fill it in place. If you don't know the final size, collect the rows in a list inside the loop and call vstack once at the end. For example:

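import numpy as np

# Final size known: allocate once, then fill row by row.
myArray = np.zeros((10,3))
for i in range(len(myArray)):
    myArray[i] = [i, i+1, i+2]

# or, final size unknown: collect rows in a list and stack once.
myArray = []
for i in range(10):
    myArray.append(np.array([i, i+1, i+2]))
myArray = np.vstack(myArray)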
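Applied to the question's reader, the list-then-convert pattern might look like the sketch below. The function name `rows_to_array` is hypothetical, and it assumes Python 3 and takes any iterable of lines (an open file works) so it can be tried without a file on disk.

```python
import numpy as np

def rows_to_array(lines):
    """Build a 2-D float array from an iterable of tab-separated lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        # Convert each field to a number before it reaches NumPy.
        rows.append([float(item) for item in line.split('\t')])
    # One conversion at the end instead of one vstack per line.
    return np.array(rows)

result = rows_to_array(["1\t2\t3\n", "4\t5\t6\n", "7\t8\t9\n"])
```

Appending to a Python list is cheap, so the only array allocation happens in the final `np.array` call.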

#2


7  

Here is how you change data types in numpy:

>>> x
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> x.astype('|S9')
array([['1', '2', '3'],
       ['4', '5', '6'],
       ['7', '8', '9']], 
      dtype='|S9')
>>> x.astype('float64')
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
>>> x.astype('int')
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
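The same method also goes the other way, which is what the question needs: an array of numeric strings can be converted to numbers with astype. This is a small sketch assuming Python 3, where the string array gets a Unicode dtype rather than '|S9'; the variable names are illustrative.

```python
import numpy as np

# A string array like the one the question's function returns.
strings = np.array([['1', '2', '3'],
                    ['4', '5', '6'],
                    ['7', '8', '9']])

# astype returns a new array; the original string array is left unchanged.
numbers = strings.astype(float)
column_max = numbers.max(0)
```

Because astype makes a copy, the result has to be assigned to a name (or back to the original name) to be used.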

#3


3  

... Did you try turning them into numbers first?

items = [int(x) for x in line.split('\t')]
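As a quick check of what that line produces, here is a sketch with one sample line (the sample string is an assumption standing in for a line of the file). Note that `int` raises `ValueError` on text such as '1.5', so `float` may be the safer choice if the file can contain decimals.

```python
import numpy as np

line = "1\t2\t3\n"  # one line as read from the file

# int() ignores surrounding whitespace, so the trailing newline is harmless.
items = [int(x) for x in line.split('\t')]

row = np.array(items)  # integer dtype, because the items are ints, not strings
```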

#4


2  

NumPy arrays include a method that does this job:

import numpy as np
a = np.array(['A', 'B'])
a
# Returns: array(['A', 'B'],  dtype='|S1')

a.tolist()
# Returns ['A', 'B']

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tolist.html#numpy.ndarray.tolist
