How to add a column to a numpy array

Time: 2021-01-26 13:42:26

I am trying to add one column to the array created from recfromcsv. In this case it's an array: [210,8] (rows, cols).

I want to add a ninth column. Empty or with zeroes doesn't matter.

from numpy import genfromtxt
from numpy import recfromcsv
import numpy as np
import time

if __name__ == '__main__':
 print("testing")
 my_data = recfromcsv('LIAB.ST.csv', delimiter='\t')
 array_size = my_data.size
 #my_data = np.append(my_data[:array_size],my_data[9:],0)

 new_col = np.sum(x,1).reshape((x.shape[0],1))
 np.append(x,new_col,1)

3 Answers

#1


52  

I think your problem is that you expect np.append to add the column in place. Because of how numpy data is stored, it instead creates a copy of the joined arrays:

Returns
-------
append : ndarray
    A copy of `arr` with `values` appended to `axis`.  Note that `append`
    does not occur in-place: a new array is allocated and filled.  If
    `axis` is None, `out` is a flattened array.

So you need to save the output, all_data = np.append(...):

my_data = np.random.random((210,8)) #recfromcsv('LIAB.ST.csv', delimiter='\t')
new_col = my_data.sum(1)[...,None] # None keeps (n, 1) shape
new_col.shape
#(210,1)
all_data = np.append(my_data, new_col, 1)
all_data.shape
#(210,9)
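
A small check of the point above: np.append leaves its input untouched and hands back a new array. This is only a sketch; the array of ones is a made-up stand-in for the CSV data, and the helper names (shape_after_discard, shares_memory) are illustrative.

```python
import numpy as np

# Stand-in for the real CSV data (assumed shape: 210 rows, 8 columns)
my_data = np.ones((210, 8))
new_col = my_data.sum(axis=1, keepdims=True)  # shape (210, 1)

np.append(my_data, new_col, axis=1)   # result discarded: my_data is unchanged
shape_after_discard = my_data.shape   # still (210, 8)

all_data = np.append(my_data, new_col, axis=1)       # keep the returned copy
shares_memory = np.shares_memory(all_data, my_data)  # False: a new array
```

Rebinding the name (my_data = np.append(...)) works just as well if the original array is no longer needed.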

Alternative ways:

all_data = np.hstack((my_data, new_col))
#or
all_data = np.concatenate((my_data, new_col), 1)

I believe that the only difference between these three functions (as well as np.vstack) is their default behavior when axis is unspecified:

  • concatenate assumes axis = 0
  • hstack assumes axis = 1 unless inputs are 1d, then axis = 0
  • vstack assumes axis = 0 after adding an axis if inputs are 1d
  • append flattens the arrays
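
The defaults listed above can be checked with a few shape comparisons. This is a minimal sketch; the small arrays a, b and v are made up for illustration.

```python
import numpy as np

a = np.zeros((3, 2))
b = np.ones((3, 2))
v = np.arange(3)  # a 1d input

cat_shape = np.concatenate((a, b)).shape  # (6, 2): joins along axis 0
h_shape = np.hstack((a, b)).shape         # (3, 4): joins along axis 1
h1d_shape = np.hstack((v, v)).shape       # (6,): 1d inputs join along axis 0
v_shape = np.vstack((a, b)).shape         # (6, 2): joins along axis 0
v1d_shape = np.vstack((v, v)).shape       # (2, 3): each 1d input becomes a row
app_shape = np.append(a, b).shape         # (12,): both inputs are flattened
```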

Based on your comment, and looking more closely at your example code, I now believe that what you are probably looking to do is add a field to a record array. You imported both genfromtxt, which returns a structured array, and recfromcsv, which returns the subtly different record array (recarray). You used recfromcsv, so right now my_data is actually a recarray, which most likely means my_data.shape = (210,), since recarrays are 1d arrays of records, where each record is a tuple with the given dtype.

So you could try this:

import numpy as np
from numpy.lib.recfunctions import append_fields
x = np.random.random(10)
y = np.random.random(10)
z = np.random.random(10)
data = np.array( list(zip(x,y,z)), dtype=[('x',float),('y',float),('z',float)])
data = np.recarray(data.shape, data.dtype, buf=data)
data.shape
#(10,)
tot = data['x'] + data['y'] + data['z'] # sum(axis=1) won't work on recarray
tot.shape
#(10,)
all_data = append_fields(data, 'total', tot, usemask=False)
all_data
#array([(0.4374783740738456 , 0.04307289878861764, 0.021176067323686598, 0.5017273401861498),
#       (0.07622262416466963, 0.3962146058689695 , 0.27912715826653534 , 0.7515643883001745),
#       (0.30878532523061153, 0.8553768789387086 , 0.9577415585116588  , 2.121903762680979 ),
#       (0.5288343561208022 , 0.17048864443625933, 0.07915689716226904 , 0.7784798977193306),
#       (0.8804269791375121 , 0.45517504750917714, 0.1601389248542675  , 1.4957409515009568),
#       (0.9556552723429782 , 0.8884504475901043 , 0.6412854758843308  , 2.4853911958174133),
#       (0.0227638618687922 , 0.9295332854783015 , 0.3234597575660103  , 1.275756904913104 ),
#       (0.684075052174589  , 0.6654774682866273 , 0.5246593820025259  , 1.8742119024637423),
#       (0.9841793718333871 , 0.5813955915551511 , 0.39577520705133684 , 1.961350170439875 ),
#       (0.9889343795296571 , 0.22830104497714432, 0.20011292764078448 , 1.4173483521475858)], 
#      dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('total', '<f8')])
all_data.shape
#(10,)
all_data.dtype.names
#('x', 'y', 'z', 'total')
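
If the new field only needs to be empty or zero, as in the question, one possible alternative is to build a wider dtype by hand and copy the existing fields over. This is a sketch under assumptions: the two-field array stands in for the recfromcsv result, 'extra' is a hypothetical field name, and the dtype is assumed to be a simple one without padding.

```python
import numpy as np

# Hypothetical small structured array standing in for the recfromcsv result
data = np.array([(1.0, 2.0), (3.0, 4.0)], dtype=[('x', float), ('y', float)])

# Build a dtype with one extra field; 'extra' is an illustrative name
new_dtype = np.dtype(data.dtype.descr + [('extra', float)])

# np.zeros initialises every field, so 'extra' starts out as 0.0
wider = np.zeros(data.shape, dtype=new_dtype)
for name in data.dtype.names:
    wider[name] = data[name]  # copy the existing fields across
```

The result is still a 1d array of records, just with one more named field per record.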

#2


8  

If you have an array a of, say, 210 rows by 8 columns:

import numpy
a = numpy.empty([210,8])

and want to add a ninth column of zeros you can do this:

b = numpy.append(a,numpy.zeros([len(a),1]),1)
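
Another way to get the same 210 by 9 result is to allocate the wider array of zeros first and copy the original into it. This is only a sketch; the arange data is a made-up stand-in for a.

```python
import numpy as np

a = np.arange(210 * 8, dtype=float).reshape(210, 8)  # stand-in data

b = np.zeros((a.shape[0], a.shape[1] + 1))  # 210 x 9, all zeros
b[:, :-1] = a                               # copy a into the first 8 columns
```

The last column of b is left as zeros and can be filled in later.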

#3


1  

I add a new column of ones to a matrix this way:

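from numpy import append
# put a row of ones in front of Z.T, then transpose back: the ones become the first column
Z = append([[1 for _ in range(0,len(Z))]], Z.T,0).T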

Maybe it is not that efficient?
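
For comparison, here is a sketch that produces the same array with np.hstack and np.ones, avoiding the Python-level list and the two transposes. The 3 by 2 matrix Z is a made-up stand-in, and no timing was measured here.

```python
import numpy as np

Z = np.arange(6.0).reshape(3, 2)  # stand-in matrix

# Original form: a row of ones is put in front of Z.T, then transposed back
via_append = np.append([[1 for _ in range(0, len(Z))]], Z.T, 0).T

# Same result built directly as a column of ones followed by Z
via_hstack = np.hstack((np.ones((len(Z), 1)), Z))
```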
