Creating a pandas DataFrame from a Numpy array: how do I specify the index column and column headers?

Date: 2023-01-28 12:32:19

I have a Numpy array built from a list of lists. It represents a two-dimensional array that also carries row labels and column names, as shown below:

data = array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]])

I'd like the resulting DataFrame to have Row1 and Row2 as index values, and Col1 and Col2 as header values.

I can specify the index as follows:

df = pd.DataFrame(data,index=data[:,0])

However, I am unsure how best to assign the column headers.

2 Solutions

#1


144  

You need to pass data, index and columns to the DataFrame constructor, as in:

>>> pd.DataFrame(data=data[1:,1:],    # values
...              index=data[1:,0],    # 1st column as index
...              columns=data[0,1:])  # 1st row as the column names

Edit: as @joris points out in the comments, you may need to change the above to np.int_(data[1:,1:]) to get the correct data type, because the mixed array stores its numbers as strings.
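Putting the slicing and the type conversion together, a minimal runnable sketch could look like the following. It assumes the array from the question and uses `.astype(int)` for the conversion, which is a substitute for the `np.int_(...)` call mentioned above rather than the answer's exact wording.

```python
import numpy as np
import pandas as pd

# Same layout as in the question: the first row holds the column names and
# the first column holds the row labels. Because strings and numbers are
# mixed, NumPy stores every element as a string.
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])

df = pd.DataFrame(data=data[1:, 1:].astype(int),  # values, converted from str to int
                  index=data[1:, 0],              # first column as the index
                  columns=data[0, 1:])            # first row as the column names

print(df)
```

Without the conversion, the columns would hold strings, so arithmetic such as `df.sum()` would concatenate text instead of adding numbers.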

#2


13  

I agree with Joris; it seems like you should be doing this differently, for example with numpy record arrays. Adapting "option 2" from this great answer, you could do it like this:

import pandas
import numpy

dtype = [('Col1','int32'), ('Col2','float32'), ('Col3','float32')]
values = numpy.zeros(20, dtype=dtype)
index = ['Row'+str(i) for i in range(1, len(values)+1)]

df = pandas.DataFrame(values, index=index)
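The snippet above fills the frame with zeros. As a hedged illustration of the same idea applied to the question's data, the sketch below builds a structured array from actual values; the field dtypes and the `records` name are assumptions chosen for the example, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Each tuple is one row; the dtype list names the fields and gives each
# column its own type, so no string-to-number conversion is needed later.
records = np.array([(1, 2.0), (3, 4.0)],
                   dtype=[('Col1', 'int32'), ('Col2', 'float32')])

# The field names become the column headers; the index is passed separately.
df = pd.DataFrame(records, index=['Row1', 'Row2'])

print(df.dtypes)
```

The benefit of this layout is that each column keeps its own dtype, which a plain two-dimensional array of mixed values cannot do.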
