I am interested in knowing how to convert a pandas dataframe into a numpy array, including the index, and set the dtypes.
dataframe:
import numpy as np
import pandas as pd
index = [1, 2, 3, 4, 5, 6, 7]
a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan]
c = [np.nan, 0.5, 0.5, np.nan, 0.5, 0.5, np.nan]
df = pd.DataFrame({'A': a, 'B': b, 'C': c}, index=index)
df = df.rename_axis('ID')
gives
label A B C
ID
1 NaN 0.2 NaN
2 NaN NaN 0.5
3 NaN 0.2 0.5
4 0.1 0.2 NaN
5 0.1 0.2 0.5
6 0.1 NaN 0.5
7 0.1 NaN NaN
Converting df to an array returns:
array([[ nan, 0.2, nan],
[ nan, nan, 0.5],
[ nan, 0.2, 0.5],
[ 0.1, 0.2, nan],
[ 0.1, 0.2, 0.5],
[ 0.1, nan, 0.5],
[ 0.1, nan, nan]])
However, I would like:
array([[ 1, nan, 0.2, nan],
[ 2, nan, nan, 0.5],
[ 3, nan, 0.2, 0.5],
[ 4, 0.1, 0.2, nan],
[ 5, 0.1, 0.2, 0.5],
[ 6, 0.1, nan, 0.5],
[ 7, 0.1, nan, nan]],
dtype=[('ID', '<i4'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
(or similar)
Any suggestions on how to accomplish this? (I don't know if I need 1D or 2D array at this point.) I've seen a few posts that touch on this, but nothing dealing specifically with the dataframe.index.
I am writing the dataframe to disk using to_csv (and reading it back in to create the array) as a workaround, but would prefer something more elegant than my new-to-pandas kludging.
10 Answers
#1
147
To convert a pandas dataframe (df) to a numpy ndarray, use this code:
df = df.values
df now becomes the numpy ndarray:
array([[nan, 0.2, nan],
[nan, nan, 0.5],
[nan, 0.2, 0.5],
[0.1, 0.2, nan],
[0.1, 0.2, 0.5],
[0.1, nan, 0.5],
[0.1, nan, nan]])
#2
89
Pandas has something built in...
numpy_matrix = df.as_matrix()
gives
array([[nan, 0.2, nan],
[nan, nan, 0.5],
[nan, 0.2, 0.5],
[0.1, 0.2, nan],
[0.1, 0.2, 0.5],
[0.1, nan, 0.5],
[0.1, nan, nan]])
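Note that as_matrix was deprecated in pandas 0.23 and removed in pandas 1.0, so the call above raises AttributeError on current installs. A minimal sketch of a version-tolerant call, assuming to_numpy (added in pandas 0.24) is the intended replacement; the two-row frame is only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 0.1], "B": [0.2, np.nan]})

# Prefer to_numpy() where it exists; fall back to as_matrix() on old pandas.
if hasattr(df, "to_numpy"):
    numpy_matrix = df.to_numpy()
else:
    numpy_matrix = df.as_matrix()

print(numpy_matrix.shape, numpy_matrix.dtype)
```

Either branch returns a plain 2-D ndarray without the index.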
#3
41
I would just chain DataFrame.reset_index() and DataFrame.values to get the Numpy representation of the dataframe, including the index:
In [8]: df
Out[8]:
A B C
0 -0.982726 0.150726 0.691625
1 0.617297 -0.471879 0.505547
2 0.417123 -1.356803 -1.013499
3 -0.166363 -0.957758 1.178659
4 -0.164103 0.074516 -0.674325
5 -0.340169 -0.293698 1.231791
6 -1.062825 0.556273 1.508058
7 0.959610 0.247539 0.091333
[8 rows x 3 columns]
In [9]: df.reset_index().values
Out[9]:
array([[ 0. , -0.98272574, 0.150726 , 0.69162512],
[ 1. , 0.61729734, -0.47187926, 0.50554728],
[ 2. , 0.4171228 , -1.35680324, -1.01349922],
[ 3. , -0.16636303, -0.95775849, 1.17865945],
[ 4. , -0.16410334, 0.0745164 , -0.67432474],
[ 5. , -0.34016865, -0.29369841, 1.23179064],
[ 6. , -1.06282542, 0.55627285, 1.50805754],
[ 7. , 0.95961001, 0.24753911, 0.09133339]])
To get the dtypes we'd need to transform this ndarray into a structured array using view:
In [10]: df.reset_index().values.ravel().view(dtype=[('index', int), ('A', float), ('B', float), ('C', float)])
Out[10]:
array([( 0, -0.98272574, 0.150726 , 0.69162512),
( 1, 0.61729734, -0.47187926, 0.50554728),
( 2, 0.4171228 , -1.35680324, -1.01349922),
( 3, -0.16636303, -0.95775849, 1.17865945),
( 4, -0.16410334, 0.0745164 , -0.67432474),
( 5, -0.34016865, -0.29369841, 1.23179064),
( 6, -1.06282542, 0.55627285, 1.50805754),
( 7, 0.95961001, 0.24753911, 0.09133339)],
dtype=[('index', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
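One caveat with view: it reinterprets the existing bytes rather than converting values, so a float64 index column viewed as int64 will generally not come back as the original integers. A small sketch of the effect, followed by one alternative that converts field by field (the names flat, as_int and sarr and the two-row frame are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.5, 1.5], "B": [2.5, 3.5]}, index=[1, 2])

flat = df.reset_index().values              # float64 everywhere, index included
as_int = flat[:, 0].copy().view(np.int64)   # same bytes, read as integers
print(as_int)                               # bit patterns of 1.0 and 2.0, not [1, 2]

# Converting instead of reinterpreting: fill a structured array column by column.
dtype = [("index", "<i8"), ("A", "<f8"), ("B", "<f8")]
sarr = np.zeros(len(df), dtype=dtype)
for i, (name, _) in enumerate(dtype):
    sarr[name] = flat[:, i]                 # assignment casts 1.0 to 1
print(sarr)
```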
#4
26
You can use the to_records method, but have to play around a bit with the dtypes if they are not what you want from the get go. In my case, having copied your DF from a string, the index type is string (represented by an object dtype in pandas):
In [102]: df
Out[102]:
label A B C
ID
1 NaN 0.2 NaN
2 NaN NaN 0.5
3 NaN 0.2 0.5
4 0.1 0.2 NaN
5 0.1 0.2 0.5
6 0.1 NaN 0.5
7 0.1 NaN NaN
In [103]: df.index.dtype
Out[103]: dtype('object')
In [104]: df.to_records()
Out[104]:
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
(4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
(7, 0.1, nan, nan)],
dtype=[('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
In [106]: df.to_records().dtype
Out[106]: dtype([('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
Converting the recarray dtype does not work for me, but one can do this in Pandas already:
In [109]: df.index = df.index.astype('i8')
In [111]: df.to_records().view([('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
Out[111]:
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
(4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
(7, 0.1, nan, nan)],
dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
Note that Pandas does not set the name of the index properly (to ID) in the exported record array (a bug?), so we use the type conversion to correct that as well.
At the moment Pandas has only 8-byte integers, i8, and floats, f8 (see this issue).
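Later pandas releases (0.24 onward, to the best of my knowledge) accept index_dtypes and column_dtypes arguments in to_records, which may remove the need for the view step. A sketch assuming such a version, reusing the data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1],
        "B": [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan],
        "C": [np.nan, 0.5, 0.5, np.nan, 0.5, 0.5, np.nan],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6, 7], name="ID"),
)

# index_dtypes sets the dtype of the index field; the columns keep float64.
rec = df.to_records(index_dtypes="<i4")
print(rec.dtype)
```

With a named index, the index field takes that name (ID here) instead of the default 'index'.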
#5
9
Here is my approach to making a structured array from a pandas DataFrame.
Create the data frame
import pandas as pd
import numpy as np
import six
NaN = float('nan')
ID = [1, 2, 3, 4, 5, 6, 7]
A = [NaN, NaN, NaN, 0.1, 0.1, 0.1, 0.1]
B = [0.2, NaN, 0.2, 0.2, 0.2, NaN, NaN]
C = [NaN, 0.5, 0.5, NaN, 0.5, 0.5, NaN]
columns = {'A':A, 'B':B, 'C':C}
df = pd.DataFrame(columns, index=ID)
df.index.name = 'ID'
print(df)
A B C
ID
1 NaN 0.2 NaN
2 NaN NaN 0.5
3 NaN 0.2 0.5
4 0.1 0.2 NaN
5 0.1 0.2 0.5
6 0.1 NaN 0.5
7 0.1 NaN NaN
Define a function to make a numpy structured array (not a record array) from a pandas DataFrame.
def df_to_sarray(df):
    """
    Convert a pandas DataFrame object to a numpy structured array.
    This is functionally equivalent to but more efficient than
    np.array(df.to_array())
    :param df: the data frame to convert
    :return: a numpy structured array representation of df
    """
    v = df.values
    cols = df.columns
    if six.PY2:  # python 2 needs .encode() but 3 does not
        types = [(cols[i].encode(), df[k].dtype.type) for (i, k) in enumerate(cols)]
    else:
        types = [(cols[i], df[k].dtype.type) for (i, k) in enumerate(cols)]
    dtype = np.dtype(types)
    z = np.zeros(v.shape[0], dtype)
    for (i, k) in enumerate(z.dtype.names):
        z[k] = v[:, i]
    return z
Use reset_index to make a new data frame that includes the index as part of its data. Convert that data frame to a structured array.
sa = df_to_sarray(df.reset_index())
sa
array([(1L, nan, 0.2, nan), (2L, nan, nan, 0.5), (3L, nan, 0.2, 0.5),
(4L, 0.1, 0.2, nan), (5L, 0.1, 0.2, 0.5), (6L, 0.1, nan, 0.5),
(7L, 0.1, nan, nan)],
dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
EDIT: Updated df_to_sarray to avoid error calling .encode() with python 3. Thanks to Joseph Garvin and halcyon for their comment and solution.
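For readers on Python 3 only, the same idea can be written without six. This is a sketch rather than the function above: the name df_to_sarray_py3 is made up here, it copies each column separately instead of going through df.values, and it assumes every column has a numpy-backed dtype:

```python
import numpy as np
import pandas as pd

def df_to_sarray_py3(df):
    """Build a structured array from df, one field per column (Python 3 only)."""
    # Field names must be str in Python 3; each field keeps its column's dtype.
    dtype = np.dtype([(str(col), df[col].dtype) for col in df.columns])
    out = np.zeros(len(df), dtype=dtype)
    for col in df.columns:
        # Copy column by column so an integer column is not upcast to float.
        out[str(col)] = df[col].values
    return out

df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan], "C": [np.nan, 0.5]},
    index=pd.Index([1, 2], name="ID"),
)
sa = df_to_sarray_py3(df.reset_index())
print(sa.dtype.names)
```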
#6
6
It seems like df.to_records() will work for you. The exact feature you're looking for was requested, and to_records was pointed to as an alternative.
I tried this out locally using your example, and that call yields something very similar to the output you were looking for:
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
(4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
(7, 0.1, nan, nan)],
dtype=[(u'ID', '<i8'), (u'A', '<f8'), (u'B', '<f8'), (u'C', '<f8')])
Note that this is a recarray rather than an array. You could convert the result into a regular numpy array by calling its constructor as np.array(df.to_records()).
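The difference between the two can be sketched as follows: the recarray exposes fields as attributes, while the plain array is indexed by field name. The two-row frame is only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan]},
    index=pd.Index([1, 2], name="ID"),
)

rec = df.to_records()   # numpy.recarray: fields are also readable as attributes
arr = np.array(rec)     # plain ndarray that keeps the named fields

print(type(rec).__name__, rec.ID)
print(type(arr).__name__, arr["ID"])
```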
#7
4
Two ways to convert the data-frame to its Numpy-array representation.
- mah_np_array = df.as_matrix(columns=None)
- mah_np_array = df.values
Doc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html
#8
3
Further to meteore's answer, I found that the code
df.index = df.index.astype('i8')
doesn't work for me. So I am posting my code here for the convenience of others stuck on this issue.
city_cluster_df = pd.read_csv(text_filepath, encoding='utf-8')
# the field 'city_en' is a string, when converted to Numpy array, it will be an object
city_cluster_arr = city_cluster_df[['city_en','lat','lon','cluster','cluster_filtered']].to_records()
descr=city_cluster_arr.dtype.descr
# change the field 'city_en' to string type (the index for 'city_en' here is 1 because before the field is the row index of dataframe)
descr[1]=(descr[1][0], "S20")
newArr=city_cluster_arr.astype(np.dtype(descr))
#9
2
Thanks for Phil's answer, it's great.
In reply to:
doesn't work for me, error: TypeError: data type not understood – Joseph Garvin Feb 13 at 17:55
I use Python 3 and got the same error. After I deleted .encode(), the expression is as follows:
types = [(cols[i], df[k].dtype.type) for (i, k) in enumerate(cols)]
Then it works.
#10
0
Just had a similar problem when exporting from dataframe to arcgis table and stumbled on a solution from usgs (https://my.usgs.gov/confluence/display/cdi/pandas.DataFrame+to+ArcGIS+Table). In short your problem has a similar solution:
df
Out[109]:
A B C
ID
1 NaN 0.2 NaN
2 NaN NaN 0.5
3 NaN 0.2 0.5
4 0.1 0.2 NaN
5 0.1 0.2 0.5
6 0.1 NaN 0.5
7 0.1 NaN NaN
np_data = np.array(np.rec.fromrecords(df.values))
np_names = df.dtypes.index.tolist()
np_data.dtype.names = tuple([name.encode('UTF8') for name in np_names])
np_data
Out[113]:
array([( nan, 0.2, nan), ( nan, nan, 0.5), ( nan, 0.2, 0.5),
( 0.1, 0.2, nan), ( 0.1, 0.2, 0.5), ( 0.1, nan, 0.5),
( 0.1, nan, nan)],
dtype=(numpy.record, [('A', '<f8'), ('B', '<f8'), ('C', '<f8')]))
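On Python 3 the .encode('UTF8') step produces bytes, which numpy does not accept as field names. A sketch of a variant that passes the column names straight to np.rec.fromrecords through its names argument, assuming Python 3 and string column labels (the two-row frame is only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, 0.2], "C": [0.5, np.nan]},
    index=pd.Index([1, 2], name="ID"),
)

# names= sets the field names up front, so no renaming step is needed.
np_data = np.array(np.rec.fromrecords(df.values, names=list(df.columns)))
print(np_data.dtype.names)
```

As in the answer above, the index is not part of the result; call reset_index() first if it is needed.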