I want to save the DataFrame df to the HDF5 file MainDataFile.h5:
df.to_hdf("c:/Temp/MainDataFile.h5", "MainData", mode="w", format="table", data_columns=['_FirstDayOfPeriod', 'Category', 'ChannelId'])
and I get the following error:
*** Exception: cannot find the correct atom type -> [dtype->object,items->Index(['Libellé_Article', 'Libellé_segment'], dtype='object')]
If I modify the column 'Libellé_Article' in this way:
df['Libellé_Article'] = str(df['Libellé_Article'])
the error goes away, whereas I still get the error message when I do:
df['Libellé_Article'] = df['Libellé_Article'].astype(str)
The problem is that using str() blows up my RAM.
Any ideas?
1 solution
#1
str(df['Libellé_Article']) converts the contents of the entire column into a single string: the printed representation of the whole Series. You end up with one very big string, and that is the reason your RAM blows up.
For example
>>> import pandas as pd
>>> df = pd.DataFrame([1,2,3], columns=['A'])
>>> df['A']
0    1
1    2
2    3
Name: A, dtype: int64
>>> str(df['A'])
'0    1\n1    2\n2    3\nName: A, dtype: int64'
>>> df['A'].astype(str)
0    1
1    2
2    3
Name: A, dtype: object
So if you want to convert the entire column to strings, use only .astype(str).
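To make the difference concrete on a column like the one in the question, here is a minimal sketch. The sample values are made up for illustration; only the column name comes from the question. It shows that str() yields one Python string, that assigning it back puts that same string in every row, and that .astype(str) converts value by value.

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
df = pd.DataFrame({"Libellé_Article": ["pomme", 42, 3.5]})

# str() on a Series returns ONE Python string: the printed form of the column.
as_one_string = str(df["Libellé_Article"])
print(type(as_one_string).__name__)

# Assigning that scalar back broadcasts the same string to every row.
broadcast = df.copy()
broadcast["Libellé_Article"] = as_one_string
print(broadcast["Libellé_Article"].nunique())

# .astype(str) keeps one entry per row and converts each value separately.
per_row = df["Libellé_Article"].astype(str)
print(per_row.tolist())
```

This only illustrates the str() versus .astype(str) behaviour; it does not write an HDF5 file, so it does not reproduce the "cannot find the correct atom type" error itself.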