Add rows to a Pandas DataFrame with a duplicate index

Time: 2021-12-14 07:37:54

I have a DataFrame whose index consists of datetime objects. I am ultimately going to write this DataFrame to an HDF5 file using HDFStore.append. I need to add a lot of rows that have to end up in that HDF5 file. If I call HDFStore.append for every row, it takes far too long; if I collect everything in one DataFrame first, I run out of memory. So I need to work in chunks and write to HDF5 intermittently.

import pandas as pd
from datetime import datetime
df = pd.DataFrame([['Bob', 'Mary']], columns=['Boy', 'Girl'], index=[datetime.today()])

Now I would like to add another row to this DataFrame with the SAME index:

row = ['John', 'Sue']

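Using .loc (or .ix, which has since been removed from pandas) with the existing index label replaces the existing row instead of adding a new one: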

df.loc[df.index[0]] = row  # same label as the existing row, so that row is overwritten
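
A minimal sketch of the behaviour described here, using a fixed timestamp instead of datetime.today() so the result is reproducible: assigning through .loc to a label that already exists overwrites that row, while pd.concat keeps both rows under the same label.

```python
import pandas as pd

# Fixed timestamp (an assumption for reproducibility; the question uses datetime.today())
ts = pd.Timestamp("2015-05-07 09:02:30")
df = pd.DataFrame([["Bob", "Mary"]], columns=["Boy", "Girl"], index=[ts])

# .loc with a label that already exists overwrites that row in place
df.loc[ts] = ["John", "Sue"]
print(len(df))               # 1
print(df.loc[ts, "Boy"])     # John

# pd.concat keeps both rows, so the index now contains a duplicate label
new_row = pd.DataFrame([["Bob", "Mary"]], columns=df.columns, index=[ts])
df2 = pd.concat([df, new_row])
print(len(df2))              # 2
print(df2.index.is_unique)   # False
```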

Appending works, but for my purposes it is far too slow:

new_df = pd.DataFrame([row], columns=df.columns, index=[datetime.today()])
df = pd.concat([df, new_df])  # DataFrame.append was removed in pandas 2.0
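
Growing a DataFrame inside a loop copies all existing rows on every step. The usual replacement for repeated appends (and for DataFrame.append itself, which was removed in pandas 2.0) is to collect the pieces in a Python list and call pd.concat once. A minimal sketch with made-up rows and a fixed timestamp:

```python
import pandas as pd

ts = pd.Timestamp("2015-05-07 09:02:30")
rows = [["Bob", "Mary"], ["John", "Sue"], ["Tom", "Ann"]]

# Collect small frames in a list instead of growing one DataFrame row by row
pieces = []
for row in rows:
    pieces.append(pd.DataFrame([row], columns=["Boy", "Girl"], index=[ts]))

# A single concatenation at the end copies the data only once
df = pd.concat(pieces)

print(df.shape)              # (3, 2)
print(df.index.is_unique)    # False: all three rows share the same timestamp
```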

Is there a better way to do this?

1 solution

#1


Building a list of lists and turning it into a single DataFrame is faster than appending row by row. Since you are already creating data frames from small chunks, why not create each chunk in one go:

In [1303]: pd.DataFrame([[0,1], [1,2], [2,3]], index=[pd.Timestamp.today()] * 3)
Out[1303]: 
                            0  1
2015-05-07 09:02:30.327473  0  1
2015-05-07 09:02:30.327473  1  2
2015-05-07 09:02:30.327473  2  3
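
Combined with the chunked writing from the question, this could look like the sketch below: rows are buffered as plain lists and one DataFrame is built per chunk, then handed to a sink. The helper write_in_chunks, its parameters and the sink callback are hypothetical names for illustration; the sink stands in for the HDF5 write so the example runs without PyTables.

```python
import pandas as pd

def write_in_chunks(records, columns, sink, chunk_size=10_000):
    """Buffer (timestamp, values) records as plain lists and pass one
    DataFrame per chunk to `sink` (a hypothetical callback)."""
    index, data = [], []
    for ts, values in records:
        index.append(ts)
        data.append(values)
        if len(data) >= chunk_size:
            # Build the whole chunk in one go; duplicate timestamps are allowed
            sink(pd.DataFrame(data, columns=columns, index=pd.DatetimeIndex(index)))
            index, data = [], []
    if data:  # flush the final, partially filled chunk
        sink(pd.DataFrame(data, columns=columns, index=pd.DatetimeIndex(index)))


ts = pd.Timestamp("2015-05-07 09:02:30")
records = [(ts, ["Bob", "Mary"]), (ts, ["John", "Sue"]), (ts, ["Tom", "Ann"])]

chunks = []
write_in_chunks(records, ["Boy", "Girl"], chunks.append, chunk_size=2)
print([len(c) for c in chunks])   # [2, 1]
```

With PyTables installed, the sink could be something like lambda chunk: store.append("people", chunk) inside a with pd.HDFStore("data.h5") as store: block, where the file name and the "people" key are placeholders. Note that string columns in an HDF5 table get a fixed width from the first chunk written, so later chunks with longer strings may require passing min_itemsize to append.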
