How to delete multiple pandas (Python) dataframes from memory to save RAM?

Time: 2022-09-06 22:57:48

I have a lot of dataframes created as part of preprocessing. Since I have only 6 GB of RAM, I want to delete all the unnecessary dataframes from memory to avoid running out of memory when running GridSearchCV in scikit-learn.

1) Is there a function that lists only the dataframes currently loaded in memory?

I tried dir(), but it returns a lot of other objects besides dataframes.

2) I created a list of dataframes to delete:

del_df=[Gender_dummies,
 capsule_trans,
 col,
 concat_df_list,
 coup_CAPSULE_dummies]

and ran

for i in del_df:
    del (i)

But it's not deleting the dataframes. Deleting the dataframes individually, as below, does remove them from memory.

del Gender_dummies
del col

3 answers

#1


31  

The del statement does not delete an instance; it merely deletes a name.

When you do del i, you are deleting just the name i, but the instance is still bound to some other name, so it won't be garbage-collected.

If you want to release memory, your dataframes have to be garbage-collected, i.e. you must delete all references to them.
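The difference between unbinding a name and dropping every reference can be observed with a weak reference. The following is a minimal sketch, not part of the original answer: the `Frame` class is a hypothetical stand-in for a large `pd.DataFrame` (used only to keep the example dependency-free), and the function names are made up for illustration.

```python
import gc
import weakref


class Frame:
    """Hypothetical stand-in for a large pandas DataFrame."""


def loop_del_releases() -> bool:
    """Return True if `for i in del_df: del i` freed the object."""
    big = Frame()
    probe = weakref.ref(big)   # lets us check whether the object still exists
    del_df = [big]
    for i in del_df:
        del i                  # unbinds only the loop variable `i`
    # `big` and `del_df[0]` still reference the object, so it is alive
    return probe() is None


def full_del_releases() -> bool:
    """Return True once the name and the list entry are both gone."""
    big = Frame()
    probe = weakref.ref(big)
    del_df = [big]
    del big                    # drop the name ...
    del_df.clear()             # ... and the reference held by the list
    gc.collect()               # not required on CPython here; harmless
    return probe() is None
```

On CPython, `loop_del_releases()` reports that the object survived the loop, while `full_del_releases()` reports that it was freed once no reference remained.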

If you created your dataframes dynamically in a list, then removing that list will trigger garbage collection.

>>> lst = [pd.DataFrame(), pd.DataFrame(), pd.DataFrame()]
>>> del lst     # memory is released

If you also bound the dataframes to variables, you have to delete all of those names as well.

>>> a, b, c = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
>>> lst = [a, b, c]
>>> del a, b, c # dfs are still referenced by the list
>>> del lst     # memory is released now
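Both parts of the question (listing only the dataframes, and deleting several of them in one go) can be approached by working with variable names rather than the objects themselves. The sketch below is one possible approach, not something from the original answer; `names_of_type` and `drop_names` are hypothetical helpers, and the namespace is assumed to be a real dict such as the one returned by `globals()`.

```python
import gc


def names_of_type(namespace: dict, kind: type) -> list:
    """Return the non-underscore names in `namespace` bound to instances of `kind`."""
    return [name for name, obj in namespace.items()
            if isinstance(obj, kind) and not name.startswith("_")]


def drop_names(namespace: dict, names) -> None:
    """Unbind each name in `names` from `namespace`, ignoring missing ones."""
    for name in names:
        namespace.pop(name, None)
    gc.collect()  # also clean up any reference cycles left behind
```

At module or notebook level this would be called as `names_of_type(globals(), pd.DataFrame)` and `drop_names(globals(), ["Gender_dummies", "col"])`. It does not work with `locals()` inside a function, because changes to that dict do not unbind local variables. In IPython or Jupyter, the output cache (`Out`, `_`, `_1`, ...) may hold further references that keep a dataframe alive.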

#2


11  

In Python, automatic garbage collection deallocates objects that are no longer referenced (a pandas DataFrame is just another object as far as Python is concerned). The collector's behaviour can be tuned, but doing so takes significant study.

You can manually trigger garbage collection with:

import gc
gc.collect()

But frequent calls to the garbage collector are discouraged, as collection is a costly operation and may affect performance.
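For context, CPython frees most objects through reference counting as soon as the last reference disappears; `gc.collect()` matters mainly for objects caught in reference cycles. The sketch below illustrates that case under the assumption of CPython; the `Node` class and the function name are hypothetical and exist only for the demonstration.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another object."""

    def __init__(self):
        self.other = None


def cycle_needs_collector() -> tuple:
    """Return (alive before gc.collect(), alive after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()                      # keep automatic collection out of the demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a       # build a reference cycle
        probe = weakref.ref(a)
        del a, b                      # refcounts never reach zero by themselves
        alive_before = probe() is not None
        gc.collect()                  # the cycle detector reclaims both objects
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after
```

A dataframe that is not part of a cycle is released by `del` alone once its last reference is gone, so a single `gc.collect()` after a batch of deletions is usually enough.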


#3


2  

This deletes the dataframes and releases the RAM:

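import gc

del df_1, df_2          # unbind both names (same effect as del [[df_1, df_2]])
gc.collect()            # reclaim any reference cycles left behind
df_1 = pd.DataFrame()   # optionally rebind the names to empty dataframes
df_2 = pd.DataFrame()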
