I run simulations on a 64-bit Windows computer with 64 GB RAM. Memory use reaches 55%, and after a finished simulation run I remove all objects in the workspace with rm(list=ls()), followed by two calls to gc().
I supposed that this would free enough memory for the next simulation run, but actually memory usage drops by just 1%. Consulting a lot of different fora I could not find a satisfactory explanation, only vague comments such as:
"Depending on your operating system, the freed up memory might not be returned to the operating system, but kept in the process space."
I'd like to find information on:
- 1) which OS and under which conditions freed memory is not returned to the OS, and
- 2) whether there is any remedy other than closing R and starting it again for the next simulation run.
2 Answers
#1
19
How do you check memory usage? Normally a virtual machine allocates a chunk of memory in which it stores its data. Some of that allocated memory may be unused and marked as free. The GC finds data that is no longer referenced from anywhere and marks the corresponding chunks of memory as unused; this does not mean that the memory is released to the OS. Still, from the VM's perspective there is now more free memory available for further computation.
As others asked, did you experience out-of-memory errors? If not, then there's nothing to worry about.
EDIT: This and this should be enough to understand how memory allocation and garbage collection work in R.
From the first document:
Occasionally an attempt is made to release unused pages back to the operating system. When pages are released, a number of free nodes equal to R_MaxKeepFrac times the number of allocated nodes for each class is retained. Pages not needed to meet this requirement are released. An attempt to release pages is made every R_PageReleaseFreq level 1 or level 2 collections.
EDIT2:
To see used memory try running gc() with verbose set to TRUE:
gc(verbose = TRUE)
Here's a result with an array of 10'000'000 integers in memory:
Garbage collection 9 = 1+0+8 (level 2) ...
10.7 Mbytes of cons cells used (49%)
40.6 Mbytes of vectors used (72%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  198838 10.7     407500 21.8   350000 18.7
Vcells 5311050 40.6    7421749 56.7  5311504 40.6
And here is the result after discarding the reference to it:
Garbage collection 10 = 1+0+9 (level 2) ...
10.7 Mbytes of cons cells used (49%)
2.4 Mbytes of vectors used (5%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  198821 10.7     407500 21.8   350000 18.7
Vcells  310987  2.4    5937399 45.3  5311504 40.6
As you can see, the memory used by Vcells fell from 40.6 Mb to 2.4 Mb.
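The answer does not show the code that produced those two reports, so the following is only a minimal sketch of how a comparable measurement could be reproduced. The helper name used_mb is made up for illustration, and integer(1e7) is an assumed stand-in for the "array of 10'000'000 integers" (about 38 Mb at 4 bytes per element).

```r
# Hypothetical helper: Mb currently in use on R's heap, as reported by gc().
# Column 2 of the gc() matrix is the "(Mb)" column for "used" (Ncells and Vcells rows).
used_mb <- function() sum(gc()[, 2])

before <- used_mb()
x <- integer(1e7)     # assumed stand-in: 10,000,000 integers, roughly 38 Mb
with_x <- used_mb()
rm(x)                 # drop the only reference to the vector
after <- used_mb()    # gc() runs inside used_mb(), so the vector is collected here

cat(sprintf("before: %.1f Mb, with x: %.1f Mb, after rm: %.1f Mb\n",
            before, with_x, after))
```

These numbers are R's own accounting of its heap. They can drop even when the process size shown by the operating system barely moves, which is the distinction the first paragraph of this answer draws.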
#2
25
The R garbage collector is imperfect in the following (not so) subtle way: it does not move objects (i.e., it does not compact memory) because of the way it interacts with C libraries. (Some other languages/implementations suffer from this too, but others, despite also having to interact with C, manage to have a compacting generational GC which does not suffer from this problem.)
This means that if you alternate between allocating small chunks of memory that are then discarded and larger chunks for more permanent objects (a common situation when doing string/regexp processing), your memory becomes fragmented and the garbage collector can do nothing about it: the memory is released, but it cannot be re-used because the free chunks are too small.
The only way to fix the problem is to save the objects you want, restart R, and reload the objects.
Since you are doing rm(list=ls()), i.e., you do not need any objects, you do not need to save and reload anything. So, in your case, the solution is precisely what you want to avoid: restarting R.
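If restarting is the remedy, it can at least be automated. This is a rough sketch, not something proposed in the thread: each run is sent to a fresh R worker process through the base parallel package, so all memory of a run goes back to the OS when that worker exits. The function run_simulation and its contents are hypothetical placeholders for the real simulation.

```r
library(parallel)

# Hypothetical simulation; stands in for the real workload.
run_simulation <- function(seed) {
  set.seed(seed)
  x <- rnorm(1e6)   # large temporary object that lives only in the worker
  mean(x)           # only this small summary travels back to the main session
}

results <- lapply(1:3, function(seed) {
  cl <- makeCluster(1)        # start a fresh R worker process for this run
  on.exit(stopCluster(cl))    # worker exits afterwards; the OS reclaims its memory
  clusterCall(cl, run_simulation, seed)[[1]]
})

print(unlist(results))
```

The main session only ever holds the returned summaries, so its own memory footprint stays small regardless of how fragmented each worker's heap becomes. Starting a worker per run adds some startup overhead, which is usually negligible next to a long simulation.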
PS. Garbage collection is a highly non-trivial topic. E.g., Ruby used 5 (!) different GC algorithms over 20 years. Java GC does not suck because Sun/Oracle and IBM spent many man-years on their respective implementations of the GC. On the other hand, R and Python have lousy GC, because no one bothered to invest the necessary man-years, and they are quite popular. That's worse-is-better for you.