If you are developing a memory-intensive application in C++ on Windows, do you opt to write your own custom memory manager that allocates from the virtual address space, or do you let the CRT take control and do the memory management for you? I am especially concerned about fragmentation caused by allocating and deallocating small objects on the heap. Because of this, I worry the process will run out of memory even though enough memory is available, just fragmented.
9 Answers
#1
38
I think your best bet is not to implement one until profiling proves that the CRT is fragmenting memory in a way that damages the performance of your application. The CRT, core OS, and STL people spend a lot of time thinking about memory management.
There's a good chance that your code will perform just fine under the existing allocators with no changes needed. There's certainly a better chance of that than of you getting a memory allocator right the first time. I've written memory allocators before for similar circumstances, and it's a monstrous task to take on. Not so surprisingly, the version I inherited was rife with fragmentation problems.
The other advantage of waiting until a profile shows it's a problem is that you will also know if you've actually fixed anything. That's the most important part of a performance fix.
As long as you're using standard collection classes and algorithms (such as STL/Boost), it shouldn't be very hard to plug in a new allocator later in the cycle to fix the portions of your code base that actually need it. It's very unlikely that you will need a hand-coded allocator for your entire program.
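For what it's worth, plugging a custom allocator into a standard container later usually looks something like the minimal sketch below. The MyAllocator name and the malloc/free body are placeholders for whatever scheme you end up with; the point is only that the container's declaration changes, not the code that uses it.

```cpp
#include <cstdlib>
#include <new>
#include <vector>

// Minimal C++11-style allocator that just forwards to malloc/free.
// Swap the bodies of allocate/deallocate for a custom pool later.
template <typename T>
struct MyAllocator {
    using value_type = T;

    MyAllocator() = default;
    template <typename U>
    MyAllocator(const MyAllocator<U>&) {}

    T* allocate(std::size_t n) {
        if (void* p = std::malloc(n * sizeof(T)))
            return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template <typename T, typename U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) { return false; }

// Only the container declaration changes:
std::vector<int, MyAllocator<int>> values;
```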
#2
4
Although most of you indicate that you shouldn't write your own memory manager, it could still be useful if:
- you have a specific requirement or situation in which you are sure you can write a faster version
- you want to write your own memory-overwrite logic (to help in debugging)
- you want to keep track of the places where memory is leaked
If you want to write your own memory manager, it's important to split it into the following 4 parts:
- a part that 'intercepts' the calls to malloc/free (C) and new/delete (C++). This is quite easy for new/delete (just the global new and delete operators), but it is also possible for malloc/free ('overwrite' the CRT functions, redefine the calls to malloc/free, ...); see the sketch after this list
- a part that represents the entry point of your memory manager, and which is called by the 'interceptor' part
- a part that implements the actual memory manager. Possibly you will have multiple implementations of this (depending on the situation)
- a part that 'decorates' the allocated memory with information about the call stack, overwrite zones (aka red zones), ...
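As an illustration of the first part, intercepting the global new/delete operators can be as small as the sketch below. The manager_allocate/manager_free names are hypothetical stand-ins for the entry point of part 2; here they simply forward to malloc/free.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical entry points of the memory manager (part 2).
// Replace the bodies with the real implementation (part 3).
void* manager_allocate(std::size_t size) { return std::malloc(size); }
void  manager_free(void* p) noexcept     { std::free(p); }

// Part 1: route every global new/delete through the manager.
void* operator new(std::size_t size) {
    if (void* p = manager_allocate(size))
        return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { manager_free(p); }

void* operator new[](std::size_t size) { return ::operator new(size); }
void operator delete[](void* p) noexcept { ::operator delete(p); }
```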
If these 4 parts are clearly separated, it also becomes easy to replace one part with another, or to add a new part, e.g.:
- add the memory manager implementation of the Intel Threading Building Blocks library (to part 3)
- modify part 1 to support a new version of the compiler, a new platform or a totally new compiler
Having written a memory manager myself, I can only say that it is really handy to have an easy way to extend your own memory manager. E.g. what I regularly have to do is find memory leaks in long-running server applications. With my own memory manager I do it like this (a sketch of the tracking layer that makes this possible follows the list):
- start the application and let it 'warm up' for a while
- ask your own memory manager to dump an overview of the used memory, including the call stacks at the moment of the call
- continue running the application
- make a second dump
- sort the two dumps alphabetically on call stack
- look up the differences
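A rough sketch of such a tracking layer, assuming Windows and CaptureStackBackTrace; the function names and the dump format are made up for illustration, and a real manager would take care not to re-enter the tracked allocator from inside these helpers:

```cpp
#include <windows.h>   // CaptureStackBackTrace
#include <cstdio>
#include <map>
#include <mutex>
#include <vector>

struct AllocationInfo {
    std::size_t size;
    std::vector<void*> stack;   // raw return addresses; symbolize offline
};

static std::mutex g_lock;
static std::map<void*, AllocationInfo> g_live;   // pointer -> allocation info

// Called by the manager after every successful allocation.
void record_allocation(void* p, std::size_t size) {
    AllocationInfo info;
    info.size = size;
    void* frames[32];
    USHORT count = CaptureStackBackTrace(1, 32, frames, nullptr);
    info.stack.assign(frames, frames + count);
    std::lock_guard<std::mutex> guard(g_lock);
    g_live[p] = info;
}

// Called by the manager on every free.
void record_free(void* p) {
    std::lock_guard<std::mutex> guard(g_lock);
    g_live.erase(p);
}

// Dump every live allocation; run twice, sort, and diff the two files.
void dump_live_allocations(const char* path) {
    std::lock_guard<std::mutex> guard(g_lock);
    if (FILE* f = std::fopen(path, "w")) {
        for (const auto& entry : g_live) {
            std::fprintf(f, "%zu bytes, stack:", entry.second.size);
            for (void* frame : entry.second.stack)
                std::fprintf(f, " %p", frame);
            std::fprintf(f, "\n");
        }
        std::fclose(f);
    }
}
```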
Although you can do similar things with out-of-the-box components, they tend to have some disadvantages:
- often they seriously slow down the application
- often they can only report leaks at the end of the application, not while the application is running
But, also try to be realistic: if you don't have a problem with memory fragmentation, performance, memory leaks or memory overwrites, there's no real reason to write your own memory manager.
#3
2
Was it SmartHeap from MicroQuill?
#4
2
There used to be an excellent third-party drop-in heap-replacement library for VC++, but I don't remember the name any more. Our app got a 30% speed-up when we started using it.
Edit: it's SmartHeap - thanks, ChrisW
#5
2
From my experience, fragmentation is mostly a problem when you are continuously allocating and freeing large buffers (say, over 16 KB), since those are the ones that will ultimately cause an out-of-memory condition if the heap cannot find a big enough spot for one of them.
In that case, only those objects should get special memory management; keep the rest simple. You can reuse buffers if they always have the same size (a minimal sketch follows), or use a more complex memory pool if their sizes vary.
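A minimal sketch of reusing same-sized large buffers rather than going back to the heap each time; the class name and buffer size are illustrative, and it is not thread-safe:

```cpp
#include <cstddef>
#include <vector>

// Reuses fixed-size buffers instead of repeatedly hitting the heap.
class BufferPool {
public:
    explicit BufferPool(std::size_t buffer_size) : buffer_size_(buffer_size) {}

    ~BufferPool() {
        for (char* p : free_list_) delete[] p;
    }

    char* acquire() {
        if (!free_list_.empty()) {
            char* p = free_list_.back();
            free_list_.pop_back();
            return p;
        }
        return new char[buffer_size_];
    }

    void release(char* p) { free_list_.push_back(p); }

private:
    std::size_t buffer_size_;
    std::vector<char*> free_list_;
};

// Usage: one pool per buffer size, e.g. for 64 KB I/O buffers.
// BufferPool pool(64 * 1024);
// char* buf = pool.acquire();
// ... use buf ...
// pool.release(buf);
```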
The default heap implementations shouldn't have any problem finding some place for smaller buffers between previous allocations.
#6
1
you opt to write your own custom memory manager to allocate memory from virtual address space or do you allow CRT to take control and do the memory management for you?
The standard library is often good enough. If it isn't, then instead of replacing it, a smaller step is to override operator new and operator delete for specific classes, not for all classes.
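A minimal sketch of such a per-class override, using a hypothetical Message class and a simple free list (not thread-safe, for illustration only):

```cpp
#include <cstddef>
#include <new>

class Message {
public:
    // Route allocations of Message (and only Message) through a free list.
    static void* operator new(std::size_t size) {
        if (size == sizeof(Message) && free_list_) {
            void* p = free_list_;
            free_list_ = free_list_->next;
            return p;
        }
        return ::operator new(size);   // fall back to the global heap
    }

    static void operator delete(void* p, std::size_t size) noexcept {
        if (size == sizeof(Message)) {
            Node* node = static_cast<Node*>(p);
            node->next = free_list_;
            free_list_ = node;
        } else {
            ::operator delete(p);
        }
    }

private:
    struct Node { Node* next; };
    static Node* free_list_;
    char payload_[256];   // placeholder members
};

Message::Node* Message::free_list_ = nullptr;
```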
#7
1
It depends very much on your memory allocation patterns. From my personal experience there are generally one or two classes in a project that need special consideration when it comes to memory management, because they are used frequently in the part of the code where you spend a lot of time. There might also be classes that need special treatment in some particular context, but can be used in other contexts without bothering about it.
I often end up managing those kinds of objects explicitly in a std::vector or something similar, rather than overriding the allocation routines for the class. For many situations the heap is really overkill: the allocation patterns are so predictable that you don't need to allocate every instance on the heap, but can use a much simpler structure that allocates larger pages from the heap and has far less bookkeeping overhead (a sketch of such a structure follows).
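For illustration, such a "simpler structure" might be a small arena that grabs large pages from the heap and hands out objects sequentially, freeing everything at once. It assumes the caller handles object destruction and that 16-byte alignment is sufficient:

```cpp
#include <cstddef>
#include <vector>

// Hands out small allocations from large pages; everything is freed at once.
class Arena {
public:
    explicit Arena(std::size_t page_size = 64 * 1024) : page_size_(page_size) {}

    ~Arena() {
        for (char* page : pages_) delete[] page;
    }

    void* allocate(std::size_t size) {
        // Keep it simple: round up to 16-byte alignment.
        size = (size + 15) & ~std::size_t(15);
        if (pages_.empty() || used_ + size > page_size_) {
            pages_.push_back(new char[page_size_]);
            used_ = 0;
        }
        void* p = pages_.back() + used_;
        used_ += size;
        return p;
    }

private:
    std::size_t page_size_;
    std::size_t used_ = 0;
    std::vector<char*> pages_;
};

// Usage: build many small, same-lifetime objects with placement new,
// then drop the whole arena when you are done with all of them.
```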
These are some general things to think about:
First, small objects that are allocated and destroyed quickly should be put on the stack. The fastest allocations are the ones that are never done. Stack allocation also happens without any locking of a global heap, which is good for multi-threaded code. Allocating on the heap in C/C++ can be relatively expensive compared to GC languages like Java, so try to avoid it unless you need it.
If you do a lot of allocation you should be careful with threading performance. A classic pitfall is string classes, which tend to do a lot of allocation hidden from the user. If you do lots of string processing in multiple threads, they might end up fighting over a mutex in the heap code. For this purpose, taking control of the memory management can speed things up a lot. Switching to another heap implementation is generally not the solution here, since the heap will still be global and your threads will still contend for it. I think Google has a heap that should be faster in multithreaded environments, though; I haven't tried it myself.
#8
0
no, I would not.
The chances of me writing better code than the CRT, with who knows how many hundreds of man-years invested in it, are slim.
I would search for a specialized library instead of reinventing the wheel.
#9
0
There's a solution used by some open-source software like doxygen: the idea is to store some instances in a file when you exceed a specific amount of memory, and to read the data back from the file when you need it (a rough sketch of the idea follows).
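A very rough, hypothetical sketch of that idea: keep a bounded in-memory cache and spill the rest to a temporary file, reading entries back on demand. All names are made up for illustration.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Keeps at most `max_in_memory` records in RAM; the rest live in a temp file.
class OverflowStore {
public:
    explicit OverflowStore(std::size_t max_in_memory)
        : limit_(max_in_memory), file_(std::tmpfile()) {}

    ~OverflowStore() { if (file_) std::fclose(file_); }

    void put(int key, const std::string& value) {
        if (cache_.size() >= limit_) spill(key, value);
        else cache_[key] = value;
    }

    bool get(int key, std::string& out) {
        auto it = cache_.find(key);
        if (it != cache_.end()) { out = it->second; return true; }
        auto offset = on_disk_.find(key);
        if (offset == on_disk_.end()) return false;
        std::fseek(file_, offset->second, SEEK_SET);
        std::size_t len = 0;
        std::fread(&len, sizeof(len), 1, file_);
        out.resize(len);
        if (len) std::fread(&out[0], 1, len, file_);
        return true;
    }

private:
    void spill(int key, const std::string& value) {
        std::fseek(file_, 0, SEEK_END);
        on_disk_[key] = std::ftell(file_);
        std::size_t len = value.size();
        std::fwrite(&len, sizeof(len), 1, file_);
        std::fwrite(value.data(), 1, len, file_);
    }

    std::size_t limit_;
    std::FILE* file_;
    std::map<int, std::string> cache_;
    std::map<int, long> on_disk_;
};
```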