When using malloc to allocate memory, is it generally quicker to do many mallocs of small chunks of data or fewer mallocs of larger chunks? For example, say you are working with an image file that has black pixels and white pixels. You are iterating through the pixels and want to save the x and y position of each black pixel in a new structure that also has pointers to the next and previous black pixel's structure. Would it generally be faster to iterate through the pixels once, allocating a new structure (x, y and the two pointers) for each black pixel, or would it be faster to count the black pixels in a first pass, allocate one large chunk of memory as an array of structures containing just the x and y values (no pointers), and then iterate again, saving the x and y values into that array? I'm assuming platforms may differ as to which is faster, but what does everyone think would generally be faster?
14 solutions
#1
19
It depends:
- Many small allocations mean many calls, and each call has a cost, which is slower
- There may be a special/fast implementation for small allocations.
If I cared, I'd measure it! If I really cared a lot, and couldn't guess, then I might implement both, and measure at run-time on the target machine, and adapt accordingly.
In general I'd assume that fewer is better, but there are allocation sizes and run-time library implementations for which a (sufficiently) large allocation will be delegated to the (relatively slow) O/S, whereas a (sufficiently) small allocation will be served from a (relatively quick) already-allocated heap.
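One way to measure it is a small timing harness like the sketch below. The `Point` struct and both function names are hypothetical and only stand in for the question's black-pixel records. Results depend heavily on the allocator, the compiler flags and the machine, so treat any numbers as local observations, not general truths.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Point { int x, y; };  // hypothetical record for one black pixel

using Clock = std::chrono::steady_clock;

static long long micros_since(Clock::time_point start) {
    return std::chrono::duration_cast<std::chrono::microseconds>(
               Clock::now() - start).count();
}

// Times n separate malloc calls, one Point each. Returns elapsed microseconds.
long long time_many_small(std::size_t n) {
    assert(n > 0);
    std::vector<Point*> blocks(n, nullptr);  // set up before the clock starts
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        blocks[i] = static_cast<Point*>(std::malloc(sizeof(Point)));
        if (!blocks[i]) std::abort();
        blocks[i]->x = static_cast<int>(i);  // touch the memory so it is really used
        blocks[i]->y = 0;
    }
    const long long elapsed = micros_since(start);
    for (Point* p : blocks) std::free(p);
    return elapsed;
}

// Times a single malloc call that holds all n Points. Returns elapsed microseconds.
long long time_one_large(std::size_t n) {
    assert(n > 0);
    const Clock::time_point start = Clock::now();
    Point* block = static_cast<Point*>(std::malloc(n * sizeof(Point)));
    if (!block) std::abort();
    for (std::size_t i = 0; i < n; ++i) {
        block[i].x = static_cast<int>(i);
        block[i].y = 0;
    }
    const long long elapsed = micros_since(start);
    std::free(block);
    return elapsed;
}
```

Calling both functions with the same `n` (for example one million) several times and comparing the medians gives a rough picture for one particular platform.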
#2
13
Allocating large blocks is more efficient; additionally, since you are using larger contiguous blocks, you have greater locality of reference, and traversing your in-memory structure once you've generated it should also be more efficient! Further, allocating large blocks should help to reduce memory fragmentation.
#3
4
Generally speaking, allocating larger chunks of memory fewer times will be faster. There's overhead involved each time a call to malloc() is made.
#5
3
Allocating memory is work. The amount of work done when allocating a block of memory is typically independent of the size of the block. You work it out from here.
#6
3
It's faster not to allocate in performance-sensitive code at all. Allocate the memory you're going to need once in advance, and then use and reuse that as much as you like.
Memory allocation is a relatively slow operation in general, so don't do it more often than necessary.
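A minimal sketch of "allocate once, then reuse", assuming the image is a row-major byte buffer in which 0 means black. The function name and the `Point` struct are hypothetical. It relies on `std::vector::clear()` removing the elements while leaving the capacity in place, so a buffer that has already grown is not reallocated on later calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { int x, y; };

// Collects the black pixels of one image into `out`, reusing whatever
// storage `out` already owns from earlier calls.
// Assumption: `pixels` is row-major and a value of 0 means black.
void collect_black_into(const std::vector<unsigned char>& pixels,
                        int width, int height, std::vector<Point>& out) {
    assert(width >= 0 && height >= 0);
    assert(pixels.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    out.clear();  // drops the elements but keeps the allocated capacity
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[static_cast<std::size_t>(y) * width + x] == 0) {
                out.push_back(Point{x, y});
            }
        }
    }
}
```

When processing many images in a loop, the same `out` vector can be passed every time; after the first few frames it typically stops allocating altogether.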
#7
2
In general malloc is expensive. It has to find an appropriate memory chunk from which to allocate memory and keep track of non-contiguous memory blocks. In several libraries you will find small memory allocators that try to minimize the impact by allocating a large block and managing the memory in the allocator.
Alexandrescu deals with the problem in 'Modern C++ Design' and in the Loki library, if you want to take a look at one such library.
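To make the idea concrete, here is a rough sketch of a fixed-capacity pool for the question's linked nodes. It is not Loki's allocator, and `NodePool` and `Node` are names invented for this illustration. It only shows the principle of grabbing one large block up front and handing out slots from a free list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { int x, y; Node* prev; Node* next; };

// Hypothetical fixed-capacity pool: one big allocation in the constructor,
// after which allocate() and release() never touch the general-purpose heap.
class NodePool {
public:
    explicit NodePool(std::size_t capacity)
        : storage_(capacity), free_head_(nullptr) {
        // Thread every slot onto the free list, reusing the `next` field.
        for (std::size_t i = 0; i < capacity; ++i) {
            storage_[i].next = free_head_;
            free_head_ = &storage_[i];
        }
    }
    NodePool(const NodePool&) = delete;             // slots point into storage_,
    NodePool& operator=(const NodePool&) = delete;  // so copying would be unsafe

    // Returns a free slot, or nullptr when the pool is exhausted.
    Node* allocate() {
        if (!free_head_) return nullptr;
        Node* n = free_head_;
        free_head_ = n->next;
        n->prev = nullptr;
        n->next = nullptr;
        return n;
    }

    // Puts a slot obtained from allocate() back on the free list.
    void release(Node* n) {
        assert(n != nullptr);
        n->next = free_head_;
        free_head_ = n;
    }

private:
    std::vector<Node> storage_;  // the single large block
    Node* free_head_;            // head of the list of unused slots
};
```

Because every slot has the same size, both operations are a couple of pointer assignments, which is where such allocators get their speed.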
#8
2
This question is one of pragmatism, I'm afraid; that is to say, it depends.
If you have a LOT of pixels, only a few of which are black, then counting them might be the highest cost.
If you're using C++, which your tags suggest you are, I would strongly suggest using the STL, something like std::vector.
The implementation of vector, if I remember correctly, uses a pragmatic approach to allocation. There are a few heuristics for allocation strategies; an informative one is this:
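#include <cstdlib>
#include <new>

class SampleVector {
    int N, used, *data;   // capacity in elements, elements in use, storage
public:
    SampleVector() : N(1), used(0), data(static_cast<int*>(std::malloc(N * sizeof(int)))) {
        if (!data) throw std::bad_alloc();
    }
    SampleVector(const SampleVector&) = delete;   // copying is not handled in this sample
    SampleVector& operator=(const SampleVector&) = delete;
    ~SampleVector() { std::free(data); }
    void push_back(int i)
    {
        if (used >= N)
        {
            // handle reallocation: double the capacity (malloc/realloc sizes are in bytes)
            int* grown = static_cast<int*>(std::realloc(data, 2 * N * sizeof(int)));
            if (!grown) throw std::bad_alloc();   // the old block is still valid here
            data = grown;
            N *= 2;
        }
        data[used++] = i;
    }
};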
In this case, you DOUBLE the amount of memory allocated every time you realloc. This means that reallocations progressively halve in frequency.
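The effect of doubling can be checked with a few lines of arithmetic. The hypothetical helper below only simulates the capacity bookkeeping of the sample class (start at 1, double when full) and counts how often it would have to grow; it allocates nothing.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many times a buffer that starts with capacity 1 and doubles
// whenever it is full has to grow in order to absorb n push_back calls.
std::size_t count_reallocations(std::size_t n) {
    std::size_t capacity = 1;
    std::size_t used = 0;
    std::size_t growths = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (used >= capacity) {  // same test as in the sample class
            capacity *= 2;
            ++growths;
        }
        ++used;
    }
    return growths;
}
```

With this scheme, 1,000 pushes need 10 reallocations and 1,000,000 pushes need only 20, so the number of reallocations grows with the logarithm of the element count.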
Your STL implementation will have been well-tuned, so if you can use that, do!
#9
2
Another point to consider is how this interacts with threading. Calling malloc many times in a multithreaded application can be a major drag on performance. In that environment you are better off with a scalable allocator like the one in Intel's Threading Building Blocks, or Hoard. The major limitation of many malloc implementations is a single global lock that all the threads contend for. It can be so bad that adding another thread dramatically slows down your application.
#10
1
As already mentioned, malloc is costly, so fewer calls will probably be faster. Also, working with the pixels in one contiguous block will, on most platforms, cause fewer cache misses and be faster. However, there is no guarantee of this on every platform.
#11
1
Besides the allocation overhead itself, allocating multiple small chunks may result in lots of cache misses, while if you can iterate through a contiguous block, the chances of hitting the cache are better.
The scenario you describe asks for preallocation of a large block, imho.
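The two layouts can be compared with standard containers, as in the sketch below: `std::list` typically allocates one node per element, while `std::vector` keeps all elements in one contiguous block. `Point`, `make_points` and `sum_x` are hypothetical names used only for this illustration. Both traversals compute the same result; any speed difference comes from the memory layout.

```cpp
#include <cassert>
#include <list>
#include <vector>

struct Point { int x, y; };

// Fills a container with n points whose x runs from 0 to n - 1.
template <typename Container>
Container make_points(int n) {
    assert(n >= 0);
    Container points;
    for (int i = 0; i < n; ++i) {
        points.push_back(Point{i, 0});
    }
    return points;
}

// Walks the container once and adds up the x coordinates.
// The loop is identical for both containers; only the layout differs.
template <typename Container>
long long sum_x(const Container& points) {
    long long total = 0;
    for (const Point& p : points) {
        total += p.x;
    }
    return total;
}
```

Timing `sum_x` on a large `std::list<Point>` against a `std::vector<Point>` of the same size is a simple way to see how much the contiguous layout helps on a given machine.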
#12
1
Although allocating large blocks is faster per byte of allocated memory, it will probably not be faster if you artificially increase the allocation size only to chop it up yourself. You are just duplicating the memory management.
#13
1
"I can allocate-it-all" (really, I can!)
We can philosophize about some special implementations that speed up small allocations considerably ... yes! But in general this holds:
malloc must be general. It must handle all the different kinds of allocations. That is the reason it is comparatively slow! It might be that you use some special super-duper library that speeds things up, but even those cannot work wonders, since they have to implement malloc in its full spectrum.
The rule is: when you have more specialized allocation code, you can usually be faster than the broad "I can allocate-it-all" routine "malloc".
So when you are able to allocate the memory in bigger blocks in your code (and it does not cost you too much) you can speed things up considerably. Also, as mentioned by others, you will get a lot less memory fragmentation, which also speeds things up and can cost less memory. You must also see that malloc needs additional memory for every chunk of memory it returns to you (yes, special routines can reduce this ... but you don't know what it really does unless you implemented it yourself or bought some wonder-library).
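A back-of-the-envelope comparison of the two layouts from the question, as a sketch. The per-block bookkeeping figure of 16 bytes is purely an assumption for illustration; the real value depends on the allocator and is often not documented. `Node`, `Point` and the function names are hypothetical as well.

```cpp
#include <cassert>
#include <cstddef>

struct Point { int x, y; };                         // array element: coordinates only
struct Node { int x, y; Node* prev; Node* next; };  // list element: coordinates plus links

// ASSUMPTION: bookkeeping bytes the allocator adds to every block it returns.
// 16 is an illustrative guess, not a property of any particular malloc.
constexpr std::size_t kAssumedPerBlockOverhead = 16;

// Estimated bytes for n black pixels stored as n individually allocated nodes.
constexpr std::size_t bytes_as_nodes(std::size_t n) {
    return n * (sizeof(Node) + kAssumedPerBlockOverhead);
}

// Estimated bytes for n black pixels stored in one allocated array.
constexpr std::size_t bytes_as_array(std::size_t n) {
    return n * sizeof(Point) + kAssumedPerBlockOverhead;
}
```

On a typical 64-bit platform with 4-byte `int`, `sizeof(Node)` is 24 and `sizeof(Point)` is 8, so under this assumption the node-per-pixel layout needs roughly five times the memory of the array.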
#14
1
Do an iteration over the pixels to count the number of them to be stored. Then allocate an array for the exact number of items. This is the most efficient solution.
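The count-then-allocate approach might look like the sketch below. It assumes a row-major byte image in which 0 means black; `BlackPixels`, `Point` and `collect_two_pass` are hypothetical names. The array is owned by a `std::unique_ptr` so that it is freed automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Point { int x, y; };

// Result of the scan: exactly `count` points in a single allocation.
struct BlackPixels {
    std::unique_ptr<Point[]> data;
    std::size_t count;
};

// Assumption: `pixels` is row-major and a value of 0 means black.
BlackPixels collect_two_pass(const std::vector<unsigned char>& pixels,
                             int width, int height) {
    assert(width >= 0 && height >= 0);
    assert(pixels.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // Pass 1: count the black pixels.
    std::size_t count = 0;
    for (unsigned char p : pixels) {
        if (p == 0) ++count;
    }

    // The only allocation: an array with exactly `count` elements.
    std::unique_ptr<Point[]> data(new Point[count]);

    // Pass 2: store the coordinates.
    std::size_t k = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[static_cast<std::size_t>(y) * width + x] == 0) {
                data[k++] = Point{x, y};
            }
        }
    }
    return BlackPixels{std::move(data), count};
}
```

The price of the exact-size allocation is reading the image twice, which is worth measuring when the image is large and black pixels are rare.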
You can use std::vector for easier memory management (see the std::vector::reserve member function). Note: reserve guarantees capacity for at least the requested number of elements, so an implementation may allocate somewhat more memory than strictly necessary.
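With std::vector the same idea can be sketched as follows: count first (here with `std::count`), call `reserve` once, then fill with `push_back`, which will not reallocate as long as the size stays within the reserved capacity. The function name, the `Point` struct and the "0 means black" convention are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { int x, y; };

// Assumption: `pixels` is row-major and a value of 0 means black.
std::vector<Point> collect_black(const std::vector<unsigned char>& pixels,
                                 int width, int height) {
    assert(width >= 0 && height >= 0);
    assert(pixels.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // Count the black pixels, then reserve room for all of them at once.
    const std::size_t black = static_cast<std::size_t>(
        std::count(pixels.begin(), pixels.end(), static_cast<unsigned char>(0)));
    std::vector<Point> out;
    out.reserve(black);

    // Fill; no reallocation happens because size never exceeds the reserved capacity.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[static_cast<std::size_t>(y) * width + x] == 0) {
                out.push_back(Point{x, y});
            }
        }
    }
    return out;
}
```

Dropping the counting pass and the `reserve` call also works; the vector then grows geometrically on its own, trading a few reallocations for a single scan of the image.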