Why is malloc really non-deterministic? (Linux/Unix)

Time: 2022-10-26 03:19:12

malloc is not guaranteed to return 0'ed memory. The conventional wisdom goes further and holds that the contents of the memory malloc returns are actually non-deterministic; for example, OpenSSL used them for extra randomness.

However, as far as I know, malloc is built on top of brk/sbrk, which do "return" 0'ed memory. I can see why the contents of what malloc returns may be non-0, e.g. because they come from previously free'd memory, but why would they be non-deterministic in "normal" single-threaded software?

  1. Is the conventional wisdom really true (assuming the same binary and libraries)?
  2. If so, why?

Edit: Several people answered by explaining why the memory can be non-0, which I had already explained in the question above. What I'm asking is why a program that uses the contents of what malloc returns may be non-deterministic, i.e. why it could behave differently every time it's run (assuming the same binary and libraries). Non-0 contents do not imply non-deterministic behavior. To put it differently: why could the memory have different contents every time the binary is run?

10 answers

#1


2  

I think that the assumption that it is non-deterministic is plain wrong, particularly as you ask about a non-threaded context. (In a threaded context you could have some non-determinism because of the randomness of scheduling.)

Just try it out. Create a sequential, deterministic application that


  • does a whole bunch of allocations
  • fills the memory with some pattern, e.g. the value of a counter
  • frees every second one of these allocations
  • newly allocates the same amount
  • runs through these new allocations and records the value of the first byte of each in a file (as textual numbers, one per line)

Run this program twice and record the results in two different files. My expectation is that these files will be identical.
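The experiment described above could be sketched roughly as follows. This is only an illustration: the function name `run_experiment` and the constants `N_BLOCKS` and `BLOCK_SIZE` are made up for the example, and the read of freshly allocated memory is deliberate.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N_BLOCKS   64   /* hypothetical number of allocations */
#define BLOCK_SIZE 128  /* hypothetical size of each allocation, in bytes */

/* Runs the experiment once and writes one number per line to `out`.
 * Returns the number of lines written, or -1 if an allocation fails. */
int run_experiment(FILE *out)
{
    unsigned char *blocks[N_BLOCKS] = {0};
    int written = 0;
    int status = 0;

    /* Allocate every block and fill it with a counter-based pattern. */
    for (int i = 0; i < N_BLOCKS; i++) {
        blocks[i] = malloc(BLOCK_SIZE);
        if (blocks[i] == NULL) { status = -1; goto cleanup; }
        memset(blocks[i], i + 1, BLOCK_SIZE);
    }

    /* Free every second allocation. */
    for (int i = 0; i < N_BLOCKS; i += 2) {
        free(blocks[i]);
        blocks[i] = NULL;
    }

    /* Allocate the same amount again and record the first byte of each
     * new block. The byte is read on purpose before anything is written. */
    for (int i = 0; i < N_BLOCKS; i += 2) {
        blocks[i] = malloc(BLOCK_SIZE);
        if (blocks[i] == NULL) { status = -1; goto cleanup; }
        fprintf(out, "%u\n", (unsigned)blocks[i][0]);
        written++;
    }

cleanup:
    /* free(NULL) is a no-op, so unallocated slots are harmless here. */
    for (int i = 0; i < N_BLOCKS; i++)
        free(blocks[i]);
    return status < 0 ? -1 : written;
}
```

Calling `run_experiment(stdout)` from a small `main`, redirecting two runs into separate files and comparing them with `diff` reproduces the test. Identical files are not guaranteed, though: on a system with address space layout randomization, an allocator that keeps its own bookkeeping pointers inside freed blocks could make the recorded bytes differ between runs.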

#2


11  

Malloc does not guarantee unpredictability... it just doesn't guarantee predictability.


E.g., consider that

 return 0;

is a valid implementation of malloc.
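To make the point concrete, here is a small sketch of such an allocator and of a caller that copes with it. The names `null_malloc` and `fill_buffer` are hypothetical and exist only for this example; the sketch just illustrates that a caller must be prepared for any result the interface allows.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical allocator that honors the interface by always failing. */
void *null_malloc(size_t size)
{
    (void)size;   /* the requested size is ignored */
    return NULL;  /* "no memory available" is always a permitted answer */
}

/* Caller that checks the result before touching the memory.
 * Returns 0 on success, -1 if the allocation failed. */
int fill_buffer(size_t size)
{
    unsigned char *buf = null_malloc(size);
    if (buf == NULL)
        return -1;           /* with null_malloc this branch is always taken */
    memset(buf, 0xAB, size); /* only reached with an allocator that succeeds */
    return 0;
}
```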

#3


4  

The initial values of memory returned by malloc are unspecified, which means that the specifications of the C and C++ languages put no restrictions on what values can be handed back. This makes the language easier to implement on a variety of platforms. While it might be true that on Linux malloc is implemented with brk and sbrk and the memory should be zeroed (I'm not even sure that this is necessarily true, by the way), on other platforms, perhaps an embedded platform, there's no reason that this would have to be the case. For example, an embedded device might not want to zero the memory, since doing so costs CPU cycles and thus power and time. Also, in the interest of efficiency, the memory allocator could recycle blocks that had previously been freed without zeroing them out first. This means that even if the memory from the OS is initially zeroed out, the memory from malloc needn't be.

The conventional wisdom that the values are nondeterministic is probably a good one, because it forces you to realize that any memory you get back might have garbage data in it that could crash your program. That said, you should not assume that the values are truly random. You should, however, realize that the values handed back are not magically going to be what you want. You are responsible for setting them up correctly. Assuming the values are truly random is a Really Bad Idea, since there is nothing at all to suggest that they would be.

If you want memory that is guaranteed to be zeroed out, use calloc instead.
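A minimal sketch of that guarantee, using hypothetical helper names (`all_zero`, `calloc_is_zeroed`) chosen for this example: memory from calloc reads back as zero bytes, whereas the same check on memory from malloc could go either way.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if all `n` bytes starting at `p` are zero, 0 otherwise. */
int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Allocates `n` bytes with calloc and checks them.
 * Returns 1 if they are all zero, 0 if not, -1 if the allocation failed. */
int calloc_is_zeroed(size_t n)
{
    unsigned char *p = calloc(n, 1);
    if (p == NULL)
        return -1;
    int ok = all_zero(p, n);
    free(p);
    return ok;
}
```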


Hope this helps!


#4


3  

malloc is defined on many systems that can be programmed in C/C++, including many non-UNIX systems and many systems that lack an operating system altogether. Requiring malloc to zero out the memory goes against C's philosophy of saving CPU as much as possible.

The standard provides a zeroing call, calloc, that can be used if you need to zero out the memory. But in cases where you are planning to initialize the memory yourself as soon as you get it, the CPU cycles spent making sure the block is zeroed out are a waste; the C standard aims to avoid this waste as much as possible, often at the expense of predictability.
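As an illustration of the "initialize it yourself" case, here is a sketch in which every element is written before it is read, so zeroing the block first would add nothing. The function name `sum_of_squares` is invented for this example.

```c
#include <stdlib.h>

/* Returns the sum of i*i for i in [0, n), or -1 if the allocation fails. */
long sum_of_squares(size_t n)
{
    long *v = malloc(n * sizeof *v);
    if (v == NULL)
        return -1;

    /* Every element is written here before it is ever read,
     * so the initial contents of the block do not matter. */
    for (size_t i = 0; i < n; i++)
        v[i] = (long)(i * i);

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];

    free(v);
    return sum;
}
```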

#5


3  

Memory returned by malloc is not zeroed (or rather, is not guaranteed to be zeroed) because it does not need to be. There is no security risk in reusing uninitialized memory pulled from your own process' address space or page pool. You already know it's there, and you already know the contents. There is also no issue with the contents in a practical sense, because you're going to overwrite them anyway.

Incidentally, memory that malloc gets fresh from the operating system is zeroed upon first allocation, because an operating system kernel cannot afford the risk of giving one process data that another process owned previously. Therefore, when the OS faults in a new page, it only ever provides one that has been zeroed. However, this is a property of the kernel and totally unrelated to malloc.

(Slightly off-topic: The Debian security thing you mentioned had a few more implications than using uninitialized memory for randomness. A packager who was not familiar with the inner workings of the code and did not know the precise implications patched out a couple of places that Valgrind had reported, presumably with good intent but to disastrous effect. Among these was the "random from uninitialized memory", but it was far from the most severe one.)

#6


1  

Even in "normal" single-threaded programs, memory is freed and reallocated many times. Malloc will return to you memory that you had used before.


#7


1  

Even single-threaded code may do malloc, then free, then malloc again, and get back previously used, non-zero memory.
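A sketch of that malloc, free, malloc sequence follows. The function name `second_malloc_reuses_block` is made up for the example, and whether the block is actually reused is entirely up to the allocator, so both outcomes are possible.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocates, dirties and frees a block, then allocates the same size again.
 * Returns 1 if the second block has the same address as the first,
 * 0 if it does not, and -1 if an allocation failed. */
int second_malloc_reuses_block(size_t size)
{
    void *first = malloc(size);
    if (first == NULL)
        return -1;
    memset(first, 0x5A, size);                /* leave non-zero bytes behind */
    uintptr_t first_addr = (uintptr_t)first;  /* save the address before free */
    free(first);

    void *second = malloc(size);
    if (second == NULL)
        return -1;
    int reused = ((uintptr_t)second == first_addr);
    free(second);
    return reused;
}
```

When the address is reused, the new block may still hold the old pattern, or bookkeeping data the allocator wrote into it while it was free; in neither case is it guaranteed to be zero.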

#8


1  

There is no guarantee that brk/sbrk return zeroed-out data; this is an implementation detail. It is generally a good idea for an OS to do that, to reduce the possibility that sensitive information from one process finds its way into another process, but nothing in the specification says that it will be the case.

The fact that malloc is implemented on top of brk/sbrk is also implementation-dependent, and can even vary based on the size of the allocation; for example, large allocations on Linux have traditionally used mmap on /dev/zero instead.
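For illustration, this is roughly what mapping memory from /dev/zero looks like from user code. It is a sketch of the mechanism mentioned above, not of what any particular malloc does internally, and the function name `dev_zero_mapping_is_zeroed` is invented for the example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps `len` bytes privately from /dev/zero and checks that they read as zero.
 * Returns 1 if all bytes are zero, 0 if not, -1 if the mapping failed. */
int dev_zero_mapping_is_zeroed(size_t len)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    int zeroed = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            zeroed = 0;
            break;
        }
    }

    munmap(p, len);
    return zeroed;
}
```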

Basically, you can rely neither on malloc()ed regions containing garbage nor on their being all zeros, and no program should assume one way or the other.

#9


0  

The simplest way I can think of putting the answer is like this:


If I am looking for wall space to paint a mural, I don't care whether it is white or covered with old graffiti, since I'm going to prime it and paint over it. I only care whether I have enough square footage to accommodate the picture, and I care that I'm not painting over an area that belongs to someone else.


That is how malloc thinks. Zeroing memory every time a process ends would be wasted computational effort. It would be like re-priming the wall every time you finish painting.


#10


-1  

There is a whole ecosystem of programs living inside a computer's memory, and you cannot control the order in which mallocs and frees happen.

Imagine that the first time you run your application and malloc() something, it gives you an address with some garbage. Then your program shuts down and your OS marks that area as free. Another program takes it with another malloc(), writes a lot of stuff and then leaves. When you run your program again, it might happen that malloc() gives you the same address, but now there's different garbage there, which the previous program might have written.

I don't actually know the implementation of malloc() in any system and I don't know if it implements any kind of security measure (like randomizing the returned address), but I don't think so.


It is very deterministic.

