为什么缓存局部性会影响数组的性能?

时间:2021-12-16 21:29:21

In the following blog there is a statement about the advantage of arrays over linked lists:

在下面的博客中有一篇关于数组优于链表的声明:

Arrays have better cache locality that can make a pretty big difference in performance.

数组具有更好的缓存局部性,可以在性能上产生很大的差异。

What does that mean? I don't understand how cache locality can provide a huge performance benefit.

这是什么意思?我不理解缓存局部性如何能够提供巨大的性能收益。

2 个解决方案

#1


63  

See my answer about spatial and temporal locality.

看我关于时空局部性的回答。

In particular, arrays are contiguous memory blocks, so large chunks of them will be loaded into the cache upon first access. This makes it comparatively quick to access future elements of the array. Linked lists on the other hand aren't necessarily in contiguous blocks of memory, and could lead to more cache misses, which increases the time it takes to access them.

特别地,数组是连续的内存块,所以在第一次访问时,它们中的很大一部分将被加载到缓存中。这使它能够相对快速地访问数组的未来元素。另一方面,链表并不一定在连续的内存块中,并且可能导致更多的缓存丢失,这会增加访问它们的时间。

Consider the following possible memory layouts for an array data and linked list l_data of large structs

考虑以下可能的大结构数组数据和链表l_data的内存布局

Address      Contents       | Address      Contents
ffff 0000    data[0]        | ffff 1000    l_data
ffff 0040    data[1]        |   ....
ffff 0080    data[2]        | ffff 3460    l_data->next
ffff 00c0    data[3]        |   ....
ffff 0100    data[4]        | ffff 8dc0    l_data->next->next
                            | ffff 8e00    l_data->next->next->next
                            |   ....
                            | ffff 8f00    l_data->next->next->next->next

If we wanted to loop through this array, the first access to ffff 0000 would require us to go to memory to retrieve (a very slow operation in CPU cycles). However, after the first access the rest of the array would be in the cache, and subsequent accesses would be much quicker. With the linked list, the first access to ffff 1000 would also require us to go to memory. Unfortunately, the processor will cache the memory directly surrounding this location, say all the way up to ffff 2000. As you can see, this doesn't actually capture any of the other elements of the list, which means that when we go to access l_data->next, we will again have to go to memory.

如果我们想要对这个数组进行循环,第一次访问ffff 0000将需要访问内存来检索(在CPU周期中这是一个非常缓慢的操作)。然而,在第一次访问之后,数组的其余部分将在缓存中,随后的访问将会快得多。对于链表,第一次访问ffff 1000也需要我们进入内存。不幸的是,处理器将缓存直接围绕这个位置的内存,一直到ffff 2000。如您所见,这实际上并没有捕获列表中的任何其他元素,这意味着当我们访问l_data->时,我们将再次访问内存。

#2


5  

Typically, when using an array you access items that are near each other. This is especially true when accessing an array sequentially.

通常,当使用数组时,您访问的条目是彼此相邻的。当按顺序访问数组时尤其如此。

When you access memory, a chunks of it are cached at various levels. Cache locality refers to the likelihood of successive operations being in the cache and thus being faster. In an array, you maximize the chances of sequential element access being in the cache.

当您访问内存时,其中的一部分会缓存到不同的级别。缓存局部性指的是连续操作位于缓存中的可能性,因此速度更快。在数组中,将顺序元素访问的机会最大化到缓存中。

With lists, by counter-example, there's no guarantee that items which appear sequentially in the list are actually arranged near eachother in memory. This means fewer cache hits, and degraded performance.

对于列表,通过反例,不能保证在列表中顺序出现的项实际上是在内存中相邻排列的。这意味着缓存命中更少,性能更差。

#1


63  

See my answer about spatial and temporal locality.

看我关于时空局部性的回答。

In particular, arrays are contiguous memory blocks, so large chunks of them will be loaded into the cache upon first access. This makes it comparatively quick to access future elements of the array. Linked lists on the other hand aren't necessarily in contiguous blocks of memory, and could lead to more cache misses, which increases the time it takes to access them.

特别地,数组是连续的内存块,所以在第一次访问时,它们中的很大一部分将被加载到缓存中。这使它能够相对快速地访问数组的未来元素。另一方面,链表并不一定在连续的内存块中,并且可能导致更多的缓存丢失,这会增加访问它们的时间。

Consider the following possible memory layouts for an array data and linked list l_data of large structs

考虑以下可能的大结构数组数据和链表l_data的内存布局

Address      Contents       | Address      Contents
ffff 0000    data[0]        | ffff 1000    l_data
ffff 0040    data[1]        |   ....
ffff 0080    data[2]        | ffff 3460    l_data->next
ffff 00c0    data[3]        |   ....
ffff 0100    data[4]        | ffff 8dc0    l_data->next->next
                            | ffff 8e00    l_data->next->next->next
                            |   ....
                            | ffff 8f00    l_data->next->next->next->next

If we wanted to loop through this array, the first access to ffff 0000 would require us to go to memory to retrieve (a very slow operation in CPU cycles). However, after the first access the rest of the array would be in the cache, and subsequent accesses would be much quicker. With the linked list, the first access to ffff 1000 would also require us to go to memory. Unfortunately, the processor will cache the memory directly surrounding this location, say all the way up to ffff 2000. As you can see, this doesn't actually capture any of the other elements of the list, which means that when we go to access l_data->next, we will again have to go to memory.

如果我们想要对这个数组进行循环,第一次访问ffff 0000将需要访问内存来检索(在CPU周期中这是一个非常缓慢的操作)。然而,在第一次访问之后,数组的其余部分将在缓存中,随后的访问将会快得多。对于链表,第一次访问ffff 1000也需要我们进入内存。不幸的是,处理器将缓存直接围绕这个位置的内存,一直到ffff 2000。如您所见,这实际上并没有捕获列表中的任何其他元素,这意味着当我们访问l_data->时,我们将再次访问内存。

#2


5  

Typically, when using an array you access items that are near each other. This is especially true when accessing an array sequentially.

通常,当使用数组时,您访问的条目是彼此相邻的。当按顺序访问数组时尤其如此。

When you access memory, a chunks of it are cached at various levels. Cache locality refers to the likelihood of successive operations being in the cache and thus being faster. In an array, you maximize the chances of sequential element access being in the cache.

当您访问内存时,其中的一部分会缓存到不同的级别。缓存局部性指的是连续操作位于缓存中的可能性,因此速度更快。在数组中,将顺序元素访问的机会最大化到缓存中。

With lists, by counter-example, there's no guarantee that items which appear sequentially in the list are actually arranged near eachother in memory. This means fewer cache hits, and degraded performance.

对于列表,通过反例,不能保证在列表中顺序出现的项实际上是在内存中相邻排列的。这意味着缓存命中更少,性能更差。