1D or 2D array, what's faster?

Date: 2021-01-09 21:37:54

I'm in need of representing a 2D field (axes x, y) and I face a problem: should I use a 1D array or a 2D array?

I can imagine that recalculating indices for 1D arrays (y + x*n) could be slower than using a 2D array (x, y), but I could imagine that 1D could be in the CPU cache...

I did some googling, but only found pages regarding static arrays (and stating that 1D and 2D are basically the same). But my arrays must be dynamic.

So, what's

  1. faster,
  2. smaller (RAM):

dynamic 1D arrays or dynamic 2D arrays?

Thanks :)

7 Answers

#1


148  

tl;dr : You should probably use a one-dimensional approach.

Note: One cannot dig into the details affecting performance when comparing dynamic 1d and dynamic 2d storage patterns without filling books, since the performance of code depends on a very large number of parameters. Profile if possible.

1. What's faster?

For dense matrices the 1D approach is likely to be faster since it offers better memory locality and less allocation and deallocation overhead.

2. What's smaller?

Dynamic-1D consumes less memory than the 2D approach. The latter also requires more allocations.

Remarks

I laid out a pretty long answer beneath with several reasons, but I want to make some remarks on your assumptions first.

I can imagine that recalculating indices for 1D arrays (y + x*n) could be slower than using a 2D array (x, y)

Let's compare these two functions:

int get_2d (int **p, int r, int c) { return p[r][c]; }
int get_1d (int *p, int r, int c)  { return p[c + C*r]; } // C: the (constant) number of columns

The (non-inlined) assembly generated by Visual Studio 2015 RC for these functions (with optimizations turned on) is:

?get_1d@@YAHPAHII@Z PROC
push    ebp
mov ebp, esp
mov eax, DWORD PTR _c$[ebp]
lea eax, DWORD PTR [eax+edx*4]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

?get_2d@@YAHPAPAHII@Z PROC
push ebp
mov ebp, esp
mov ecx, DWORD PTR [ecx+edx*4]
mov eax, DWORD PTR _c$[ebp]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

The difference is mov (2d) vs. lea (1d). The former has a latency of 3 cycles and a maximum throughput of 2 per cycle, while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle (according to Instruction tables - Agner Fog). Since the differences are minor, I think there should not be a big performance difference arising from index recalculation. I expect it to be very unlikely that this difference itself would turn out to be the bottleneck in any program.

This brings us to the next (and more interesting) point:

... but I could imagine that 1D could be in the CPU cache ...

True, but 2d could be in the CPU cache, too. See "The downsides: Memory locality" below for an explanation of why 1d is still better.

The long answer, or why dynamic 2 dimensional data storage (pointer-to-pointer or vector-of-vector) is "bad" for simple / small matrices.

Note: This is about dynamic arrays/allocation schemes [malloc/new/vector etc.]. A static 2d array is a contiguous block of memory and is therefore not subject to the downsides I'm going to present here.

The Problem

To understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you need to understand the memory layout of such structures.

Example case using pointer-to-pointer syntax

int main (void)
{
    // allocate memory for 4x4 integers; quick & dirty
    int ** p = new int*[4];
    for (size_t i=0; i<4; ++i) p[i] = new int[4]; 

    // do some stuff here, using p[x][y] 

    // deallocate memory
    for (size_t i=0; i<4; ++i) delete[] p[i];
    delete[] p;
}

The downsides

Memory locality

For this “matrix” you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated and can therefore end up at arbitrary memory positions.

The following image will give you an idea of what the memory may look like.

For the real 2d case:

  • The violet square is the memory position occupied by p itself.
  • The green squares make up the memory region p points to (4 x int*).
  • The 4 regions of 4 contiguous blue squares are the ones pointed to by each int* of the green region.

For the 2d mapped on 1d case:

  • The green square is the only required pointer int *.
  • The blue squares make up the memory region for all matrix elements (16 x int).

[Image: the pointer-to-pointer layout (left) vs. the contiguous 1D layout (right)]

This means that (when using the left layout) you will probably observe worse performance than for a contiguous storage pattern (as seen on the right), due to caching for instance.

Let's say a cache line is "the amount of data transferred into the cache at once", and let's imagine a program accessing the whole matrix one element after another.

If you have a properly aligned 4×4 matrix of 32 bit values, a processor with a 64 byte cache line (a typical value) is able to "one-shot" the data (4*4*4 = 64 bytes). If you start processing and the data isn't already in the cache, you'll face a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once, since it fits into a cache line, if and only if it is contiguously stored (and properly aligned). There will probably not be any more misses while processing that data.

In case of a dynamic, "real two-dimensional" system with unrelated locations for each row/column, the processor needs to load every memory location separately. Even though only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would, in a worst case scenario, actually transfer 256 bytes and waste 75% of the throughput bandwidth. If you process the data using the 2d scheme you'll again (if not already cached) face a cache miss on the first element. But now, only the first row/column will be in the cache after the first load from main memory, because all other rows are located somewhere else in memory and not adjacent to the first one. As soon as you reach a new row/column there will again be a cache miss and the next load from main memory is performed.

Long story short: the 2d pattern has a higher chance of cache misses, while the 1d scheme offers better potential for performance due to the locality of the data.

Frequent Allocation / Deallocation

  • As many as N + 1 (4 + 1 = 5) allocations (using either new, malloc, allocator::allocate or whatever) are necessary to create the desired NxM (4×4) matrix.
  • The same number of proper, respective deallocation operations must be applied as well.

Therefore, it is more costly to create/copy such matrices compared to a single allocation scheme.

This gets even worse with a growing number of rows.

Memory consumption overhead

I'll assume a size of 32 bits for int and 32 bits for pointers. (Note: this is system dependent.)

Let's remember: we want to store a 4×4 int matrix, which means 64 bytes.

For an NxM matrix stored with the presented pointer-to-pointer scheme, we consume

  • N*M*sizeof(int) [the actual blue data] +
  • N*sizeof(int*) [the green pointers] +
  • sizeof(int**) [the violet variable p] bytes.

That makes 4*4*4 + 4*4 + 4 = 84 bytes in case of the present example, and it gets even worse when using std::vector<std::vector<int>>, which requires N*M*sizeof(int) + N*sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, that is 4*4*4 + 4*16 + 16 = 144 bytes in total, instead of 64 bytes for the 4×4 int matrix.

In addition, depending on the allocator used, each single allocation may well (and most likely will) have another 16 bytes of memory overhead. (Some “infobytes” which store the number of allocated bytes for the purpose of proper deallocation.)

This means the worst case is:

N*(16 + M*sizeof(int)) + 16 + N*sizeof(int*) + sizeof(int**)
= 4*(16 + 4*4) + 16 + 4*4 + 4 = 164 bytes! (Overhead: 156%)

The share of the overhead will shrink as the size of the matrix grows, but it will still be present.

Risk of memory leaks

This bunch of allocations requires appropriate exception handling in order to avoid memory leaks if one of the allocations fails! You'll need to keep track of the allocated memory blocks, and you must not forget them when deallocating the memory.

If new runs out of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.

Example:

In the above-mentioned new/delete example, we'll face some more code if we want to avoid leaks in case of bad_alloc exceptions.

  // allocate memory for 4x4 integers; quick & dirty
  size_t const N = 4;
  // we don't need a try block for this allocation:
  // if it fails there is no leak
  int ** p = new int*[N];
  size_t allocs(0U);
  try 
  { // try block doing the further allocations
    for (size_t i=0; i<N; ++i) 
    {
      p[i] = new int[4]; // allocate
      ++allocs; // advance counter if no exception occurred
    }
  }
  catch (std::bad_alloc & be)
  { // if an exception occurs we need to free our memory
    for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all allocated p[i]s
    delete[] p; // free p
    throw; // rethrow bad_alloc
  }
  /*
     do some stuff here, using p[x][y] 
  */
  // deallocate memory according to the number of allocations
  for (size_t i=0; i<allocs; ++i) delete[] p[i];
  delete[] p;

Summary

There are cases where "real 2d" memory layouts fit and make sense (i.e. if the number of columns per row is not constant), but in the most simple and common 2D data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.

Alternative

You should use a contiguous block of memory and map your rows onto that block.

The "C++ way" of doing it is probably to write a class that manages the memory for you and takes care of the important details.

Example

To provide an idea of what such a class may look like, here's a simple example with some basic features:

  • 2d-size-constructible
  • 2d-resizable
  • operator(size_t, size_t) for unchecked 2d row-major element access
  • at(size_t, size_t) for checked 2d row-major element access
  • fulfills the Concept requirements for Container

Source:

#include <vector>
#include <algorithm>
#include <iterator>
#include <utility>

namespace matrices
{

  template<class T>
  class simple
  {
  public:
    // misc types
    using data_type  = std::vector<T>;
    using value_type = typename std::vector<T>::value_type;
    using size_type  = typename std::vector<T>::size_type;
    // ref
    using reference       = typename std::vector<T>::reference;
    using const_reference = typename std::vector<T>::const_reference;
    // iter
    using iterator       = typename std::vector<T>::iterator;
    using const_iterator = typename std::vector<T>::const_iterator;
    // reverse iter
    using reverse_iterator       = typename std::vector<T>::reverse_iterator;
    using const_reverse_iterator = typename std::vector<T>::const_reverse_iterator;

    // empty construction
    simple() = default;

    // default-insert rows*cols values
    simple(size_type rows, size_type cols)
      : m_rows(rows), m_cols(cols), m_data(rows*cols)
    {}

    // copy initialized matrix rows*cols
    simple(size_type rows, size_type cols, const_reference val)
      : m_rows(rows), m_cols(cols), m_data(rows*cols, val)
    {}

    // 1d-iterators

    iterator begin() { return m_data.begin(); }
    iterator end() { return m_data.end(); }
    const_iterator begin() const { return m_data.begin(); }
    const_iterator end() const { return m_data.end(); }
    const_iterator cbegin() const { return m_data.cbegin(); }
    const_iterator cend() const { return m_data.cend(); }
    reverse_iterator rbegin() { return m_data.rbegin(); }
    reverse_iterator rend() { return m_data.rend(); }
    const_reverse_iterator rbegin() const { return m_data.rbegin(); }
    const_reverse_iterator rend() const { return m_data.rend(); }
    const_reverse_iterator crbegin() const { return m_data.crbegin(); }
    const_reverse_iterator crend() const { return m_data.crend(); }

    // element access (row major indexation)
    reference operator() (size_type const row,
      size_type const column)
    {
      return m_data[m_cols*row + column];
    }
    const_reference operator() (size_type const row,
      size_type const column) const
    {
      return m_data[m_cols*row + column];
    }
    reference at (size_type const row, size_type const column)
    {
      return m_data.at(m_cols*row + column);
    }
    const_reference at (size_type const row, size_type const column) const
    {
      return m_data.at(m_cols*row + column);
    }

    // resizing
    void resize(size_type new_rows, size_type new_cols)
    {
      // new matrix new_rows times new_cols
      simple tmp(new_rows, new_cols);
      // select smaller row and col size
      auto mc = std::min(m_cols, new_cols);
      auto mr = std::min(m_rows, new_rows);
      for (size_type i(0U); i < mr; ++i)
      {
        // iterators to begin of rows
        auto row = begin() + i*m_cols;
        auto tmp_row = tmp.begin() + i*new_cols;
        // move mc elements to tmp
        std::move(row, row + mc, tmp_row);
      }
      // move assignment to this
      *this = std::move(tmp);
    }

    // size and capacity
    size_type size() const { return m_data.size(); }
    size_type max_size() const { return m_data.max_size(); }
    bool empty() const { return m_data.empty(); }
    // dimensionality
    size_type rows() const { return m_rows; }
    size_type cols() const { return m_cols; }
    // data swapping
    void swap(simple &rhs)
    {
      using std::swap;
      m_data.swap(rhs.m_data);
      swap(m_rows, rhs.m_rows);
      swap(m_cols, rhs.m_cols);
    }
  private:
    // content
    size_type m_rows{ 0u };
    size_type m_cols{ 0u };
    data_type m_data{};
  };
  template<class T>
  void swap(simple<T> & lhs, simple<T> & rhs)
  {
    lhs.swap(rhs);
  }
  template<class T>
  bool operator== (simple<T> const &a, simple<T> const &b)
  {
    if (a.rows() != b.rows() || a.cols() != b.cols())
    {
      return false;
    }
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
  }
  template<class T>
  bool operator!= (simple<T> const &a, simple<T> const &b)
  {
    return !(a == b);
  }

}

Note several things here:

  • T needs to fulfill the requirements of the std::vector member functions used
  • operator() doesn't do any "out of range" checks
  • No need to manage the data on your own
  • No destructor, copy constructor or assignment operators required

So you don't have to bother with proper memory handling for each application, but only once, for the class you write.

Restrictions

There may be cases where a dynamic "real" two-dimensional structure is favourable, for instance if

  • the matrix is very large and sparse (if some of the rows do not even need to be allocated but can be handled using a nullptr), or if
  • the rows do not have the same number of columns (that is, if you don't have a matrix at all but another two-dimensional construct).

#2


15  

Unless you are talking about static arrays, 1D is faster.

Here’s the memory layout of a 1D array (std::vector<T>):

+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+---+

And here’s the same for a dynamic 2D array (std::vector<std::vector<T>>):

+---+---+---+
| * | * | * |
+-|-+-|-+-|-+
  |   |   V
  |   | +---+---+---+
  |   | |   |   |   |
  |   | +---+---+---+
  |   V
  | +---+---+---+
  | |   |   |   |
  | +---+---+---+
  V
+---+---+---+
|   |   |   |
+---+---+---+

Clearly the 2D case loses the cache locality and uses more memory. It also introduces an extra indirection (and thus an extra pointer to follow), but the first array has the overhead of calculating the indices, so these more or less even out.

#3


8  

1D and 2D Static Arrays

  • Size: Both will require the same amount of memory.

  • Speed: You can assume that there will be no speed difference because the memory for both of these arrays should be contiguous (the whole 2D array should appear as one chunk in memory rather than a bunch of chunks spread across memory). (This could be compiler dependent, however.)

1D and 2D Dynamic Arrays

  • Size: The 2D array will require a tiny bit more memory than the 1D array because of the pointers needed in the 2D array to point to the set of allocated 1D arrays. (This tiny bit is only tiny when we're talking about really big arrays. For small arrays, the tiny bit could be pretty big relatively speaking.)

  • Speed: The 1D array may be faster than the 2D array because the memory for the 2D array would not be contiguous, so cache misses would become a problem.


Use what works and seems most logical, and if you face speed problems, then refactor.

#4


4  

Another difference between 1D and 2D arrays appears in memory allocation: we cannot be sure that the members of a 2D array are sequential.

#5


4  

The existing answers all compare 1-D arrays only against arrays of pointers.

In C (but not C++) there is a third option; you can have a contiguous 2-D array that is dynamically allocated and has runtime dimensions:

int (*p)[num_columns] = malloc(num_rows * sizeof *p);

and this is accessed like p[row_index][col_index].

I would expect this to have very similar performance to the 1-D array case, but it gives you nicer syntax for accessing the cells.

In C++ you can achieve something similar by defining a class which maintains a 1-D array internally but exposes it via 2-D array access syntax using overloaded operators. Again I would expect that to have similar or identical performance to the plain 1-D array.

#6


1  

It really depends on how your 2D array is implemented.

int a[200], b[10][20], *c[10], *d[10];
for (int ii = 0; ii < 10; ++ii)
{
   c[ii] = &b[ii][0];
   d[ii] = (int*) malloc(20 * sizeof(int));    // The cast is for C++ only.
}

There are 3 implementations here: b, c and d. There won't be a lot of difference between accessing b[x][y] and a[x*20 + y], since in one case you are doing the computation and in the other the compiler is doing it for you. c[x][y] and d[x][y] are slower, because the machine has to find the address that c[x] points to and then access the y-th element from there. It is not one straight computation. On some machines (e.g. the AS400, which has 36 byte (not bit) pointers), pointer access is extremely slow. It is all dependent upon the architecture in use. On x86 type architectures, a and b are the same speed, and c and d are slower than b.

#7


0  

I love the thorough answer provided by Pixelchemist. A simpler version of that solution may be as follows. First, declare the dimensions:

constexpr int M = 16; // rows
constexpr int N = 16; // columns
constexpr int P = 16; // planes

Next, create an alias and get and set methods:

template<typename T>
using Vector = std::vector<T>;

template<typename T>
inline T& set_elem(Vector<T>& m_, size_t i_, size_t j_, size_t k_)
{
    // check indexes here...
    return m_[i_*N*P + j_*P + k_];
}

template<typename T>
inline const T& get_elem(const Vector<T>& m_, size_t i_, size_t j_, size_t k_)
{
    // check indexes here...
    return m_[i_*N*P + j_*P + k_];
}

Finally, a vector may be created and indexed as follows:

Vector<int> array3d(M*N*P, 0);       // create 3-d array containing M*N*P zero ints
set_elem(array3d, 0, 0, 1) = 5;      // array3d[0][0][1] = 5
auto n = get_elem(array3d, 0, 0, 1); // n = 5

Defining the vector size at initialization provides optimal performance. This solution is modified from this answer. The functions may be overloaded to support varying dimensions with a single vector. The downside of this solution is that the M, N, P parameters are implicitly passed to the get and set functions. This can be resolved by implementing the solution within a class, as done by Pixelchemist.


#1


148  

tl;dr : You should probably use a one-dimensional approach.

Note: One cannot dig into detail affecting performance when comparing dynamic 1d or dynamic 2d storage patterns without filling books since the performance of code is dependent one a very large number of parameters. Profile if possible.

注意:在比较动态的一维或动态2d存储模式时,不能在没有填充书籍的情况下对性能进行深入挖掘,因为代码的性能依赖于大量的参数。如果可能的话。

1. What's faster?

For dense matrices the 1D approach is likely to be faster since it offers better memory locality and less allocation and deallocation overhead.

对于密集矩阵,1D方法可能会更快,因为它提供更好的内存位置和更少的分配和分配开销。

2. What's smaller?

Dynamic-1D consumes less memory than the 2D approach. The latter also requires more allocations.

动态1d比2D方法消耗更少的内存。后者还需要更多的拨款。

Remarks

I laid out a pretty long answer beneath with several reasons but I want to make some remarks on your assumptions first.

我给出了一个很长的答案,有几个原因,但我想先谈谈你的假设。

I can imagine, that recalculating indices for 1D arrays (y + x*n) could be slower than using 2D array (x, y)

我可以想象,对于一维数组(y + x*n)的重新计算索引可能比使用2D数组(x, y)要慢

Let's compare these two functions:

我们来比较一下这两个函数

int get_2d (int **p, int r, int c) { return p[r][c]; }
int get_1d (int *p, int r, int c)  { return p[c + C*r]; }

The (non-inlined) assembly generated by Visual Studio 2015 RC for those functions (with optimizations turned on) is:

Visual Studio 2015 RC为这些功能(优化打开)生成的(非内联的)程序集是:

?get_1d@@YAHPAHII@Z PROC
push    ebp
mov ebp, esp
mov eax, DWORD PTR _c$[ebp]
lea eax, DWORD PTR [eax+edx*4]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

?get_2d@@YAHPAPAHII@Z PROC
push ebp
mov ebp, esp
mov ecx, DWORD PTR [ecx+edx*4]
mov eax, DWORD PTR _c$[ebp]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

The difference is mov (2d) vs. lea (1d). The former has a latency of 3 cycles and a a maximum throughput of 2 per cycle while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle. (According to Instruction tables - Agner Fog Since the differences are minor, I think there should not be a big performance difference arising from index recalculation. I expect it to be very unlikely to identify this difference itself to be the bottleneck in any program.

区别是mov (2d)和lea (1d)。前者有3个周期的潜伏期,而后者的最大吞吐量为2个周期,而后者的延迟为2个周期,最大吞吐量为3个周期。(根据指示表- Agner Fog由于差异较小,我认为不应该出现因指数重新计算而产生的巨大性能差异。我希望它不太可能识别出这种差异,从而成为任何程序的瓶颈。

This brings us to the next (and more interesting) point:

这就引出了下一个(更有趣的)点:

... but I could image that 1D could be in CPU cache ...

…但是我可以想象,1D可能在CPU缓存中…

True, but 2d could be in CPU cache, too. See The Downsides: Memory locality for an explanation why 1d is still better.

没错,但是2d也可能出现在CPU缓存中。看到缺点:内存位置解释为什么1d仍然更好。

The long answer, or why dynamic 2 dimensional data storage (pointer-to-pointer or vector-of-vector) is "bad" for simple / small matrices.

Note: This is about dynamic arrays/allocation schemes [malloc/new/vector etc.]. A static 2d array is a contiguous block of memory and therefore not subject to the downsides I'm going to present here.

注:这是关于动态数组/分配方案[malloc/new/vector等]。一个静态的2d数组是一个连续的内存块,因此不受我将要在这里出现的缺点的影响。

The Problem

To be able to understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you are required to understand the memory layout of such structures.

为了能够理解为什么动态数组或向量的动态数组很可能不是选择的数据存储模式,您需要理解这种结构的内存布局。

Example case using pointer to pointer syntax

int main (void)
{
    // allocate memory for 4x4 integers; quick & dirty
    int ** p = new int*[4];
    for (size_t i=0; i<4; ++i) p[i] = new int[4]; 

    // do some stuff here, using p[x][y] 

    // deallocate memory
    for (size_t i=0; i<4; ++i) delete[] p[i];
    delete[] p;
}

The downsides

Memory locality

For this “matrix” you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated and can therefore result in an arbitrary memory position.

对于这个“矩阵”,你分配一个4个指针和4个整数块。所有的分配都是不相关的,因此可以产生任意的内存位置。

The following image will give you an idea of how the memory may look like.

下面的图片会让你了解记忆是怎样的。

For the real 2d case:

对于真实的2d情况:

  • The violet square is the memory position occupied by p itself.
  • 紫色方块是p本身占用的内存位置。
  • The green squares assemble the memory region p points to (4 x int*).
  • 绿色方块将记忆区域p点集合到(4 x int*)。
  • The 4 regions of 4 contiguous blue squares are the ones pointed to by each int* of the green region
  • 4个相邻的蓝色方格的4个区域是绿色区域的每个int*所指向的区域。

For the 2d mapped on 1d case:

对于一维情况下的二维映射:

  • The green square is the only required pointer int *
  • 绿色方块是唯一需要的指针int *。
  • The blue squares ensemble the memory region for all matrix elements (16 x int).
  • 蓝色方块集成所有矩阵元素的内存区域(16 x int)。

一维或二维数组,哪个更快?

This means that (when using the left layout) you will probably observe worse performance than for a contiguous storage pattern (as seen on the right), due to caching for instance.

这意味着(在使用左侧布局时),您可能会观察到比连续存储模式(如右图所示)更糟糕的性能,比如缓存。

Let's say a cache line is "the amount of data transfered into the cache at once" and let's imagine a program accessing the whole matrix one element after another.

假设一个高速缓存线是“一次传输到缓存中的数据量”,让我们假设一个程序访问一个接一个的矩阵。

If you have a properly aligned 4 times 4 matrix of 32 bit values, a processor with a 64 byte cache line (typical value) is able to "one-shot" the data (4*4*4 = 64 bytes). If you start processing and the data isn't already in the cache you'll face a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once since it fits into a cache line, if and only if it is contiguously stored (and properly aligned). There will probably not be any more misses while processing that data.

如果您有一个正确对齐的4*4矩阵的32位值,一个带有64字节缓存线(典型值)的处理器能够“一次”数据(4*4*4 = 64字节)。如果您开始处理,数据还没有在缓存中,那么您将面对一个缓存缺失,数据将从主内存中获取。这个负载可以一次获取整个矩阵,因为它适合于缓存线,如果且仅当它是连续存储的(并且正确对齐)。在处理这些数据时,可能不会再有任何遗漏。

In case of a dynamic, "real two-dimensional" system with unrelated locations of each row/column, the processor needs to load every memory location seperately. Eventhough only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would -in a worst case scenario- actually transfer 256 bytes and waste 75% throughput bandwidth. If you process the data using the 2d-scheme you'll again (if not already cached) face a cache miss on the first element. But now, only the first row/colum will be in the cache after the first load from main memory because all other rows are located somewhere else in memory and not adjacent to the first one. As soon as you reach a new row/column there will again be a cache miss and the next load from main memory is performed.

如果是一个动态的“真实的二维”系统,每个行/列的位置都不相关,处理器就需要分别加载每个内存位置。即使只需要64个字节,在最坏的情况下,为4个不相关的内存位置加载4个缓存线——实际上是传输256个字节,浪费75%的吞吐量带宽。如果您使用2d模式处理数据,您将再次(如果没有缓存)在第一个元素上面对一个缓存缺失。但是现在,只有第一行/colum将在主内存中的第一个加载之后在缓存中,因为所有其他行都位于内存中的其他地方,而不是与第一个行相邻。当您到达一个新的行/列时,将再次出现缓存遗漏,并执行主内存的下一个加载。

Long story short: The 2d pattern has a higher chance of cache misses with the 1d scheme offering better potential for performance due to locality of the data.

长话短说:2d模式有更高的缓存缺失的可能性,因为数据的局部性,1d方案提供了更好的性能潜力。

Frequent Allocation / Deallocation

  • As many as N + 1 (4 + 1 = 5) allocations (using either new, malloc, allocator::allocate or whatever) are necessary to create the desired NxM (4×4) matrix.
  • 多达N + 1(4 + 1 = 5)分配(使用新的、malloc、分配器::分配或其他)是创建所需的NxM(4 4)矩阵的必要条件。
  • The same number of proper, respective deallocation operations must be applied as well.
  • 同样数量的适当的,相应的deallocation操作也必须被应用。

Therefore, it is more costly to create/copy such matrices in contrast to a single allocation scheme.

因此,与单一分配方案相比,创建/复制这样的矩阵成本更高。

This is getting even worse with a growing number of rows.

随着行数的增加,情况变得更糟。

Memory consumption overhead

I'll asumme a size of 32 bits for int and 32 bits for pointers. (Note: System dependency.)

我将asumme大小为32位的int和32位的指针。(注:系统依赖。)

Let's remember: We want to store a 4×4 int matrix which means 64 bytes.

For an NxM matrix stored with the presented pointer-to-pointer scheme, we consume

  • N*M*sizeof(int) [the actual blue data] +
  • N*sizeof(int*) [the green pointers] +
  • sizeof(int**) [the violet variable p] bytes.

That makes 4*4*4 + 4*4 + 4 = 84 bytes in case of the present example, and it gets even worse when using std::vector<std::vector<int>>. It will require N * M * sizeof(int) + N * sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, that is 4*4*4 + 4*16 + 16 = 144 bytes in total, instead of 64 bytes for 4 x 4 int.

In addition -depending on the used allocator- each single allocation may well (and most likely will) have another 16 bytes of memory overhead. (Some “Infobytes” which store the number of allocated bytes for the purpose of proper deallocation.)

This means the worst case is:

N*(16+M*sizeof(int)) + 16+N*sizeof(int*) + sizeof(int**)
= 4*(16+4*4) + 16+4*4 + 4 = 164 bytes ! _Overhead: 156%_

The share of the overhead will reduce as the size of the matrix grows but will still be present.

Risk of memory leaks

The bunch of allocations requires appropriate exception handling in order to avoid memory leaks if one of the allocations fails! You'll need to keep track of the allocated memory blocks, and you must not forget them when deallocating the memory.

If new runs out of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.

Example:

In the above mentioned new/delete example, we'll face some more code if we want to avoid leaks in case of bad_alloc exceptions.

  // allocate memory for 4x4 integers; quick & dirty
  size_t const N = 4;
  // we don't need try for this allocation
  // if it fails there is no leak
  int ** p = new int*[N];
  size_t allocs(0U);
  try
  { // try block doing further allocations
    for (size_t i=0; i<N; ++i) 
    {
      p[i] = new int[4]; // allocate
      ++allocs; // advance counter if no exception occurred
    }
  }
  catch (std::bad_alloc & be)
  { // if an exception occurs we need to free our memory
    for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all allocated p[i]s
    delete[] p; // free p
    throw; // rethrow bad_alloc
  }
  /*
     do some stuff here, using p[x][y] 
  */
  // deallocate memory according to the number of allocations
  for (size_t i=0; i<allocs; ++i) delete[] p[i];
  delete[] p;

Summary

There are cases where "real 2d" memory layouts fit and make sense (i.e. if the number of columns per row is not constant), but in the most simple and common 2D data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.

Alternative

You should use a contiguous block of memory and map your rows onto that block.

The "C++ way" of doing it is probably to write a class that manages your memory while considering important things like proper deallocation and exception safety.

Example

To provide an idea of how such a class may look like, here's a simple example with some basic features:

  • 2d-size-constructible
  • 2d-resizable
  • operator(size_t, size_t) for 2d row-major element access
  • at(size_t, size_t) for checked 2d row-major element access
  • Fulfills Concept requirements for Container

Source:

#include <vector>
#include <algorithm>
#include <iterator>
#include <utility>

namespace matrices
{

  template<class T>
  class simple
  {
  public:
    // misc types
    using data_type  = std::vector<T>;
    using value_type = typename std::vector<T>::value_type;
    using size_type  = typename std::vector<T>::size_type;
    // ref
    using reference       = typename std::vector<T>::reference;
    using const_reference = typename std::vector<T>::const_reference;
    // iter
    using iterator       = typename std::vector<T>::iterator;
    using const_iterator = typename std::vector<T>::const_iterator;
    // reverse iter
    using reverse_iterator       = typename std::vector<T>::reverse_iterator;
    using const_reverse_iterator = typename std::vector<T>::const_reverse_iterator;

    // empty construction
    simple() = default;

    // default-insert rows*cols values
    simple(size_type rows, size_type cols)
      : m_rows(rows), m_cols(cols), m_data(rows*cols)
    {}

    // copy initialized matrix rows*cols
    simple(size_type rows, size_type cols, const_reference val)
      : m_rows(rows), m_cols(cols), m_data(rows*cols, val)
    {}

    // 1d-iterators

    iterator begin() { return m_data.begin(); }
    iterator end() { return m_data.end(); }
    const_iterator begin() const { return m_data.begin(); }
    const_iterator end() const { return m_data.end(); }
    const_iterator cbegin() const { return m_data.cbegin(); }
    const_iterator cend() const { return m_data.cend(); }
    reverse_iterator rbegin() { return m_data.rbegin(); }
    reverse_iterator rend() { return m_data.rend(); }
    const_reverse_iterator rbegin() const { return m_data.rbegin(); }
    const_reverse_iterator rend() const { return m_data.rend(); }
    const_reverse_iterator crbegin() const { return m_data.crbegin(); }
    const_reverse_iterator crend() const { return m_data.crend(); }

    // element access (row major indexation)
    reference operator() (size_type const row,
      size_type const column)
    {
      return m_data[m_cols*row + column];
    }
    const_reference operator() (size_type const row,
      size_type const column) const
    {
      return m_data[m_cols*row + column];
    }
    reference at(size_type const row, size_type const column)
    {
      return m_data.at(m_cols*row + column);
    }
    const_reference at(size_type const row, size_type const column) const
    {
      return m_data.at(m_cols*row + column);
    }

    // resizing
    void resize(size_type new_rows, size_type new_cols)
    {
      // new matrix new_rows times new_cols
      simple tmp(new_rows, new_cols);
      // select smaller row and col size
      auto mc = std::min(m_cols, new_cols);
      auto mr = std::min(m_rows, new_rows);
      for (size_type i(0U); i < mr; ++i)
      {
        // iterators to begin of rows
        auto row = begin() + i*m_cols;
        auto tmp_row = tmp.begin() + i*new_cols;
        // move mc elements to tmp
        std::move(row, row + mc, tmp_row);
      }
      // move assignment to this
      *this = std::move(tmp);
    }

    // size and capacity
    size_type size() const { return m_data.size(); }
    size_type max_size() const { return m_data.max_size(); }
    bool empty() const { return m_data.empty(); }
    // dimensionality
    size_type rows() const { return m_rows; }
    size_type cols() const { return m_cols; }
    // data swapping
    void swap(simple &rhs)
    {
      using std::swap;
      m_data.swap(rhs.m_data);
      swap(m_rows, rhs.m_rows);
      swap(m_cols, rhs.m_cols);
    }
  private:
    // content
    size_type m_rows{ 0u };
    size_type m_cols{ 0u };
    data_type m_data{};
  };
  template<class T>
  void swap(simple<T> & lhs, simple<T> & rhs)
  {
    lhs.swap(rhs);
  }
  template<class T>
  bool operator== (simple<T> const &a, simple<T> const &b)
  {
    if (a.rows() != b.rows() || a.cols() != b.cols())
    {
      return false;
    }
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
  }
  template<class T>
  bool operator!= (simple<T> const &a, simple<T> const &b)
  {
    return !(a == b);
  }

}

Note several things here:

  • T needs to fulfill the requirements of the used std::vector member functions
  • operator() doesn't do any "out of range" checks
  • No need to manage data on your own
  • No destructor, copy constructor or assignment operators required

So you don't have to bother about proper memory handling for each application but only once for the class you write.

Restrictions

There may be cases where a dynamic "real" two dimensional structure is favourable. This is for instance the case if

  • the matrix is very large and sparse (if any of the rows do not even need to be allocated but can be handled using a nullptr) or if
  • the rows do not have the same number of columns (that is if you don't have a matrix at all but another two-dimensional construct).

#2


15  

Unless you are talking about static arrays, 1D is faster.

Here’s the memory layout of a 1D array (std::vector<T>):

+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+---+

And here’s the same for a dynamic 2D array (std::vector<std::vector<T>>):

+---+---+---+
| * | * | * |
+-|-+-|-+-|-+
  |   |   V
  |   | +---+---+---+
  |   | |   |   |   |
  |   | +---+---+---+
  |   V
  | +---+---+---+
  | |   |   |   |
  | +---+---+---+
  V
+---+---+---+
|   |   |   |
+---+---+---+

Clearly the 2D case loses the cache locality and uses more memory. It also introduces an extra indirection (and thus an extra pointer to follow), but the 1D array has the overhead of calculating the indices, so these even out more or less.

#3


8  

1D and 2D Static Arrays

  • Size: Both will require the same amount of memory.

  • Speed: You can assume that there will be no speed difference because the memory for both of these arrays should be contiguous (the whole 2D array should appear as one chunk in memory rather than a bunch of chunks spread across memory). (This could be compiler dependent, however.)

1D and 2D Dynamic Arrays

  • Size: The 2D array will require a tiny bit more memory than the 1D array because of the pointers needed in the 2D array to point to the set of allocated 1D arrays. (This tiny bit is only tiny when we're talking about really big arrays. For small arrays, the tiny bit could be pretty big relatively speaking.)

  • Speed: The 1D array may be faster than the 2D array because the memory for the 2D array would not be contiguous, so cache misses would become a problem.


Use what works and seems most logical, and if you face speed problems, then refactor.

#4


4  

Another difference between 1D and 2D arrays appears in memory allocation: we cannot be sure that the members of a 2D array are stored sequentially.

#5


4  

The existing answers all only compare 1-D arrays against arrays of pointers.

In C (but not C++) there is a third option; you can have a contiguous 2-D array that is dynamically allocated and has runtime dimensions:

int (*p)[num_columns] = malloc(num_rows * sizeof *p);

and this is accessed like p[row_index][col_index].

I would expect this to have very similar performance to the 1-D array case, but it gives you nicer syntax for accessing the cells.

In C++ you can achieve something similar by defining a class which maintains a 1-D array internally, but can expose it via 2-D array access syntax using overloaded operators. Again I would expect that to have similar or identical performance to the plain 1-D array.

#6


1  

It really depends on how your 2D array is implemented.

int a[200], b[10][20], *c[10], *d[10];
for (ii = 0; ii < 10; ++ii)
{
   c[ii] = &b[ii][0];
   d[ii] = (int*) malloc(20 * sizeof(int));    // The cast for C++ only.
}

There are 3 implementations here: b, c and d. There won't be a lot of difference accessing b[x][y] or a[x*20 + y], since one is you doing the computation and the other is the compiler doing it for you. c[x][y] and d[x][y] are slower because the machine has to find the address that c[x] points to and then access the yth element from there. It is not one straight computation. On some machines (eg the AS400, which has 36 byte (not bit) pointers), pointer access is extremely slow. It all depends on the architecture in use. On x86-type architectures, a and b are the same speed, and c and d are slower than b.

#7


0  

I love the thorough answer provided by Pixelchemist. A simpler version of this solution may be as follows. First, declare the dimensions:

constexpr int M = 16; // rows
constexpr int N = 16; // columns
constexpr int P = 16; // planes

Next, create an alias and get and set methods:

template<typename T>
using Vector = std::vector<T>;

template<typename T>
inline T& set_elem(Vector<T>& m_, size_t i_, size_t j_, size_t k_)
{
    // check indexes here...
    return m_[i_*N*P + j_*P + k_];
}

template<typename T>
inline const T& get_elem(const Vector<T>& m_, size_t i_, size_t j_, size_t k_)
{
    // check indexes here...
    return m_[i_*N*P + j_*P + k_];
}

Finally, a vector may be created and indexed as follows:

Vector<int> array3d(M*N*P, 0);       // create 3-d array containing M*N*P zero ints
set_elem(array3d, 0, 0, 1) = 5;      // array3d[0][0][1] = 5
auto n = get_elem(array3d, 0, 0, 1); // n = 5

Defining the vector size at initialization provides optimal performance. This solution is modified from this answer. The functions may be overloaded to support varying dimensions with a single vector. The downside of this solution is that the M, N, P parameters are implicitly passed to the get and set functions. This can be resolved by implementing the solution within a class, as done by Pixelchemist.
