I have a complex problem, and I am not sure I can describe it properly.
I have a two-dimensional array of objects of a class. Currently my algorithm operates only on this two-dimensional array, but only some of its locations are occupied (almost 40%).
It works fine for a small data set, but with a large data set (a large number of elements in the 2D array, e.g. 10000) the program exhausts memory, because I have nested loops that make 10000 * 10000 = 100000000 iterations.
Can I replace the 2D array with a Hashtable or some other data structure? My main aim is to reduce the number of iterations only by changing the data structure.
Pardon me for not explaining properly. I am developing in C#.
4 answers
#1
4
It sounds like the data structure you have is a sparse matrix, so I'm going to point you to the question "Are there any storage optimized Sparse Matrix implementations in C#?"
#2
1
You can create a key for a dictionary from the array coordinates. Something like:
int key = x * 46000 + y;
(This naturally works for coordinates resembling an array up to 46000x46000, which is about what you can fit in an int. If you need to represent a larger array, you would use a long value as the key.)
With the key you can store and retrieve the object in a Dictionary<int, YourClass>. Storing and retrieving values from the dictionary is quite fast, not much slower than using an array.
You can iterate the items in the dictionary, but you won't get them in a predictable order, i.e. not the same as looping the x and y coordinates of an array.
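The key scheme above could be sketched as follows. This is only an illustration: `Cell`, `SparseGrid`, `MakeKey` and `SplitKey` are made-up names, and the bound of 46000 is taken from the formula above. The reverse mapping is an addition that may be handy when you iterate the dictionary and need the coordinates back.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's element type.
public class Cell
{
    public string Name;
    public Cell(string name) { Name = name; }
}

public static class SparseGrid
{
    // Assumed upper bound on both coordinates; 46000 * 46000 still fits in an int.
    public const int Width = 46000;

    // Combines two coordinates into one dictionary key.
    public static int MakeKey(int x, int y)
    {
        if (x < 0 || x >= Width || y < 0 || y >= Width)
            throw new ArgumentOutOfRangeException();
        return x * Width + y;
    }

    // Recovers the coordinates from a key.
    public static void SplitKey(int key, out int x, out int y)
    {
        x = key / Width;
        y = key % Width;
    }
}

public static class SparseGridDemo
{
    public static void Main()
    {
        var cells = new Dictionary<int, Cell>();
        cells[SparseGrid.MakeKey(3, 7)] = new Cell("a");
        cells[SparseGrid.MakeKey(9999, 0)] = new Cell("b");

        // TryGetValue returns false for a location that was never filled.
        Cell found;
        if (cells.TryGetValue(SparseGrid.MakeKey(3, 7), out found))
            Console.WriteLine(found.Name);

        // Iteration visits only the occupied locations, in no guaranteed order.
        foreach (KeyValuePair<int, Cell> entry in cells)
        {
            int x, y;
            SparseGrid.SplitKey(entry.Key, out x, out y);
            Console.WriteLine(x + "," + y + " -> " + entry.Value.Name);
        }
    }
}
```

The range check in `MakeKey` is there because two different out-of-range coordinate pairs could otherwise map to the same key.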
#3
1
If you need high performance, you can roll your own data structure. If each object can be contained in only one container and is never moved to another container, you can build a custom hashset-like data structure.
You add X, Y and Next fields to your class. The objects in each bucket form a singly linked list, and the heads of those lists are stored in an array that is your hash table. This can be very fast.
I wrote it from scratch, so there may be bugs. Clear and rehash are not implemented; this is a demonstration only. Get and Set run in O(1) on average.
To make it easy to enumerate all nodes while skipping empty locations, there is also a doubly linked list. Insertion into and removal from a doubly linked list are O(1), so enumerating all nodes is O(n), where n is the number of nodes, not the "virtual" size of the sparse matrix.
With the doubly linked list you can enumerate the items in the same order in which you inserted them. The order is unrelated to the X and Y coordinates.
using System.Collections;
using System.Collections.Generic;

public class Node
{
    internal NodeTable pContainer;  // Table that owns this node, or null
    internal Node pTableNext;       // Next node in the same hash bucket
    internal int pX;
    internal int pY;
    internal Node pLinkedListPrev;  // Neighbours in insertion order
    internal Node pLinkedListNext;
}

public class NodeTable : IEnumerable<Node>
{
    private Node[] pTable;
    private Node pLinkedListFirst;
    private Node pLinkedListLast;

    // Capacity should be a prime number at least as large as the number of items you want to store.
    // You can make this dynamic too, but that needs some more work (rehashing and prime number computation).
    public NodeTable(int capacity)
    {
        this.pTable = new Node[capacity];
    }

    public int GetHashCode(int x, int y)
    {
        return (x + y * 104729); // Must be a prime number
    }

    public Node Get(int x, int y)
    {
        int bucket = (GetHashCode(x, y) & 0x7FFFFFFF) % this.pTable.Length;
        for (Node current = this.pTable[bucket]; current != null; current = current.pTableNext)
        {
            if (current.pX == x && current.pY == y)
                return current;
        }
        return null;
    }

    public IEnumerator<Node> GetEnumerator()
    {
        // Replace yield with a custom struct Enumerator to optimize performance.
        for (Node node = this.pLinkedListFirst, next; node != null; node = next)
        {
            next = node.pLinkedListNext;
            yield return node;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    // Stores node at (x, y), replacing any node already there. Pass null to remove.
    // Returns false if node already belongs to a table, or if there was nothing to remove.
    public bool Set(int x, int y, Node node)
    {
        // A node can be stored only if it is not already in a container.
        if (node == null || node.pContainer == null)
        {
            int bucket = (GetHashCode(x, y) & 0x7FFFFFFF) % this.pTable.Length;
            for (Node current = this.pTable[bucket], prev = null; current != null; current = current.pTableNext)
            {
                if (current.pX == x && current.pY == y)
                {
                    this.fRemoveFromLinkedList(current);
                    current.pContainer = null; // The old node is free to be stored again
                    if (node == null)
                    {
                        // Remove from table linked list
                        if (prev != null)
                            prev.pTableNext = current.pTableNext;
                        else
                            this.pTable[bucket] = current.pTableNext;
                        current.pTableNext = null;
                    }
                    else
                    {
                        // Replace old node in table linked list
                        node.pTableNext = current.pTableNext;
                        current.pTableNext = null;
                        if (prev != null)
                            prev.pTableNext = node;
                        else
                            this.pTable[bucket] = node;
                        node.pContainer = this;
                        node.pX = x;
                        node.pY = y;
                        this.fAddToLinkedList(node);
                    }
                    return true;
                }
                prev = current;
            }
            if (node == null)
                return false; // Nothing is stored at (x, y), so there is nothing to remove
            // New node.
            node.pContainer = this;
            node.pX = x;
            node.pY = y;
            // Add to table linked list
            node.pTableNext = this.pTable[bucket];
            this.pTable[bucket] = node;
            // Add to global linked list
            this.fAddToLinkedList(node);
            return true;
        }
        return false;
    }

    private void fRemoveFromLinkedList(Node node)
    {
        Node prev = node.pLinkedListPrev;
        Node next = node.pLinkedListNext;
        if (prev != null)
            prev.pLinkedListNext = next;
        else
            this.pLinkedListFirst = next;
        if (next != null)
            next.pLinkedListPrev = prev;
        else
            this.pLinkedListLast = prev;
        node.pLinkedListPrev = null;
        node.pLinkedListNext = null;
    }

    private void fAddToLinkedList(Node node)
    {
        // Append at the tail and link the old tail forward to the new node.
        node.pLinkedListPrev = this.pLinkedListLast;
        if (this.pLinkedListLast != null)
            this.pLinkedListLast.pLinkedListNext = node;
        else
            this.pLinkedListFirst = node;
        this.pLinkedListLast = node;
    }
}
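For comparison, the same idea (a hash lookup plus a doubly linked list for insertion-ordered enumeration) can probably be had from the built-in collections with much less code, at the cost of some extra allocations. The sketch below is not the class above: `OrderedSparseTable`, `TryGet` and `Items` are hypothetical names, and it assumes C# 7 or later for the tuple key.

```csharp
using System;
using System.Collections.Generic;

// Sparse table that enumerates its items in insertion order.
public class OrderedSparseTable<T>
{
    // Maps coordinates to the list node holding the value.
    private readonly Dictionary<(int, int), LinkedListNode<T>> index =
        new Dictionary<(int, int), LinkedListNode<T>>();

    // Doubly linked list that remembers insertion order.
    private readonly LinkedList<T> order = new LinkedList<T>();

    public int Count { get { return index.Count; } }

    // Items in insertion order; empty locations are never visited.
    public IEnumerable<T> Items { get { return order; } }

    public void Set(int x, int y, T value)
    {
        Remove(x, y);                         // a replaced item moves to the end
        index[(x, y)] = order.AddLast(value);
    }

    public bool TryGet(int x, int y, out T value)
    {
        LinkedListNode<T> node;
        if (index.TryGetValue((x, y), out node))
        {
            value = node.Value;
            return true;
        }
        value = default(T);
        return false;
    }

    public bool Remove(int x, int y)
    {
        LinkedListNode<T> node;
        if (!index.TryGetValue((x, y), out node))
            return false;
        order.Remove(node);                   // O(1) because we hold the node itself
        index.Remove((x, y));
        return true;
    }
}

public static class OrderedSparseTableDemo
{
    public static void Main()
    {
        var table = new OrderedSparseTable<string>();
        table.Set(5, 5, "first");
        table.Set(0, 9000, "second");
        table.Set(2, 1, "third");
        table.Remove(0, 9000);
        Console.WriteLine(string.Join(",", table.Items)); // prints "first,third"
    }
}
```

Unlike the hand-written table, this version allocates one `LinkedListNode<T>` per item and lets `Dictionary` handle resizing, so it trades some speed for not having to maintain the hashing code yourself.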
#4
0
Arrays give you multiple features:
- A way of organizing data as a list of elements
- A way to access the data elements by index number (1st, 2nd, 3rd, etc.)
But a common downside (depending on the language and runtime) is that arrays often work poorly as a sparse data structure: if you don't need all of the array elements, you end up with wasted memory space.
So, yes, a hashtable will usually save space over an array.
But you asked: "My main aim is to reduce the number of iterations only by changing the data structure."
In order to answer that question, we need to know more about your algorithm: what you're doing in each loop of your program.
For example, there are many ways to sort an array or a matrix. The different algorithms for sorting use differing numbers of iterations.
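As a rough illustration of when the data structure alone does reduce iterations, consider an algorithm that only needs to visit each occupied cell once. The sketch below assumes exactly that (the asker's real algorithm is unknown) and uses a made-up 40% fill pattern; `IterationDemo` and `Compare` are hypothetical names, and the tuple key assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public static class IterationDemo
{
    // Sums the occupied cells twice and reports how many loop steps each approach took.
    public static void Compare(int size, out long denseSteps, out long sparseSteps)
    {
        var dense = new int?[size, size];                 // null means "empty"
        var sparse = new Dictionary<(int, int), int>();   // only occupied cells are stored

        for (int x = 0; x < size; x++)
            for (int y = 0; y < size; y++)
                if ((x + y) % 5 < 2)                      // occupy 2 cells out of every 5
                {
                    dense[x, y] = 1;
                    sparse[(x, y)] = 1;
                }

        // Dense scan: every location is visited, occupied or not.
        long denseSum = 0;
        denseSteps = 0;
        for (int x = 0; x < size; x++)
            for (int y = 0; y < size; y++)
            {
                denseSteps++;
                if (dense[x, y].HasValue)
                    denseSum += dense[x, y].Value;
            }

        // Sparse scan: only the stored entries are visited.
        long sparseSum = 0;
        sparseSteps = 0;
        foreach (KeyValuePair<(int, int), int> entry in sparse)
        {
            sparseSteps++;
            sparseSum += entry.Value;
        }

        if (denseSum != sparseSum)
            throw new InvalidOperationException("The two traversals disagree.");
    }

    public static void Main()
    {
        long denseSteps, sparseSteps;
        Compare(1000, out denseSteps, out sparseSteps);
        Console.WriteLine("dense: " + denseSteps + ", sparse: " + sparseSteps);
        // prints "dense: 1000000, sparse: 400000"
    }
}
```

If the nested loops instead compare every element with every other element, switching containers would not shrink the iteration count by itself, which is why the details of the algorithm matter.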