I am randomly generating a grid of characters and storing it in a char[,] array ...
I need a way to ensure that I haven't already generated a grid before serializing it to a database in binary format. What is the best way to compare two grids based on bytes? The last thing I want to do is loop through their contents, as I am already pulling one of them from the db in byte form.
I was thinking of a checksum, but I'm not sure whether that would work.
char[,] grid = new char[8,8];
char[,] secondgrid = new char[8,8];//gets its data from db
4 answers
#1
From what I can see, you are going to have to loop over the contents (or at least, a portion of them); there is no other way of talking about an array's contents.
Well, as a fast "definitely not the same" check, you could compute a hash over the array, i.e. something like:
int hash = 7;
foreach (char c in data) {
    // unchecked: the multiplication is meant to wrap around on overflow
    hash = unchecked((hash * 17) + c.GetHashCode());
}
This has the risk of some false positives (reporting a dup when it is unique), but is otherwise quite cheap. Any use? You could store the hash alongside the data in the database to allow fast checks, but if you do that you should pick your own hash algorithm for char (since the built-in one isn't guaranteed to stay the same), perhaps just converting to an int, or re-using the existing implementation:
int hash = 7;
foreach (char c in data) {
    // unchecked: the multiplication is meant to wrap around on overflow
    hash = unchecked((hash * 17) + (c | (c << 0x10)));
}
As an aside: for 8x8, you could always just think in terms of a 64-character string, and just check ==. This would work equally well in the database and in the application.
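A minimal sketch of the string idea, assuming the grid is flattened in row-major order. The class and method names (`GridText`, `Flatten`) are made up for illustration and are not from the question:

```csharp
using System;
using System.Text;

public static class GridText
{
    // Flattens a char[,] into a single string (hypothetical helper).
    public static string Flatten(char[,] grid)
    {
        var sb = new StringBuilder(grid.Length);
        // foreach over a rectangular array visits elements in row-major order
        foreach (char c in grid)
            sb.Append(c);
        return sb.ToString();
    }
}
```

With this, `GridText.Flatten(grid) == GridText.Flatten(secondgrid)` is an ordinal, character-by-character comparison in C#. On the database side the result of `=` depends on the column's collation, so a case-insensitive collation may treat grids that differ only in case as equal.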
#2
Can't you get the database to do it? Make the grid column UNIQUE. Then, if you need to detect that you've generated a duplicate grid, the method for doing this might involve checking the number of rows affected by your operation, or perhaps testing for errors.
Also, if each byte is simply picked at random from [0, 255], then performing a hash to get a 4-byte number is no better than taking the first four bytes out of the grid. The chance of collisions is the same.
#3
I'd go with a checksum/hash mechanism to catch a large percentage of the matches, then do a full comparison if you get a match.
What is the range of characters used to fill in your grid? If you're using just letters (not mixed case, or case not important) and an 8x8 grid, you're only talking about 7 or so possible collisions per item within your problem space (a very rare occurrence), assuming a good hashing function. You could do something like:
1. Generate a grid.
2. Load any matching grids from the DB.
3. If a match was found in step 2, go to step 1.
4. Use your new grid.
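The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory dictionary stands in for the database, grids are handled as flattened strings of uppercase letters, and all names (`GridStore`, `StableHash`, `TryAdd`, `GenerateUnique`) are hypothetical. The hash is the hand-written one from answer #1, so it does not depend on `char.GetHashCode()`.

```csharp
using System;
using System.Collections.Generic;

public static class GridStore
{
    // Hypothetical stand-in for the database: stable hash -> grids stored under it.
    private static readonly Dictionary<int, List<string>> Saved =
        new Dictionary<int, List<string>>();

    // Hand-written hash, so the stored value stays the same across runtimes.
    public static int StableHash(string cells)
    {
        unchecked
        {
            int hash = 7;
            foreach (char c in cells)
                hash = (hash * 17) + (c | (c << 16));
            return hash;
        }
    }

    // Stores the grid and returns true if it is new; returns false for a duplicate.
    public static bool TryAdd(string cells)
    {
        int hash = StableHash(cells);
        if (!Saved.TryGetValue(hash, out List<string> bucket))
        {
            bucket = new List<string>();
            Saved[hash] = bucket;
        }
        else if (bucket.Contains(cells)) // full comparison only on a hash match
        {
            return false;
        }
        bucket.Add(cells);
        return true;
    }

    // Steps 1-4: keep generating until a grid is accepted.
    public static string GenerateUnique(Random rng, int size)
    {
        while (true)
        {
            var cells = new char[size * size];
            for (int i = 0; i < cells.Length; i++)
                cells[i] = (char)('A' + rng.Next(26));
            string candidate = new string(cells);
            if (TryAdd(candidate))
                return candidate;
        }
    }
}
```

With a real database, the dictionary lookup would become a query on an indexed hash column, and only the rows that come back would need the full comparison.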
#4
Try this (invoke ComputeHash for every matrix and compare the guids):
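// Needs: using System; using System.IO;
//        using System.Runtime.Serialization.Formatters.Binary;
//        using System.Security.Cryptography;
private static MD5 md5 = MD5.Create();

public static Guid ComputeHash(object value)
{
    BinaryFormatter bf = new BinaryFormatter();
    using (MemoryStream stm = new MemoryStream())
    {
        // Serialize the object, then wrap the 16-byte MD5 digest in a Guid.
        bf.Serialize(stm, value);
        return new Guid(md5.ComputeHash(stm.ToArray()));
    }
}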
Note: generating the byte array could probably be done much more simply, since you have a char array.
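One possible way to get the bytes more directly, as a sketch only: copy the raw UTF-16 code units out of the `char[,]` with `Buffer.BlockCopy` and hash those. The names `GridBytes`, `ToBytes` and `SameBytes` are hypothetical. The bytes produced here are in the machine's native byte order and are not the same as `BinaryFormatter` output, so they would only match the database copy if that copy was written the same way.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class GridBytes
{
    // Copies the grid's chars into a byte array, two bytes per char.
    public static byte[] ToBytes(char[,] grid)
    {
        var bytes = new byte[grid.Length * sizeof(char)];
        Buffer.BlockCopy(grid, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // MD5 yields 16 bytes, which is exactly what the Guid constructor expects.
    public static Guid ComputeHash(char[,] grid)
    {
        using (MD5 md5 = MD5.Create())
            return new Guid(md5.ComputeHash(ToBytes(grid)));
    }

    // Full byte-for-byte comparison, for confirming a hash match.
    public static bool SameBytes(byte[] a, byte[] b)
    {
        return a.SequenceEqual(b);
    }
}
```

Creating the `MD5` instance per call avoids sharing one hash object between threads, at the cost of a small allocation each time.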