Summary: Is there a way to undo a multiplication that has overflowed? Here's what I mean: suppose I have an unsigned int number. Then I multiply it several times (and there's overflow, which is expected). Is it then possible to recover the original value?
In detail:
It's all about the Rabin-Karp rolling hash. What I need to do is this: I have the hash of a long string, for example "abcd". Then I have the hash of a shorter substring, for example "cd". How do I calculate the "ab" hash in O(1), using the two given hashes?
What I have now as an algorithm:
- subtract the "cd" hash from the "abcd" hash (remove the last elements from the polynomial)
- divide the result by p ^ len( "cd" ), where p is the base (a prime number).
So this is:
abcd: a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0
cd: c * p ^ 1 + d * p ^ 0
ab gets:
( ( a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 ) - ( c * p ^ 1 + d * p ^ 0 ) ) / ( p ^ 2 ) = a * p ^ 1 + b * p ^ 0
And this works if there is no overflow (if p is a small number). But once the values overflow, it no longer works.
Is there any trick or something?
P.S. The c++ tag is there because of the number overflow, as its behaviour is language-specific (and different from Python, Scheme, etc.).
6 answers
#1
5
Don't know about the overflow part, but there is a way of getting back the original value.
The Chinese Remainder Theorem helps a great deal. Let's call h = abcd - cd. G is the value of h without overflow: G = h + k*2^32, assuming the overflow simply does %2^32. And thus ab = G / p^2.
G = h (mod 2^32)
G = 0 (mod p^2)
If p^2 and 2^32 are coprime, the Chinese Remainder Theorem gives us
G = h * b * p^2 (mod 2^32 * p^2)
where b is the modular multiplicative inverse of p^2 modulo 2^32, that is, b * p^2 = 1 (mod 2^32). After you calculate G, simply divide by p^2 to find ab. Since G / p^2 = h * b (mod 2^32), this amounts to multiplying h by b in wrapping 32-bit arithmetic.
I hope I didn't make any mistakes...
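To make the idea above concrete, here is a minimal C++ sketch, assuming a 32-bit unsigned hash and an odd base p. The names poly_hash, inverse_mod_2_32 and prefix_hash are made up for illustration, and the inverse is computed with Newton's iteration, which is a swapped-in technique and not something this answer describes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using hash_t = std::uint32_t;  // unsigned, so arithmetic wraps modulo 2^32

// Polynomial hash: s[0]*p^(n-1) + ... + s[n-1]*p^0, reduced modulo 2^32.
hash_t poly_hash(const std::string& s, hash_t p)
{
    hash_t h = 0;
    for (unsigned char ch : s)
        h = h * p + ch;
    return h;
}

// Inverse of an odd value modulo 2^32 by Newton's iteration (assumption:
// chosen here for brevity). Each step doubles the number of correct low
// bits: 3 -> 6 -> 12 -> 24 -> 48.
hash_t inverse_mod_2_32(hash_t a)
{
    assert(a % 2 == 1);        // even values have no inverse modulo 2^32
    hash_t inv = a;            // a * a == 1 (mod 8) for every odd a
    for (int i = 0; i < 4; ++i)
        inv *= 2 - a * inv;
    return inv;
}

// Hash of the prefix, given the hash of the whole string and of its suffix.
hash_t prefix_hash(hash_t whole, hash_t suffix, hash_t p, unsigned suffix_len)
{
    hash_t p_pow = 1;
    for (unsigned i = 0; i < suffix_len; ++i)
        p_pow *= p;                                     // p^suffix_len (mod 2^32)
    return (whole - suffix) * inverse_mod_2_32(p_pow);  // h * b (mod 2^32)
}
```

With an odd base, prefix_hash(poly_hash("abcd", p), poly_hash("cd", p), p, 2) should equal poly_hash("ab", p). The loop that builds p^suffix_len is linear in the suffix length; to get the O(1) the question asks for, the powers of p (or of its inverse) would have to be precomputed.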
#2
3
The Extended Euclidean algorithm is a good solution for this, but it's rather complicated and hard to implement. There's a simpler way (thanks to a friend of mine (: ).
There's a nice article on Wikipedia about the modular multiplicative inverse. Using Euler's theorem, in the case when m and a are coprime:
a^(-1) = a^(φ(m) - 1) (mod m)
where φ(m) is Euler's totient function.
In my case, m (the modulus) is the size of the hash type: 2^32, 2^64, etc. (64-bit in my case).
This means that we only need to find the value of φ(m). But think about it: m == 2 ^ 64, which guarantees that m is coprime with all odd numbers and is NOT coprime with any even number. So φ(m) is simply the number of all values divided by 2, i.e. 2^63.
Also, we know that the hash type is unsigned, as otherwise we would have some issues. This gives us the chance to do this:
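hash_t x = -1;                        // all bits set: 2^64 - 1 for a 64-bit unsigned hash_t
x /= 2;                               // 2^63 - 1, which is φ(m) - 1 for m = 2^64
hash_t a_reverse = fast_pow( a, x );  // inverse of a modulo 2^64 (a must be odd)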
For 64-bit numbers, x is a really big number (19 digits: 9 223 372 036 854 775 807), but fast_pow is really fast, and we can cache the inverse in case we need it for more than one query.
fast_pow is a well-known algorithm (exponentiation by squaring):
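// Exponentiation by squaring; all arithmetic wraps modulo 2^N (hash_t is unsigned).
hash_t fast_pow( hash_t source, hash_t pow )
{
    if( 0 == pow )
    {
        return 1;
    }
    if( 0 != pow % 2 )
    {
        // odd exponent: peel off one factor
        return source * fast_pow( source, pow - 1 );
    }
    else
    {
        // even exponent: square the base and halve the exponent
        return fast_pow( source * source, pow / 2 );
    }
}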
Addition: for example:
hash_t base = 2305843009213693951; // 9th Mersenne prime
hash_t x = 1234567890987654321;
x *= fast_pow( base, 123456789 ); // x * ( base ^ 123456789 )
hash_t y = -1;
y /= 2;
hash_t base_reverse = fast_pow( base, y );
x *= fast_pow( base_reverse, 123456789 ); // x * ( base_reverse ^ 123456789 )
assert( x == 1234567890987654321 );
This works perfectly and is very fast.
#3
1
You should use unsigned integers to get defined overflow behaviour (modulo 2^N). Signed integer overflow is undefined.
Also, instead of dividing, you should multiply by the multiplicative inverse of p modulo the appropriate value. For example, if p=3 and your hash values are 8 bits, multiply by 171, because 171*3=513=2*256+1. The multiplicative inverse exists if p and the modulus are relatively prime.
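The 8-bit example is small enough to check by brute force. The following is only an illustrative sketch; mul8 and inverse_mod_256 are hypothetical helper names, and the linear search is practical only because there are just 256 candidates.

```cpp
#include <cassert>
#include <cstdint>

// Multiply two 8-bit values and keep only the low 8 bits (i.e. modulo 256).
std::uint8_t mul8(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a * b);
}

// Brute-force search for the inverse of p modulo 256.
// Returns 0 if none exists (0 is never a valid inverse).
std::uint8_t inverse_mod_256(std::uint8_t p)
{
    for (unsigned candidate = 1; candidate < 256; ++candidate)
        if (mul8(p, static_cast<std::uint8_t>(candidate)) == 1)
            return static_cast<std::uint8_t>(candidate);
    return 0;
}
```

For instance, inverse_mod_256(3) yields 171, and mul8(mul8(200, 3), 171) gives back 200 even though 200 * 3 does not fit in 8 bits. For an even p such as 4 the search finds nothing, which matches the "relatively prime" condition.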
#4
1
Just a partial side-answer here: I believe it is not strictly necessary to use unsigned integers. You can use ones' complement.
But note that this has separate representations for -0 and +0, and that you'll probably have to hand-code the arithmetic operations along the way.
Some processor instructions are agnostic of the integer representation, but not all.
#5
1
You have a * b = c mod 2^32 (or mod something else, depending on how you are doing your hash). If you could find d such that b * d = 1 mod 2^32 (or mod whatever), then you could compute c * d = a * b * d = a and retrieve a. If gcd(b, 2^32) = 1, then you can use the Extended Euclidean algorithm (http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm) to find x and y such that b * x + 2^32 * y = 1, or b * x = 1 - y * 2^32, or b * x = 1 mod 2^32, so x is the number you want to multiply by.
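As a rough sketch of the steps above, assuming a 32-bit hash and signed 64-bit intermediates (the function names extended_gcd and inverse_by_euclid are chosen here purely for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Extended Euclidean algorithm on signed 64-bit values:
// returns g = gcd(a, b) and sets x, y so that a * x + b * y == g.
std::int64_t extended_gcd(std::int64_t a, std::int64_t b,
                          std::int64_t& x, std::int64_t& y)
{
    if (b == 0)
    {
        x = 1;
        y = 0;
        return a;
    }
    std::int64_t x1 = 0, y1 = 0;
    std::int64_t g = extended_gcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Inverse of b modulo 2^32; b must be odd so that gcd(b, 2^32) == 1.
std::uint32_t inverse_by_euclid(std::uint32_t b)
{
    const std::int64_t m = std::int64_t{1} << 32;
    std::int64_t x = 0, y = 0;
    std::int64_t g = extended_gcd(b, m, x, y);
    assert(g == 1);
    // x may be negative; bring it into the range [0, 2^32).
    return static_cast<std::uint32_t>(((x % m) + m) % m);
}
```

Multiplying c by inverse_by_euclid(b) in ordinary wrapping uint32_t arithmetic then recovers a. The inverse depends only on b, so it can be computed once and reused.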
#6
0
So overflow is actually just your compiler being nice to you; the C and C++ standards say that signed integer overflow is undefined behaviour. So once a signed value has overflowed, there's actually nothing you can do, because your program ceases to be deterministic.
You might need to rethink the algorithm, or tack on modulo operations / subtractions to fix your algorithm.