用一种独特的、确定的方式将两个整数映射到一个。

时间:2021-01-03 12:47:24

Imagine two positive integers A and B. I want to combine these two into a single integer C.

假设两个正整数A和b,我想把它们合并成一个整数C。

There can be no other integers D and E which combine to C. So combining them with the addition operator doesn't work. Eg 30 + 10 = 40 = 40 + 0 = 39 + 1 Neither does concatination work. Eg "31" + "2" = 312 = "3" + "12"

没有其他的整数D和E可以合并到c,所以把它们和加法运算符组合起来是行不通的。30 + 10 = 40 = 40 + 0 = 39 + 1,这两项都不适用。“31”+“2”= 312 =“3”+“12”

This combination operation should also be deterministic (always yield the same result with the same inputs) and should always yield an integer on either the positive or the negative side of integers.

这种组合操作也应该是确定性的(总是与相同的输入产生相同的结果),并且应该总是在整数的正或负方面产生整数。

14 个解决方案

#1


183  

You're looking for a bijective NxN -> N mapping. These are used for e.g. dovetailing. Have a look at this PDF for an introduction to so-called pairing functions. Wikipedia introduces a specific pairing function, namely the Cantor pairing function:

你在寻找一个双射NxN -> N映射。这些是用来做燕尾服的。看一下这个PDF,以介绍所谓的配对函数。Wikipedia引入了一个特定的配对函数,即Cantor配对函数:

用一种独特的、确定的方式将两个整数映射到一个。

Three remarks:

备注:三个

  • As others have made clear, if you plan to implement a pairing function, you may soon find you need arbitrarily large integers (bignums).
  • 正如其他人已经明确的,如果您计划实现一个配对函数,您可能很快就会发现您需要任意大的整数(bignums)。
  • If you don't want to make a distinction between the pairs (a, b) and (b, a), then sort a and b before applying the pairing function.
  • 如果你不想对成对(a, b)和(b, a)进行区分,那么在应用配对函数之前对a和b进行排序。
  • Actually I lied. You are looking for a bijective ZxZ -> N mapping. Cantor's function only works on non-negative numbers. This is not a problem however, because it's easy to define a bijection f : Z -> N, like so:
    • f(n) = n * 2 if n >= 0
    • f(n) = n * 2,如果n >= 0。
    • f(n) = -n * 2 - 1 if n < 0
    • f(n) = -n * 2 - 1如果n < 0。
  • 其实我说谎了。您正在寻找一个双射ZxZ -> N映射。康托尔的功能只适用于非负数。这不是一个问题,因为很容易定义一个bijection f: Z -> N,就像这样:f(N) = N * 2如果N >= 0 f(N) = - N * 2 - 1,如果N < 0。

#2


169  

Cantor pairing function is really one of the better ones out there considering its simple, fast and space efficient, but there is something even better published at Wolfram by Matthew Szudzik, here. The limitation of Cantor pairing function (relatively) is that the range of encoded results doesn't always stay within the limits of a 2N bit integer if the inputs are two N bit integers. That is, if my inputs are two 16 bit integers ranging from 0 to 2^16 -1, then there are 2^16 * (2^16 -1) combinations of inputs possible, so by the obvious Pigeonhole Principle, we need an output of size at least 2^16 * (2^16 -1), which is equal to 2^32 - 2^16, or in other words, a map of 32 bit numbers should be feasible ideally. This may not be of little practical importance in programming world.

Cantor配对函数实际上是一个比较好的函数,考虑它的简单、快速和空间效率,但是在Wolfram上有一个更好的东西由Matthew Szudzik在这里发表。Cantor配对函数(相对而言)的限制是,如果输入是两个N位整数,编码结果的范围并不总是停留在2N位整数的范围内。如果我输入两个16位整数从0到2 ^ 16 1,然后是2 ^ 16 *(2 ^ 16 1)组合的输入,通过明显的鸽子洞原理,至少我们需要输出的大小2 ^ 16 *(2 ^ 16 1),等于2 ^ 32 - 2 ^ 16,换句话说,一个32位的数字地图应该可行的理想。这在编程世界中可能并不具有实际的重要性。

Cantor pairing function:

康托尔配对功能:

(a + b) * (a + b + 1) / 2 + a; where a, b >= 0

The mapping for two maximum most 16 bit integers (65535, 65535) will be 8589803520 which as you see cannot be fit into 32 bits.

两个最大16位整数(65535,65535)的映射将是8589803520,您所看到的是不适合32位的。

Enter Szudzik's function:

进入Szudzik的功能:

a >= b ? a * a + a + b : a + b * b;  where a, b >= 0

The mapping for (65535, 65535) will now be 4294967295 which as you see is a 32 bit (0 to 2^32 -1) integer. This is where this solution is ideal, it simply utilizes every single point in that space, so nothing can get more space efficient.

的映射(65535、65535)现在将4294967295当你看到的是一个32位整数(0到2 ^ 32 1)。这就是这个解决方案的理想之处,它简单地利用了空间中的每一个点,所以没有任何东西可以获得更多的空间效率。


Now considering the fact that we typically deal with the signed implementations of numbers of various sizes in languages/frameworks, let's consider signed 16 bit integers ranging from -(2^15) to 2^15 -1 (later we'll see how to extend even the ouput to span over signed range). Since a and b have to be positive they range from 0 to 2^15 - 1.

现在考虑到我们通常处理不同大小的数字签名实现在语言/框架,让我们考虑签署了16位整数,范围从15 -(2 ^ 15)2 ^ 1(后来我们将看到如何扩展甚至输出跨度超过签署范围)。由于a和b必须积极的范围从0到2 ^ 15 - 1。

Cantor pairing function:

康托尔配对功能:

The mapping for two maximum most 16 bit signed integers (32767, 32767) will be 2147418112 which is just short of maximum value for signed 32 bit integer.

两个最大16位有符号整数的映射(32767,32767)将是2147418112,它的最大值是32位整数的最大值。

Now Szudzik's function:

现在Szudzik功能:

(32767, 32767) => 1073741823, much smaller..

(32767,32767)=> 1073741823,小得多。

Let's account for negative integers. That's beyond the original question I know, but just elaborating to help future visitors.

我们来考虑负整数。这超出了我所知道的最初的问题,只是为了帮助未来的访问者。

Cantor pairing function:

康托尔配对功能:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
(A + B) * (A + B + 1) / 2 + A;

(-32768, -32768) => 8589803520 which is Int64. 64 bit output for 16 bit inputs may be so unpardonable!!

(-32768,-32768)=> 8589803520,是Int64。16位输入的位输出可能是不可原谅的!!

Szudzik's function:

Szudzik的功能:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;

(-32768, -32768) => 4294967295 which is 32 bit for unsigned range or 64 bit for signed range, but still better.

(-32768,-32768)=> 4294967295,这是32位的无符号范围或64位的签名范围,但还是更好。

Now all this while the output has always been positive. In signed world, it will be even more space saving if we could transfer half the output to negative axis. You could do it like this for Szudzik's:

现在所有这些,输出一直都是正的。在有符号的世界里,如果我们能将一半的输出传输到负轴,那么将会节省更多的空间。你可以这样做Szudzik的:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
C = (A >= B ? A * A + A + B : A + B * B) / 2;
a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;

(-32768, 32767) => -2147483648

(32767, -32768) => -2147450880

(0, 0) => 0 

(32767, 32767) => 2147418112

(-32768, -32768) => 2147483647

What I do: After applying a weight of 2 to the the inputs and going through the function, I then divide the ouput by two and take some of them to negative axis by multiplying by -1.

我要做的是:在将2的权重应用到输入并经过这个函数之后,我将输出的输出除以2,然后将它们中的一些乘上-1。

See the results, for any input in the range of a signed 16 bit number, the output lies within the limits of a signed 32 bit integer which is cool. I'm not sure how to go about the same way for Cantor pairing function but didn't try as much as its not as efficient. Furthermore, more calculations involved in Cantor pairing function means its slower too.

查看结果,对于已签名的16位数字的范围内的任何输入,输出位于一个已签名的32位整数的范围内,这是很酷的。我不知道如何用同样的方法去做康托配对的功能,但不像它那样有效。此外,对Cantor配对函数的更多计算也意味着它的速度也较慢。

Here is a C# implementation.

这是一个c#实现。

public static long PerfectlyHashThem(int a, int b)
{
    var A = (ulong)(a >= 0 ? 2 * (long)a : -2 * (long)a - 1);
    var B = (ulong)(b >= 0 ? 2 * (long)b : -2 * (long)b - 1);
    var C = (long)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

public static int PerfectlyHashThem(short a, short b)
{
    var A = (uint)(a >= 0 ? 2 * a : -2 * a - 1);
    var B = (uint)(b >= 0 ? 2 * b : -2 * b - 1);
    var C = (int)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

Since the intermediate calculations can exceed limits of 2N signed integer, I have used 4N integer type (the last division by 2 brings back the result to 2N).

由于中间的计算可以超过2N的有符号整数的限制,所以我使用了4N整数类型(最后的除法2将结果返回到2N)。

The link I have provided on alternate solution nicely depicts a graph of the function utilizing every single point in space. Its amazing to see that you could uniquely encode a pair of coordinates to a single number reversibly! Magic world of numbers!!

我在备选解决方案中提供的链接很好地描述了利用空间中的每一个点的函数图。它的神奇之处在于,你可以唯一地将一对坐标编码成一个数字可逆!魔法世界的数字! !

#3


41  

If A and B can be expressed with 2 bytes, you can combine them on 4 bytes. Put A on the most significant half and B on the least significant half.

如果A和B可以用2个字节表示,您可以将它们组合在4个字节上。把A放在最重要的一半,B在最不重要的一半。

In C language this gives (assuming sizeof(short)=2 and sizeof(int)=4):

在C语言中,这表示(假设sizeof(short)=2, sizeof(int)=4):

int combine(short A, short B)
{
    return A<<16 | B;
}

short getA(int C)
{
    return C>>16;
}

short getB(int C)
{
    return C & 0xFFFF;
}

#4


13  

Is this even possible?
You are combining two integers. They both have the range -2,147,483,648 to 2,147,483,647 but you will only take the positives. That makes 2147483647^2 = 4,61169E+18 combinations. Since each combination has to be unique AND result in an integer, you'll need some kind of magical integer that can contain this amount of numbers.

这是可能吗?你将两个整数结合起来。他们都有-2,147,483,648到2,147,483,647,但你只会接受积极的。这使得2147483647 ^ 2 = 4,2147483647 e + 18组合。因为每个组合都必须是唯一的,并且结果是整数,所以您需要某种神奇的整数,它可以包含这个数量的数字。

Or is my logic flawed?

还是我的逻辑有缺陷?

#5


7  

The standard mathematical way for positive integers is to use the uniqueness of prime factorization.

正整数的标准数学方法是利用质因数分解的唯一性。

f( x, y ) -> 2^x * 3^y

The downside is that the image tends to span quite a large range of integers so when it comes to expressing the mapping in a computer algorithm you may have issues with choosing an appropriate type for the result.

缺点是,图像往往会跨越大量的整数,所以当涉及到在计算机算法中表示映射时,您可能会遇到选择合适的类型的问题。

You could modify this to deal with negative x and y by encoding a flags with powers of 5 and 7 terms.

您可以通过对带有5和7项功能的标志进行编码来修改这个以处理负x和y。

e.g.

如。

f( x, y ) -> 2^|x| * 3^|y| * 5^(x<0) * 7^(y<0)

#6


7  

Let number a be the first, b the second. Let p be the a+1-th prime number, q be the b+1-th prime number

让a第一个,b第二个。p是a+1的素数,q是b+1-th的素数。

Then, the result is pq, if a<b, or 2pq if a>b. If a=b, let it be p^2.

然后,结果是pq,如果a b。如果a = b,顺其自然p ^ 2。 ,或2pq,如果是>

#7


4  

f(a, b) = s(a+b) + a, where s(n) = n*(n+1)/2

f(a, b) = s(a+b) + a,其中s(n) = n*(n+1)/2。

  • This is a function -- it is deterministic.
  • 这是一个函数,它是确定性的。
  • It is also injective -- f maps different values for different (a,b) pairs. You can prove this using the fact: s(a+b+1)-s(a+b) = a+b+1 < a.
  • 它也是单射的——f映射不同的(a,b)对的不同值。你可以用事实证明:s(a+b+1)-s(a+b) = a+b+1 < a。
  • It returns quite small values -- good if your are going to use it for array indexing, as the array does not have to be big.
  • 它返回非常小的值——如果你要用它来进行数组索引,这很好,因为数组不需要很大。
  • It is cache-friendly -- if two (a, b) pairs are close to each other, then f maps numbers to them which are close to each other (compared to other methods).
  • 它是缓存友好的——如果两个(a, b)对彼此很接近,那么f将数字映射到它们之间(与其他方法相比)。

I did not understand what You mean by:

我不明白你的意思:

should always yield an integer on either the positive or the negative side of integers

是否应该总是在整数的正或负方面产生整数?

How can I write (greater than), (less than) characters in this forum?

在这个论坛里,我怎么写(比)(少于)字符?

#8


4  

For positive integers as arguments and where argument order doesn't matter:

对于正整数作为参数,参数顺序无关紧要:

  1. Here's an unordered pairing function:

    这是一个无序的配对函数:

    <x, y> = x * y + trunc((|x - y| - 1)^2 / 4) = <y, x>
    
  2. For x ≠ y, here's a unique unordered pairing function:

    x≠y,这里有一个独特的无序配对函数:

    <x, y> = if x < y:
               x * (y - 1) + trunc((y - x - 2)^2 / 4)
             if x > y:
               (x - 1) * y + trunc((x - y - 2)^2 / 4)
           = <y, x>
    

#9


3  

Although Stephan202's answer is the only truly general one, for integers in a bounded range you can do better. For example, if your range is 0..10,000, then you can do:

尽管Stephan202的答案是唯一真正通用的答案,对于有界范围内的整数,你可以做得更好。例如,如果您的范围是0..1万,然后你可以:

#define RANGE_MIN 0
#define RANGE_MAX 10000

unsigned int merge(unsigned int x, unsigned int y)
{
    return (x * (RANGE_MAX - RANGE_MIN + 1)) + y;
}

void split(unsigned int v, unsigned int &x, unsigned int &y)
{
    x = RANGE_MIN + (v / (RANGE_MAX - RANGE_MIN + 1));
    y = RANGE_MIN + (v % (RANGE_MAX - RANGE_MIN + 1));
}

Results can fit in a single integer for a range up to the square root of the integer type's cardinality. This packs slightly more efficiently than Stephan202's more general method. It is also considerably simpler to decode; requiring no square roots, for starters :)

结果可以在整数类型的基数的平方根上匹配一个整数。这个包比Stephan202的一般方法更有效。解码也相当简单;首先,不需要平方根:)

#10


2  

Check this: http://en.wikipedia.org/wiki/Pigeonhole_principle. If A, B and C are of same type, it cannot be done. If A and B are 16-bit integers, and C is 32-bit, then you can simply use shifting.

看看这个:http://en.wikipedia.org/wiki/Pigeonhole_principle。如果A、B和C属于同一类型,则无法完成。如果A和B是16位整数,而C是32位,那么你可以简单地使用移位。

The very nature of hashing algorithms is that they cannot provide a unique hash for each different input.

散列算法的本质是它们不能为每个不同的输入提供唯一的散列。

#11


2  

It isn't that tough to construct a mapping:

构造一个映射并不难:

   1  2  3  4  5  use this mapping if (a,b) != (b,a)
1  0  1  3  6 10
2  2  4  7 11 16
3  5  8 12 17 23
4  9 13 18 24 31
5 14 19 25 32 40

   1  2  3  4  5 use this mapping if (a,b) == (b,a) (mirror)
1  0  1  2  4  6
2  1  3  5  7 10
3  2  5  8 11 14
4  4  8 11 15 19
5  6 10 14 19 24


    0  1 -1  2 -2 use this if you need negative/positive
 0  0  1  2  4  6
 1  1  3  5  7 10
-1  2  5  8 11 14
 2  4  8 11 15 19
-2  6 10 14 19 24

Figuring out how to get the value for an arbitrary a,b is a little more difficult.

计算如何得到任意a的值,b有点难。

#12


2  

Here is an extension of @DoctorJ 's code to unbounded integers based on the method given by @nawfal. It can encode and decode. It works with normal arrays and numpy arrays.

这里是@DoctorJ 's代码的扩展,基于@nawfal给出的方法。它可以编码和解码。它与普通数组和numpy数组一起工作。

#!/usr/bin/env python
from numbers import Integral    

def tuple_to_int(tup):
    """:Return: the unique non-negative integer encoding of a tuple of non-negative integers."""
    if len(tup) == 0:  # normally do if not tup, but doesn't work with np
        raise ValueError('Cannot encode empty tuple')
    if len(tup) == 1:
        x = tup[0]
        if not isinstance(x, Integral):
            raise ValueError('Can only encode integers')
        return x
    elif len(tup) == 2:
        # print("len=2")
        x, y = tuple_to_int(tup[0:1]), tuple_to_int(tup[1:2])  # Just to validate x and y

        X = 2 * x if x >= 0 else -2 * x - 1  # map x to positive integers
        Y = 2 * y if y >= 0 else -2 * y - 1  # map y to positive integers
        Z = (X * X + X + Y) if X >= Y else (X + Y * Y)  # encode

        # Map evens onto positives
        if (x >= 0 and y >= 0):
            return Z // 2
        elif (x < 0 and y >= 0 and X >= Y):
            return Z // 2
        elif (x < 0 and y < 0 and X < Y):
            return Z // 2
        # Map odds onto negative
        else:
            return (-Z - 1) // 2
    else:
        return tuple_to_int((tuple_to_int(tup[:2]),) + tuple(tup[2:]))  # ***speed up tuple(tup[2:])?***


def int_to_tuple(num, size=2):
    """:Return: the unique tuple of length `size` that encodes to `num`."""
    if not isinstance(num, Integral):
        raise ValueError('Can only encode integers (got {})'.format(num))
    if not isinstance(size, Integral) or size < 1:
        raise ValueError('Tuple is the wrong size ({})'.format(size))
    if size == 1:
        return (num,)
    elif size == 2:

        # Mapping onto positive integers
        Z = -2 * num - 1 if num < 0 else 2 * num

        # Reversing Pairing
        s = isqrt(Z)
        if Z - s * s < s:
            X, Y = Z - s * s, s
        else:
            X, Y = s, Z - s * s - s

        # Undoing mappint to positive integers
        x = (X + 1) // -2 if X % 2 else X // 2  # True if X not divisible by 2
        y = (Y + 1) // -2 if Y % 2 else Y // 2  # True if Y not divisible by 2

        return x, y

    else:
        x, y = int_to_tuple(num, 2)
        return int_to_tuple(x, size - 1) + (y,)


def isqrt(n):
    """":Return: the largest integer x for which x * x does not exceed n."""
    # Newton's method, via http://*.com/a/15391420
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x

#13


1  

What you suggest is impossible. You will always have collisions.

你的建议是不可能的。你总是会有碰撞。

In order to map two objects to another single set, the mapped set must have a minimum size of the number of combinations expected:

为了将两个对象映射到另一个集合,映射集必须具有期望的组合数量的最小值:

Assuming a 32-bit integer, you have 2147483647 positive integers. Choosing two of these where order doesn't matter and with repetition yields 2305843008139952128 combinations. This does not fit nicely in the set of 32-bit integers.

假设一个32位整数,就有2147483647个正整数。选择其中两个顺序无关紧要,重复产生2305843008139952128组合。这在32位整数的集合中不太合适。

You can, however fit this mapping in 61 bits. Using a 64-bit integer is probably easiest. Set the high word to the smaller integer and the low word to the larger one.

不过,你可以用61位来匹配这个映射。使用64位整数可能是最简单的。将高的字设置为较小的整数,将低的字设置为较大的整数。

#14


-1  

let us have two number B and C , encoding them into single number A

我们有两个B和C,把它们编码成一个A。

A = B + C * N

A = B + C * N。

where

在哪里

B=A % N = B

B=A % N = B。

C=A / N = C

C=A / N = C。

#1


183  

You're looking for a bijective NxN -> N mapping. These are used for e.g. dovetailing. Have a look at this PDF for an introduction to so-called pairing functions. Wikipedia introduces a specific pairing function, namely the Cantor pairing function:

你在寻找一个双射NxN -> N映射。这些是用来做燕尾服的。看一下这个PDF,以介绍所谓的配对函数。Wikipedia引入了一个特定的配对函数,即Cantor配对函数:

用一种独特的、确定的方式将两个整数映射到一个。

Three remarks:

备注:三个

  • As others have made clear, if you plan to implement a pairing function, you may soon find you need arbitrarily large integers (bignums).
  • 正如其他人已经明确的,如果您计划实现一个配对函数,您可能很快就会发现您需要任意大的整数(bignums)。
  • If you don't want to make a distinction between the pairs (a, b) and (b, a), then sort a and b before applying the pairing function.
  • 如果你不想对成对(a, b)和(b, a)进行区分,那么在应用配对函数之前对a和b进行排序。
  • Actually I lied. You are looking for a bijective ZxZ -> N mapping. Cantor's function only works on non-negative numbers. This is not a problem however, because it's easy to define a bijection f : Z -> N, like so:
    • f(n) = n * 2 if n >= 0
    • f(n) = n * 2,如果n >= 0。
    • f(n) = -n * 2 - 1 if n < 0
    • f(n) = -n * 2 - 1如果n < 0。
  • 其实我说谎了。您正在寻找一个双射ZxZ -> N映射。康托尔的功能只适用于非负数。这不是一个问题,因为很容易定义一个bijection f: Z -> N,就像这样:f(N) = N * 2如果N >= 0 f(N) = - N * 2 - 1,如果N < 0。

#2


169  

Cantor pairing function is really one of the better ones out there considering its simple, fast and space efficient, but there is something even better published at Wolfram by Matthew Szudzik, here. The limitation of Cantor pairing function (relatively) is that the range of encoded results doesn't always stay within the limits of a 2N bit integer if the inputs are two N bit integers. That is, if my inputs are two 16 bit integers ranging from 0 to 2^16 -1, then there are 2^16 * (2^16 -1) combinations of inputs possible, so by the obvious Pigeonhole Principle, we need an output of size at least 2^16 * (2^16 -1), which is equal to 2^32 - 2^16, or in other words, a map of 32 bit numbers should be feasible ideally. This may not be of little practical importance in programming world.

Cantor配对函数实际上是一个比较好的函数,考虑它的简单、快速和空间效率,但是在Wolfram上有一个更好的东西由Matthew Szudzik在这里发表。Cantor配对函数(相对而言)的限制是,如果输入是两个N位整数,编码结果的范围并不总是停留在2N位整数的范围内。如果我输入两个16位整数从0到2 ^ 16 1,然后是2 ^ 16 *(2 ^ 16 1)组合的输入,通过明显的鸽子洞原理,至少我们需要输出的大小2 ^ 16 *(2 ^ 16 1),等于2 ^ 32 - 2 ^ 16,换句话说,一个32位的数字地图应该可行的理想。这在编程世界中可能并不具有实际的重要性。

Cantor pairing function:

康托尔配对功能:

(a + b) * (a + b + 1) / 2 + a; where a, b >= 0

The mapping for two maximum most 16 bit integers (65535, 65535) will be 8589803520 which as you see cannot be fit into 32 bits.

两个最大16位整数(65535,65535)的映射将是8589803520,您所看到的是不适合32位的。

Enter Szudzik's function:

进入Szudzik的功能:

a >= b ? a * a + a + b : a + b * b;  where a, b >= 0

The mapping for (65535, 65535) will now be 4294967295 which as you see is a 32 bit (0 to 2^32 -1) integer. This is where this solution is ideal, it simply utilizes every single point in that space, so nothing can get more space efficient.

的映射(65535、65535)现在将4294967295当你看到的是一个32位整数(0到2 ^ 32 1)。这就是这个解决方案的理想之处,它简单地利用了空间中的每一个点,所以没有任何东西可以获得更多的空间效率。


Now considering the fact that we typically deal with the signed implementations of numbers of various sizes in languages/frameworks, let's consider signed 16 bit integers ranging from -(2^15) to 2^15 -1 (later we'll see how to extend even the ouput to span over signed range). Since a and b have to be positive they range from 0 to 2^15 - 1.

现在考虑到我们通常处理不同大小的数字签名实现在语言/框架,让我们考虑签署了16位整数,范围从15 -(2 ^ 15)2 ^ 1(后来我们将看到如何扩展甚至输出跨度超过签署范围)。由于a和b必须积极的范围从0到2 ^ 15 - 1。

Cantor pairing function:

康托尔配对功能:

The mapping for two maximum most 16 bit signed integers (32767, 32767) will be 2147418112 which is just short of maximum value for signed 32 bit integer.

两个最大16位有符号整数的映射(32767,32767)将是2147418112,它的最大值是32位整数的最大值。

Now Szudzik's function:

现在Szudzik功能:

(32767, 32767) => 1073741823, much smaller..

(32767,32767)=> 1073741823,小得多。

Let's account for negative integers. That's beyond the original question I know, but just elaborating to help future visitors.

我们来考虑负整数。这超出了我所知道的最初的问题,只是为了帮助未来的访问者。

Cantor pairing function:

康托尔配对功能:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
(A + B) * (A + B + 1) / 2 + A;

(-32768, -32768) => 8589803520 which is Int64. 64 bit output for 16 bit inputs may be so unpardonable!!

(-32768,-32768)=> 8589803520,是Int64。16位输入的位输出可能是不可原谅的!!

Szudzik's function:

Szudzik的功能:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;

(-32768, -32768) => 4294967295 which is 32 bit for unsigned range or 64 bit for signed range, but still better.

(-32768,-32768)=> 4294967295,这是32位的无符号范围或64位的签名范围,但还是更好。

Now all this while the output has always been positive. In signed world, it will be even more space saving if we could transfer half the output to negative axis. You could do it like this for Szudzik's:

现在所有这些,输出一直都是正的。在有符号的世界里,如果我们能将一半的输出传输到负轴,那么将会节省更多的空间。你可以这样做Szudzik的:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
C = (A >= B ? A * A + A + B : A + B * B) / 2;
a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;

(-32768, 32767) => -2147483648

(32767, -32768) => -2147450880

(0, 0) => 0 

(32767, 32767) => 2147418112

(-32768, -32768) => 2147483647

What I do: After applying a weight of 2 to the the inputs and going through the function, I then divide the ouput by two and take some of them to negative axis by multiplying by -1.

我要做的是:在将2的权重应用到输入并经过这个函数之后,我将输出的输出除以2,然后将它们中的一些乘上-1。

See the results, for any input in the range of a signed 16 bit number, the output lies within the limits of a signed 32 bit integer which is cool. I'm not sure how to go about the same way for Cantor pairing function but didn't try as much as its not as efficient. Furthermore, more calculations involved in Cantor pairing function means its slower too.

查看结果,对于已签名的16位数字的范围内的任何输入,输出位于一个已签名的32位整数的范围内,这是很酷的。我不知道如何用同样的方法去做康托配对的功能,但不像它那样有效。此外,对Cantor配对函数的更多计算也意味着它的速度也较慢。

Here is a C# implementation.

这是一个c#实现。

public static long PerfectlyHashThem(int a, int b)
{
    var A = (ulong)(a >= 0 ? 2 * (long)a : -2 * (long)a - 1);
    var B = (ulong)(b >= 0 ? 2 * (long)b : -2 * (long)b - 1);
    var C = (long)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

public static int PerfectlyHashThem(short a, short b)
{
    var A = (uint)(a >= 0 ? 2 * a : -2 * a - 1);
    var B = (uint)(b >= 0 ? 2 * b : -2 * b - 1);
    var C = (int)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

Since the intermediate calculations can exceed limits of 2N signed integer, I have used 4N integer type (the last division by 2 brings back the result to 2N).

由于中间的计算可以超过2N的有符号整数的限制,所以我使用了4N整数类型(最后的除法2将结果返回到2N)。

The link I have provided on alternate solution nicely depicts a graph of the function utilizing every single point in space. Its amazing to see that you could uniquely encode a pair of coordinates to a single number reversibly! Magic world of numbers!!

我在备选解决方案中提供的链接很好地描述了利用空间中的每一个点的函数图。它的神奇之处在于,你可以唯一地将一对坐标编码成一个数字可逆!魔法世界的数字! !

#3


41  

If A and B can be expressed with 2 bytes, you can combine them on 4 bytes. Put A on the most significant half and B on the least significant half.

如果A和B可以用2个字节表示,您可以将它们组合在4个字节上。把A放在最重要的一半,B在最不重要的一半。

In C language this gives (assuming sizeof(short)=2 and sizeof(int)=4):

在C语言中,这表示(假设sizeof(short)=2, sizeof(int)=4):

int combine(short A, short B)
{
    return A<<16 | B;
}

short getA(int C)
{
    return C>>16;
}

short getB(int C)
{
    return C & 0xFFFF;
}

#4


13  

Is this even possible?
You are combining two integers. They both have the range -2,147,483,648 to 2,147,483,647 but you will only take the positives. That makes 2147483647^2 = 4,61169E+18 combinations. Since each combination has to be unique AND result in an integer, you'll need some kind of magical integer that can contain this amount of numbers.

这是可能吗?你将两个整数结合起来。他们都有-2,147,483,648到2,147,483,647,但你只会接受积极的。这使得2147483647 ^ 2 = 4,2147483647 e + 18组合。因为每个组合都必须是唯一的,并且结果是整数,所以您需要某种神奇的整数,它可以包含这个数量的数字。

Or is my logic flawed?

还是我的逻辑有缺陷?

#5


7  

The standard mathematical way for positive integers is to use the uniqueness of prime factorization.

正整数的标准数学方法是利用质因数分解的唯一性。

f( x, y ) -> 2^x * 3^y

The downside is that the image tends to span quite a large range of integers so when it comes to expressing the mapping in a computer algorithm you may have issues with choosing an appropriate type for the result.

缺点是,图像往往会跨越大量的整数,所以当涉及到在计算机算法中表示映射时,您可能会遇到选择合适的类型的问题。

You could modify this to deal with negative x and y by encoding a flags with powers of 5 and 7 terms.

您可以通过对带有5和7项功能的标志进行编码来修改这个以处理负x和y。

e.g.

如。

f( x, y ) -> 2^|x| * 3^|y| * 5^(x<0) * 7^(y<0)

#6


7  

Let number a be the first, b the second. Let p be the a+1-th prime number, q be the b+1-th prime number

让a第一个,b第二个。p是a+1的素数,q是b+1-th的素数。

Then, the result is pq, if a<b, or 2pq if a>b. If a=b, let it be p^2.

然后,结果是pq,如果a b。如果a = b,顺其自然p ^ 2。 ,或2pq,如果是>

#7


4  

f(a, b) = s(a+b) + a, where s(n) = n*(n+1)/2

f(a, b) = s(a+b) + a,其中s(n) = n*(n+1)/2。

  • This is a function -- it is deterministic.
  • 这是一个函数,它是确定性的。
  • It is also injective -- f maps different values for different (a,b) pairs. You can prove this using the fact: s(a+b+1)-s(a+b) = a+b+1 < a.
  • 它也是单射的——f映射不同的(a,b)对的不同值。你可以用事实证明:s(a+b+1)-s(a+b) = a+b+1 < a。
  • It returns quite small values -- good if your are going to use it for array indexing, as the array does not have to be big.
  • 它返回非常小的值——如果你要用它来进行数组索引,这很好,因为数组不需要很大。
  • It is cache-friendly -- if two (a, b) pairs are close to each other, then f maps numbers to them which are close to each other (compared to other methods).
  • 它是缓存友好的——如果两个(a, b)对彼此很接近,那么f将数字映射到它们之间(与其他方法相比)。

I did not understand what You mean by:

我不明白你的意思:

should always yield an integer on either the positive or the negative side of integers

是否应该总是在整数的正或负方面产生整数?

How can I write (greater than), (less than) characters in this forum?

在这个论坛里,我怎么写(比)(少于)字符?

#8


4  

For positive integers as arguments and where argument order doesn't matter:

对于正整数作为参数,参数顺序无关紧要:

  1. Here's an unordered pairing function:

    这是一个无序的配对函数:

    <x, y> = x * y + trunc((|x - y| - 1)^2 / 4) = <y, x>
    
  2. For x ≠ y, here's a unique unordered pairing function:

    x≠y,这里有一个独特的无序配对函数:

    <x, y> = if x < y:
               x * (y - 1) + trunc((y - x - 2)^2 / 4)
             if x > y:
               (x - 1) * y + trunc((x - y - 2)^2 / 4)
           = <y, x>
    

#9


3  

Although Stephan202's answer is the only truly general one, for integers in a bounded range you can do better. For example, if your range is 0..10,000, then you can do:

尽管Stephan202的答案是唯一真正通用的答案,对于有界范围内的整数,你可以做得更好。例如,如果您的范围是0..1万,然后你可以:

#define RANGE_MIN 0
#define RANGE_MAX 10000

unsigned int merge(unsigned int x, unsigned int y)
{
    return (x * (RANGE_MAX - RANGE_MIN + 1)) + y;
}

void split(unsigned int v, unsigned int &x, unsigned int &y)
{
    x = RANGE_MIN + (v / (RANGE_MAX - RANGE_MIN + 1));
    y = RANGE_MIN + (v % (RANGE_MAX - RANGE_MIN + 1));
}

Results can fit in a single integer for a range up to the square root of the integer type's cardinality. This packs slightly more efficiently than Stephan202's more general method. It is also considerably simpler to decode; requiring no square roots, for starters :)

结果可以在整数类型的基数的平方根上匹配一个整数。这个包比Stephan202的一般方法更有效。解码也相当简单;首先,不需要平方根:)

#10


2  

Check this: http://en.wikipedia.org/wiki/Pigeonhole_principle. If A, B and C are of same type, it cannot be done. If A and B are 16-bit integers, and C is 32-bit, then you can simply use shifting.

看看这个:http://en.wikipedia.org/wiki/Pigeonhole_principle。如果A、B和C属于同一类型,则无法完成。如果A和B是16位整数,而C是32位,那么你可以简单地使用移位。

The very nature of hashing algorithms is that they cannot provide a unique hash for each different input.

散列算法的本质是它们不能为每个不同的输入提供唯一的散列。

#11


2  

It isn't that tough to construct a mapping:

构造一个映射并不难:

   1  2  3  4  5  use this mapping if (a,b) != (b,a)
1  0  1  3  6 10
2  2  4  7 11 16
3  5  8 12 17 23
4  9 13 18 24 31
5 14 19 25 32 40

   1  2  3  4  5 use this mapping if (a,b) == (b,a) (mirror)
1  0  1  2  4  6
2  1  3  5  7 10
3  2  5  8 11 14
4  4  8 11 15 19
5  6 10 14 19 24


    0  1 -1  2 -2 use this if you need negative/positive
 0  0  1  2  4  6
 1  1  3  5  7 10
-1  2  5  8 11 14
 2  4  8 11 15 19
-2  6 10 14 19 24

Figuring out how to get the value for an arbitrary a,b is a little more difficult.

计算如何得到任意a的值,b有点难。

#12


2  

Here is an extension of @DoctorJ 's code to unbounded integers based on the method given by @nawfal. It can encode and decode. It works with normal arrays and numpy arrays.

这里是@DoctorJ 's代码的扩展,基于@nawfal给出的方法。它可以编码和解码。它与普通数组和numpy数组一起工作。

#!/usr/bin/env python
from numbers import Integral    

def tuple_to_int(tup):
    """:Return: the unique non-negative integer encoding of a tuple of non-negative integers."""
    if len(tup) == 0:  # normally do if not tup, but doesn't work with np
        raise ValueError('Cannot encode empty tuple')
    if len(tup) == 1:
        x = tup[0]
        if not isinstance(x, Integral):
            raise ValueError('Can only encode integers')
        return x
    elif len(tup) == 2:
        # print("len=2")
        x, y = tuple_to_int(tup[0:1]), tuple_to_int(tup[1:2])  # Just to validate x and y

        X = 2 * x if x >= 0 else -2 * x - 1  # map x to positive integers
        Y = 2 * y if y >= 0 else -2 * y - 1  # map y to positive integers
        Z = (X * X + X + Y) if X >= Y else (X + Y * Y)  # encode

        # Map evens onto positives
        if (x >= 0 and y >= 0):
            return Z // 2
        elif (x < 0 and y >= 0 and X >= Y):
            return Z // 2
        elif (x < 0 and y < 0 and X < Y):
            return Z // 2
        # Map odds onto negative
        else:
            return (-Z - 1) // 2
    else:
        return tuple_to_int((tuple_to_int(tup[:2]),) + tuple(tup[2:]))  # ***speed up tuple(tup[2:])?***


def int_to_tuple(num, size=2):
    """:Return: the unique tuple of length `size` that encodes to `num`."""
    if not isinstance(num, Integral):
        raise ValueError('Can only encode integers (got {})'.format(num))
    if not isinstance(size, Integral) or size < 1:
        raise ValueError('Tuple is the wrong size ({})'.format(size))
    if size == 1:
        return (num,)
    elif size == 2:

        # Mapping onto positive integers
        Z = -2 * num - 1 if num < 0 else 2 * num

        # Reversing Pairing
        s = isqrt(Z)
        if Z - s * s < s:
            X, Y = Z - s * s, s
        else:
            X, Y = s, Z - s * s - s

        # Undoing mappint to positive integers
        x = (X + 1) // -2 if X % 2 else X // 2  # True if X not divisible by 2
        y = (Y + 1) // -2 if Y % 2 else Y // 2  # True if Y not divisible by 2

        return x, y

    else:
        x, y = int_to_tuple(num, 2)
        return int_to_tuple(x, size - 1) + (y,)


def isqrt(n):
    """":Return: the largest integer x for which x * x does not exceed n."""
    # Newton's method, via http://*.com/a/15391420
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x

#13


1  

What you suggest is impossible. You will always have collisions.

你的建议是不可能的。你总是会有碰撞。

In order to map two objects to another single set, the mapped set must have a minimum size of the number of combinations expected:

为了将两个对象映射到另一个集合,映射集必须具有期望的组合数量的最小值:

Assuming a 32-bit integer, you have 2147483647 positive integers. Choosing two of these where order doesn't matter and with repetition yields 2305843008139952128 combinations. This does not fit nicely in the set of 32-bit integers.

假设一个32位整数,就有2147483647个正整数。选择其中两个顺序无关紧要,重复产生2305843008139952128组合。这在32位整数的集合中不太合适。

You can, however fit this mapping in 61 bits. Using a 64-bit integer is probably easiest. Set the high word to the smaller integer and the low word to the larger one.

不过,你可以用61位来匹配这个映射。使用64位整数可能是最简单的。将高的字设置为较小的整数,将低的字设置为较大的整数。

#14


-1  

let us have two number B and C , encoding them into single number A

我们有两个B和C,把它们编码成一个A。

A = B + C * N

A = B + C * N。

where

在哪里

B=A % N = B

B=A % N = B。

C=A / N = C

C=A / N = C。