Strategies for generating a unique reference number

Time: 2021-11-18 11:33:29

Hrm... here's where my CS knowledge lets me down. I want to write an algorithm that generates a unique reference number.

I don't want to use sequential numbers, as they introduce a security risk, and I want to use alphanumerics. The ref will have a minimum and maximum length too. (I can't use a GUID; it is too long.)

Ideally, I don't want to query my persistence layer to see if a ref has been used before.

What strategies can I employ?

7 solutions

#1


2  

If you're worried about security risks, then you want a cryptographically secure random number generator. You should be able to tell it how many bytes you want (i.e. how long the number can be).
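
As a rough illustration of this answer, here is a minimal Python sketch that draws each character from a cryptographically secure source. The function name new_reference, the 36-symbol alphabet, and the default length are assumptions made for the example; nothing in the thread prescribes them.

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits


def new_reference(length: int = 12) -> str:
    """Return a random alphanumeric reference of exactly `length` characters."""
    if length < 1:
        raise ValueError("length must be positive")
    # secrets.choice draws from the operating system's CSPRNG.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(new_reference(10))
```

Random values like these are hard to guess, but they are only unique with high probability; the shorter the reference, the more likely a collision becomes.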

#2


2  

If this number will ever be referenced by humans, I encourage you to follow these guidelines in your solution:

What is the best format for a customer number, order number?

If you can't synchronize with the database to see what the next number will be, and you can't use GUIDs or a comparably long random string, then you need to include some sort of local value in the ID.

E.g., if all clients will be on a known network, you can end each number with the last octet (the "D block") of the client's IP address.

Or, if clients have to log in and each user can be logged in only once at a time, you can include their user ID somewhere in the number.
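
One way to picture the "local value" idea is the sketch below, which appends the last octet of the client's IPv4 address (as two hex digits) to a random part. The function name reference_with_node and the choice of a random prefix are assumptions for illustration; the answer itself only says that a local value should appear somewhere in the ID.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def reference_with_node(ip_address: str, random_len: int = 8) -> str:
    """Build a reference from a random part plus the client's last IP octet."""
    last_octet = int(ip_address.split(".")[-1])
    if not 0 <= last_octet <= 255:
        raise ValueError("not a valid IPv4 octet")
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(random_len))
    # Two hex digits are enough to encode 0..255.
    return f"{random_part}{last_octet:02X}"


if __name__ == "__main__":
    print(reference_with_node("192.168.1.42"))
```

The suffix guarantees that two different clients never produce the same reference; uniqueness within a single client still depends on the random part (or on a per-client counter, if guessability is not a concern there).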

#3


1  

I'm taking a stab in the dark here, but... you want a random value that will be unique, but less than 16 bytes. Your best bet is still a GUID, which is only 16 bytes. You want to use alphanumerics, so here are some options.

Use a GUID but encode it as base64. It looks like 7QDBkvCA1+B9K/U0vrQx1A, which is 22 characters (with the padding stripped): still longer than a native Guid, but shorter than the typical string representation.

See Text Encoding here: http://en.wikipedia.org/wiki/Globally_Unique_Identifier
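
The base64 option can be sketched in Python as follows. The helper names short_guid and decode_short_guid are made up for the example, and the URL-safe base64 variant is an assumption on my part; it substitutes "-" and "_" for the "+" and "/" seen in the sample above, so the result is still not strictly alphanumeric.

```python
import base64
import uuid


def short_guid() -> str:
    """Encode a random 16-byte GUID as 22 base64 characters (padding removed)."""
    raw = uuid.uuid4().bytes  # 16 bytes
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_short_guid(text: str) -> uuid.UUID:
    """Reverse short_guid by restoring the two stripped padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))


if __name__ == "__main__":
    encoded = short_guid()
    print(encoded, decode_short_guid(encoded))
```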

Another option would be to hash the Guid, but you will lose some of the uniqueness, so what is your tolerance level for non-unique items?

==========

Assuming you have a single process inserting into the table, you could employ a HiLo algorithm and be confident you don't have to hit the DB each time. You'd simply store the last high value in memory; when the process starts up, you'd hit the DB to find out where you left off. See: What's the Hi/Lo algorithm?
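
A minimal sketch of the Hi/Lo idea, assuming a callable fetch_next_hi that stands in for the database round trip. The class name, the callable, and the block size are all hypothetical. The database is consulted only once per block of IDs.

```python
import itertools
import threading


class HiLoGenerator:
    """Hand out IDs locally, asking the store for a new 'hi' once per block."""

    def __init__(self, fetch_next_hi, block_size: int = 100):
        self._fetch_next_hi = fetch_next_hi  # hypothetical stand-in for a DB call
        self._block_size = block_size
        self._lock = threading.Lock()
        self._hi = None
        self._lo = block_size  # forces a fetch on the first call

    def next_id(self) -> int:
        with self._lock:
            if self._lo >= self._block_size:
                self._hi = self._fetch_next_hi()
                self._lo = 0
            value = self._hi * self._block_size + self._lo
            self._lo += 1
            return value


if __name__ == "__main__":
    # An in-memory counter plays the role of the database here.
    fake_db = itertools.count()
    gen = HiLoGenerator(lambda: next(fake_db), block_size=3)
    print([gen.next_id() for _ in range(7)])
```

Note that the IDs produced this way are sequential, so on their own they do not address the guessability concern raised in the question.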

I still say a Guid is your best bet. 16 bytes is not bad and will be just as small as most alphanumeric solutions you come up with.

#4


0  

One way may be to generate the numbers based on a smaller subset of numbers. For example, you could use a binary sequence to generate them based on a Gödel numbering: mapping the bit patterns 000 to 111 onto 5z, 3y, 2x yields 0, 2, 3, 6, 5, 10, 15, 30.

Of course, this is overly simplistic. But by iterating over the "salt" numbers to generate the reference numbers, you wouldn't have to track the reference numbers at all. Provided, of course, you were reasonably sure you didn't have to factor in collisions.
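
My reading of the example above is that each set bit contributes its prime (x = 2, y = 3, z = 5) as a factor. The sketch below reproduces the listed sequence under that assumption. The function name godel_value is invented for the example, and returning 0 for the all-zero pattern simply follows the answer's list (a strict empty product would be 1).

```python
# Assumed weights: bit 0 is x (2), bit 1 is y (3), bit 2 is z (5).
PRIMES = (2, 3, 5)


def godel_value(bits: int) -> int:
    """Multiply together the primes whose bit is set in `bits`."""
    if bits == 0:
        return 0  # matches the answer's list; a strict empty product would be 1
    product = 1
    for position, prime in enumerate(PRIMES):
        if bits & (1 << position):
            product *= prime
    return product


if __name__ == "__main__":
    print([godel_value(b) for b in range(8)])
```

Because prime factorizations are unique, distinct non-zero bit patterns always map to distinct values.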

#5


0  

If it's possible in your application/environment, have you considered including the time as part of a pseudo-randomly generated number?

E.g. microtime() + rand(10000,99999)
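
A rough Python equivalent of that expression, assuming the "+" is meant as concatenation of the two parts rather than numeric addition. The function name time_based_reference is hypothetical.

```python
import secrets
import time


def time_based_reference() -> str:
    """Concatenate a microsecond timestamp with a random five-digit suffix."""
    micros = time.time_ns() // 1_000  # microseconds since the Unix epoch
    suffix = 10_000 + secrets.randbelow(90_000)  # 10000..99999 inclusive
    return f"{micros}{suffix}"


if __name__ == "__main__":
    print(time_based_reference())
```

The timestamp part is easy to guess, so only the five random digits contribute to unpredictability; the result is also fairly long and purely numeric.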

#6


0  

I've been doing this in a production system with success:

  • Take the current time (UTC, with microsecond precision)

  • Your process ID, thread ID

  • Your computer name

  • A salt value (basically just a string unique to your program)

  • A random value (preferably from a crypto-grade PRNG)

Put this in memory, either as a string, or XOR the values together, or something similar. Then:

  • Hash it with e.g. SHA-1

  • Truncate the resulting digest to N bytes (equivalently, take it mod 2^(8N)) to shrink the output

  • Convert to hexadecimal or something printable if you need it.

Just be aware that shrinking the UID to N bytes will increase the chances of UID collisions.

All the input data in the first list is there to ensure that you get a unique base for hashing if you have a cluster of many computers. You can omit some of the inputs, but you have to be certain that what remains contains something that differs for each computer you'll generate the UID on.
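
The recipe in this answer could be sketched roughly as below. The salt string, the "|" separator, the function name hashed_uid, and the choice to keep the first N bytes of the digest are all assumptions for illustration, not details given by the answer.

```python
import hashlib
import os
import secrets
import socket
import threading
import time

SALT = "my-app-reference-salt"  # hypothetical program-specific string


def hashed_uid(n_bytes: int = 8) -> str:
    """Hash time, process, thread, host, salt and randomness; keep n_bytes."""
    if not 1 <= n_bytes <= 20:
        raise ValueError("SHA-1 produces 20 bytes, so n_bytes must be 1..20")
    parts = [
        str(time.time_ns() // 1_000),  # current time in microseconds
        str(os.getpid()),              # process ID
        str(threading.get_ident()),    # thread ID
        socket.gethostname(),          # computer name
        SALT,
        secrets.token_hex(16),         # crypto-grade random value
    ]
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).digest()
    # Keep the first n_bytes and print them as hexadecimal.
    return digest[:n_bytes].hex()


if __name__ == "__main__":
    print(hashed_uid(8))
```

With 8 bytes the result is 16 hex characters; fewer bytes give a shorter reference at the cost of a higher collision probability, as the answer warns.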

#7


-1  

Truncate the GUID to the size you want.

If you're generating numbers, unless they are random and huge, you are better off checking to see if they've been used anyway.
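
A small sketch combining both suggestions: truncate a random GUID and check whether the result was already used. The in-memory set `used` is only a stand-in for the persistence-layer lookup, and the function name truncated_guid is hypothetical.

```python
import uuid


def truncated_guid(length: int, used: set) -> str:
    """Return the first `length` hex characters of a fresh GUID, retrying on reuse."""
    if not 1 <= length <= 32:
        raise ValueError("a GUID has 32 hex characters")
    while True:
        candidate = uuid.uuid4().hex[:length].upper()
        # `used` stands in for a lookup against the persistence layer.
        if candidate not in used:
            used.add(candidate)
            return candidate


if __name__ == "__main__":
    seen = set()
    print(truncated_guid(8, seen), truncated_guid(8, seen))
```

In a version 4 GUID the first 12 hex characters are entirely random, so truncating to 12 or fewer keeps only random data; longer prefixes start to include the fixed version digit.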
