Ruby兰特的有效种子范围是多少?

时间:2022-10-08 18:23:14

Ruby implements PRNGs as "a modified Mersenne Twister with a period of 2**19937-1." 1

Ruby将PRNG实现为“经过修改的Mersenne Twister,周期为2 ** 19937-1。” 1

The way I understand MT is that it operates on 2^32 different seeds. What confuses me is that Random.new(seed) accepts arbitrarily big numbers such as Random.new(2**100).

我理解MT的方式是它在2 ^ 32种不同的种子上运行。令我困惑的是Random.new(种子)接受任意大数字,例如Random.new(2 ** 100)。

However, I wasn't able to find (logical) collisions:

但是,我无法找到(逻辑)碰撞:

Random.new(1).rand(10**5) == Random.new(2**32-1).rand(10**5) => false
Random.new(1).rand(10**5) == Random.new(2**32).rand(10**5) => false
Random.new(1).rand(10**5) == Random.new(2**32+1).rand(10**5) => false

Given that we'd like to utilize MT's maximum seed range in the sense that we want to use as many different seeds as possible while still avoiding collisions with two different seeds, what seed range achieves this?

鉴于我们希望利用MT的最大种子范围,我们希望尽可能多地使用不同的种子,同时仍避免与两种不同的种子发生碰撞,那么种子范围是如何实现的呢?

I tried understanding what is happening inside the Ruby's random implementation, but didn't get too far. https://github.com/ruby/ruby/blob/c5e08b764eb342538884b383f0e6428b6faf214b/random.c#L370

我试着理解Ruby随机实现中发生的事情,但是没有走得太远。 https://github.com/ruby/ruby/blob/c5e08b764eb342538884b383f0e6428b6faf214b/random.c#L370

1 个解决方案

#1


10  

The Mersenne Twister sequence is 2 ** ( 624 * 32 - 1 ) - 1 long, and the seed value is used to set an internal state for the PRNG that directly relates to the position within that sequence.

Mersenne Twister序列为2 **(624 * 32 - 1) - 1长,种子值用于设置PRNG的内部状态,该状态与该序列中的位置直接相关。

The easiest-to-find repeat appears to be every 2 ** ( 624 * 32 ), and can be shown to work like this:

最容易找到的重复似乎是每2 **(624 * 32),并且可以显示为这样工作:

 repeat_every =  2 ** ( 624 * 32 )

 start_value = 5024214421  # Try any value

 r1 = Random.new( start_value )

 r2 = Random.new( start_value + repeat_every )

 r17 = Random.new( start_value + 17 * repeat_every )

 r23 = Random.new( start_value + 23 * repeat_every )

 r1.rand == r2.rand  
 # true

 r17.rand == r23.rand  
 # true

Or try this:

或试试这个:

 repeat_every =  2 ** ( 624 * 32 )

 start_value = 5024214421  # Try any value

 r1 = Random.new( start_value )

 r2 = Random.new( start_value + repeat_every )

 Array.new(10) { r1.rand(100) }
 # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]

 Array.new(10) { r2.rand(100) }
 # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]

The repeat value relates to how Mersenne Twister works. The internal state of MT is an array of 624 32-bit unsigned integers. The Ruby source code you linked packs a Ruby Fixnum into an array - the magic command is

重复值与Mersenne Twister的工作原理有关。 MT的内部状态是一个由624个32位无符号整数组成的数组。您链接的Ruby源代码将Ruby Fixnum打包成一个数组 - 神奇的命令是

  rb_integer_pack( seed, buf, len, sizeof(uint32_t), 0,
        INTEGER_PACK_LSWORD_FIRST|INTEGER_PACK_NATIVE_BYTE_ORDER );

however, this isn't something easy to play with, it is defined in internal.h, so only really accessible if you work on Ruby interpreter itself. You cannot access this function from within a normal C extension.

但是,这不是一件容易理解的事情,它在internal.h中定义,所以只有在你自己使用Ruby解释器时才能真正访问它。您无法在普通的C扩展中访问此功能。

The packed integer is then loaded to the MT's internal state by the function init_by_array. This is quite a complex-looking function - the packed seed value is not written literally into the state, but instead the state is generated item by item, adding in the supplied array values, using a variety of xors, additions and cross-referencing the previous value (the Ruby source here also adds in the packed array's index position, commented "non-linear", I think that is one of the referenced modifications to standard MT)

然后通过函数init_by_array将打包的整数加载到MT的内部状态。这是一个非常复杂的函数 - 打包的种子值不是字面上写入状态,而是逐项生成状态,添加提供的数组值,使用各种xors,添加和交叉引用之前的值(此处的Ruby源也添加了打包数组的索引位置,注释为“非线性”,我认为这是对标准MT的引用修改之一)

Note that the size of the MT sequence is smaller than 2 ** ( 624 * 32 ) - the repeat_every value I show here is skipping over 2 sequences at a time, but it is the easiest to find repeating seed value, because it is easy to see how it sets the internal state exactly the same (because the first 624 items in the array representation of the seed are identical, and that is all that gets used later). Other seed values will also produce the same internal state, but the relationship is a complex mapping that pairs up each integer in the 19938-bit space with another integer which creates the same state for MT.

请注意,MT序列的大小小于2 **(624 * 32) - 我在此处显示的repeat_every值一次跳过2个序列,但它最容易找到重复的种子值,因为它很容易看看它如何设置内部状态完全相同(因为种子的数组表示中的前624个项是相同的,这就是以后使用的所有项)。其他种子值也将产生相同的内部状态,但该关系是一个复杂的映射,它将19938位空间中的每个整数与另一个整数配对,从而为MT创建相同的状态。

#1


10  

The Mersenne Twister sequence is 2 ** ( 624 * 32 - 1 ) - 1 long, and the seed value is used to set an internal state for the PRNG that directly relates to the position within that sequence.

Mersenne Twister序列为2 **(624 * 32 - 1) - 1长,种子值用于设置PRNG的内部状态,该状态与该序列中的位置直接相关。

The easiest-to-find repeat appears to be every 2 ** ( 624 * 32 ), and can be shown to work like this:

最容易找到的重复似乎是每2 **(624 * 32),并且可以显示为这样工作:

 repeat_every =  2 ** ( 624 * 32 )

 start_value = 5024214421  # Try any value

 r1 = Random.new( start_value )

 r2 = Random.new( start_value + repeat_every )

 r17 = Random.new( start_value + 17 * repeat_every )

 r23 = Random.new( start_value + 23 * repeat_every )

 r1.rand == r2.rand  
 # true

 r17.rand == r23.rand  
 # true

Or try this:

或试试这个:

 repeat_every =  2 ** ( 624 * 32 )

 start_value = 5024214421  # Try any value

 r1 = Random.new( start_value )

 r2 = Random.new( start_value + repeat_every )

 Array.new(10) { r1.rand(100) }
 # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]

 Array.new(10) { r2.rand(100) }
 # => [84, 86, 8, 58, 5, 21, 79, 10, 17, 50]

The repeat value relates to how Mersenne Twister works. The internal state of MT is an array of 624 32-bit unsigned integers. The Ruby source code you linked packs a Ruby Fixnum into an array - the magic command is

重复值与Mersenne Twister的工作原理有关。 MT的内部状态是一个由624个32位无符号整数组成的数组。您链接的Ruby源代码将Ruby Fixnum打包成一个数组 - 神奇的命令是

  rb_integer_pack( seed, buf, len, sizeof(uint32_t), 0,
        INTEGER_PACK_LSWORD_FIRST|INTEGER_PACK_NATIVE_BYTE_ORDER );

however, this isn't something easy to play with, it is defined in internal.h, so only really accessible if you work on Ruby interpreter itself. You cannot access this function from within a normal C extension.

但是,这不是一件容易理解的事情,它在internal.h中定义,所以只有在你自己使用Ruby解释器时才能真正访问它。您无法在普通的C扩展中访问此功能。

The packed integer is then loaded to the MT's internal state by the function init_by_array. This is quite a complex-looking function - the packed seed value is not written literally into the state, but instead the state is generated item by item, adding in the supplied array values, using a variety of xors, additions and cross-referencing the previous value (the Ruby source here also adds in the packed array's index position, commented "non-linear", I think that is one of the referenced modifications to standard MT)

然后通过函数init_by_array将打包的整数加载到MT的内部状态。这是一个非常复杂的函数 - 打包的种子值不是字面上写入状态,而是逐项生成状态,添加提供的数组值,使用各种xors,添加和交叉引用之前的值(此处的Ruby源也添加了打包数组的索引位置,注释为“非线性”,我认为这是对标准MT的引用修改之一)

Note that the size of the MT sequence is smaller than 2 ** ( 624 * 32 ) - the repeat_every value I show here is skipping over 2 sequences at a time, but it is the easiest to find repeating seed value, because it is easy to see how it sets the internal state exactly the same (because the first 624 items in the array representation of the seed are identical, and that is all that gets used later). Other seed values will also produce the same internal state, but the relationship is a complex mapping that pairs up each integer in the 19938-bit space with another integer which creates the same state for MT.

请注意,MT序列的大小小于2 **(624 * 32) - 我在此处显示的repeat_every值一次跳过2个序列,但它最容易找到重复的种子值,因为它很容易看看它如何设置内部状态完全相同(因为种子的数组表示中的前624个项是相同的,这就是以后使用的所有项)。其他种子值也将产生相同的内部状态,但该关系是一个复杂的映射,它将19938位空间中的每个整数与另一个整数配对,从而为MT创建相同的状态。