如何使OSX的rand()无法进行光谱测试?

时间:2022-03-18 20:45:59

For the purposes of a programming class I'm trying to illustrate the weaknesses of the random number generators that usually come with the standard C library, specifically the "bad random generator" rand() that comes with OSX (quoth the manpage).

出于编程类的目的,我试图说明通常随标准C库一起出现的随机数生成器的弱点,特别是OSX附带的“坏随机生成器”rand()(请参阅联机帮助页)。

I wrote a simple program to test my understanding of the spectral test:

我写了一个简单的程序来测试我对光谱测试的理解:

#include <stdio.h>
#include <stdlib.h>

int main() {
  int i;
  int prev = rand();
  int new;

  for (i=0; i<100000; i++) {
    new = rand();
    printf("%d %d\n", prev, new);
    prev = new;
  }
  return 0;
}

But when I plot the resulting scatterplot, here is what I get:

但是当我绘制得到的散点图时,我得到的是:

如何使OSX的rand()无法进行光谱测试?

I would have expected something showing more structure, like what one finds on Wikipedia. Am I doing something wrong here? Should I plot in more dimensions?

我会期望某些东西显示出更多的结构,就像在*上发现的那样。我在这里做错了吗?我应该绘制更多尺寸吗?

UPDATE

Following pjs's suggestion I zoomed in on the part of the plot where the numbers are smaller than 1e7, and here is what I found:

按照pjs的建议,我放大了数字小于1e7的情节部分,这是我发现的:

如何使OSX的rand()无法进行光谱测试?

I find exactly the same lines showed by pjs. They seem to be vertical, but this is impossible as it would imply that some values were "missed" by rand(). When I sort -n the data this is (a sample of) what I see:

我发现pjs显示完全相同的线条。它们似乎是垂直的,但这是不可能的,因为它意味着某些值被rand()“遗漏”了。当我对数据进行排序时,这是我看到的(样本):

571 9596797
572 9613604
575 9664025
578 9714446
580 9748060
581 9764867
584 9815288
586 9848902
587 9865709
590 9916130
592 9949744
127774 13971
127775 30778
127780 114813
127781 131620
127782 148427
127783 165234
127785 198848
127787 232462
127788 249269

In other words, the points lie in lines that are almost, but not quite, vertical.

换句话说,这些点位于几乎但不完全垂直的线条中。

2 个解决方案

#1


11  

Linear congruential generators all suffer from a problem identified by George Marsaglia. "Marsaglia's Theorem" says that k-tuples (vectors of length k) will fall on a bounded number of hyperplanes. The bound is m**(1/k), where k is the size of the tuple and m is the number used for the modulus of the generator. Thus, if the modulus is (2**31 - 1) and you're looking at sets of 3, a 3-d plot will show the points falling on no more than the cube root of (2**31 - 1), or about 1290 planes, when viewed from the right orientation.

线性同余发生器都受到George Marsaglia发现的问题的困扰。 “Marsaglia的定理”说k元组(长度为k的向量)将落在有限数量的超平面上。边界是m **(1 / k),其中k是元组的大小,m是用于生成器模数的数字。因此,如果模数是(2 ** 31 - 1)并且您正在查看3组,则3-d图将显示不超过(2 ** 31 - 1)的立方根的点数,从正确的方向看,或大约1290架飞机。

All LCG's are subject to Marsaglia's Theorem. A "good" one performs at or close to the upper bound, a bad one falls well short of the upper bound. That's what the spectral test is effectively measuring, and that's what you were seeing in your Wikipedia link - RANDU, the LCG from hell, produces triplets that fall in a mere 15 planes.

所有LCG都受Marsaglia定理的约束。一个“好的”在上限或接近上限执行,一个坏的下降远远超过上限。这就是光谱测试有效测量的内容,这就是你在*链接中看到的内容 - 来自地狱的LCG RANDU产生的三元组只有15架飞机。

Apple's carbon library generator uses 16807 as its multiplier and (2**31 - 1) as its modulus. As LCG's go, it's not really all that bad. Hence your plot didn't show the same extremes RANDU has. However, if you want decent quality random numbers don't use an LCG.

Apple的碳库生成器使用16807作为其乘数,并使用(2 ** 31 - 1)作为其模数。正如LCG一样,它并不是那么糟糕。因此,你的情节没有表现出兰多的极端情况。但是,如果您想要质量合理的随机数,请不要使用LCG。

Addendum

I've gone ahead and cranked a billion numbers from the Apple rand() function, but only printed the ones where both values of the pair were less than 2 million, i.e., the bottom left corner of your plot. Sure enough, they fall on lines. You just need to really zoom in to see it because of the density of lines.

我已经开始从Apple rand()函数中获得了数十亿个数字,但只打印了两个值都小于200万的那些数字,即你的情节的左下角。果然,他们落在了线上。由于线条的密度,你只需要真正放大才能看到它。

Old George was a clever fella!

老乔治是一个聪明的家伙!

如何使OSX的rand()无法进行光谱测试?

#2


3  

Assuming the bad rand is a linear congruential generator, i.e. it's of the form:

假设坏兰特是一个线性同余生成器,即它的形式如下:

next = a * prev + b (mod RAND_MAX+1)

you can just take a few terms and solve the equations for a and b. After that, you should be able to generate a function of the output such that the structure becomes readily apparent.

你可以用几个术语来解决a和b的方程。之后,您应该能够生成输出函数,以便结构变得明显。

#1


11  

Linear congruential generators all suffer from a problem identified by George Marsaglia. "Marsaglia's Theorem" says that k-tuples (vectors of length k) will fall on a bounded number of hyperplanes. The bound is m**(1/k), where k is the size of the tuple and m is the number used for the modulus of the generator. Thus, if the modulus is (2**31 - 1) and you're looking at sets of 3, a 3-d plot will show the points falling on no more than the cube root of (2**31 - 1), or about 1290 planes, when viewed from the right orientation.

线性同余发生器都受到George Marsaglia发现的问题的困扰。 “Marsaglia的定理”说k元组(长度为k的向量)将落在有限数量的超平面上。边界是m **(1 / k),其中k是元组的大小,m是用于生成器模数的数字。因此,如果模数是(2 ** 31 - 1)并且您正在查看3组,则3-d图将显示不超过(2 ** 31 - 1)的立方根的点数,从正确的方向看,或大约1290架飞机。

All LCG's are subject to Marsaglia's Theorem. A "good" one performs at or close to the upper bound, a bad one falls well short of the upper bound. That's what the spectral test is effectively measuring, and that's what you were seeing in your Wikipedia link - RANDU, the LCG from hell, produces triplets that fall in a mere 15 planes.

所有LCG都受Marsaglia定理的约束。一个“好的”在上限或接近上限执行,一个坏的下降远远超过上限。这就是光谱测试有效测量的内容,这就是你在*链接中看到的内容 - 来自地狱的LCG RANDU产生的三元组只有15架飞机。

Apple's carbon library generator uses 16807 as its multiplier and (2**31 - 1) as its modulus. As LCG's go, it's not really all that bad. Hence your plot didn't show the same extremes RANDU has. However, if you want decent quality random numbers don't use an LCG.

Apple的碳库生成器使用16807作为其乘数,并使用(2 ** 31 - 1)作为其模数。正如LCG一样,它并不是那么糟糕。因此,你的情节没有表现出兰多的极端情况。但是,如果您想要质量合理的随机数,请不要使用LCG。

Addendum

I've gone ahead and cranked a billion numbers from the Apple rand() function, but only printed the ones where both values of the pair were less than 2 million, i.e., the bottom left corner of your plot. Sure enough, they fall on lines. You just need to really zoom in to see it because of the density of lines.

我已经开始从Apple rand()函数中获得了数十亿个数字,但只打印了两个值都小于200万的那些数字,即你的情节的左下角。果然,他们落在了线上。由于线条的密度,你只需要真正放大才能看到它。

Old George was a clever fella!

老乔治是一个聪明的家伙!

如何使OSX的rand()无法进行光谱测试?

#2


3  

Assuming the bad rand is a linear congruential generator, i.e. it's of the form:

假设坏兰特是一个线性同余生成器,即它的形式如下:

next = a * prev + b (mod RAND_MAX+1)

you can just take a few terms and solve the equations for a and b. After that, you should be able to generate a function of the output such that the structure becomes readily apparent.

你可以用几个术语来解决a和b的方程。之后,您应该能够生成输出函数,以便结构变得明显。