前言
事情起源于一位网友分享了一个有趣的面试题:
生成由六位数字组成的id,要求随机数字,不排重,不可自增,且数字不重复。id总数为几十万。
初次解答
我一开始想到的办法是
- 生成一个足够大的id池(其实就是需要多少就生成多少)
- 对id池中的数字进行随机排序
- 依次消费id池中的数字
可惜这个方法十分浪费空间,且性能很差。
初遇梅森旋转算法
后面咨询了网友后得知了一个高效的随机数算法:梅森旋转(mersenne twister/mt)。通过搜索资料得知:
梅森旋转算法(mersenne twister)是一个伪随机数发生算法。由松本真和西村拓士在1997年开发,基于有限二进制字段上的矩阵线性递归。可以快速产生高质量的伪随机数,修正了古典随机数发生算法的很多缺陷。
最为广泛使用mersenne twister的一种变体是mt19937,可以产生32位整数序列。
ps:此算法依然无法完美解决面试题,但是也算学到了新知识
mt19937算法实现
后面通过google,找到了一个高效的mt19937的java版本代码。原代码链接为http://www.math.sci.hiroshima-u.ac.jp/~m-mat/mt/versions/java/mtrandom.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
|
import java.util.random;
/**
* mt19937的java实现
*/
public class mtrandom extends random {
// constants used in the original c implementation
private final static int upper_mask = 0x80000000 ;
private final static int lower_mask = 0x7fffffff ;
private final static int n = 624 ;
private final static int m = 397 ;
private final static int magic[] = { 0x0 , 0x9908b0df };
private final static int magic_factor1 = 1812433253 ;
private final static int magic_factor2 = 1664525 ;
private final static int magic_factor3 = 1566083941 ;
private final static int magic_mask1 = 0x9d2c5680 ;
private final static int magic_mask2 = 0xefc60000 ;
private final static int magic_seed = 19650218 ;
private final static long default_seed = 5489l;
// internal state
private transient int [] mt;
private transient int mti;
private transient boolean compat = false ;
// temporary buffer used during setseed(long)
private transient int [] ibuf;
/**
* the default constructor for an instance of mtrandom. this invokes
* the no-argument constructor for java.util.random which will result
* in the class being initialised with a seed value obtained by calling
* system.currenttimemillis().
*/
public mtrandom() { }
/**
* this version of the constructor can be used to implement identical
* behaviour to the original c code version of this algorithm including
* exactly replicating the case where the seed value had not been set
* prior to calling genrand_int32.
* <p>
* if the compatibility flag is set to true, then the algorithm will be
* seeded with the same default value as was used in the original c
* code. furthermore the setseed() method, which must take a 64 bit
* long value, will be limited to using only the lower 32 bits of the
* seed to facilitate seamless migration of existing c code into java
* where identical behaviour is required.
* <p>
* whilst useful for ensuring backwards compatibility, it is advised
* that this feature not be used unless specifically required, due to
* the reduction in strength of the seed value.
*
* @param compatible compatibility flag for replicating original
* behaviour.
*/
public mtrandom( boolean compatible) {
super (0l);
compat = compatible;
setseed(compat?default_seed:system.currenttimemillis());
}
/**
* this version of the constructor simply initialises the class with
* the given 64 bit seed value. for a better random number sequence
* this seed value should contain as much entropy as possible.
*
* @param seed the seed value with which to initialise this class.
*/
public mtrandom( long seed) {
super (seed);
}
/**
* this version of the constructor initialises the class with the
* given byte array. all the data will be used to initialise this
* instance.
*
* @param buf the non-empty byte array of seed information.
* @throws nullpointerexception if the buffer is null.
* @throws illegalargumentexception if the buffer has zero length.
*/
public mtrandom( byte [] buf) {
super (0l);
setseed(buf);
}
/**
* this version of the constructor initialises the class with the
* given integer array. all the data will be used to initialise
* this instance.
*
* @param buf the non-empty integer array of seed information.
* @throws nullpointerexception if the buffer is null.
* @throws illegalargumentexception if the buffer has zero length.
*/
public mtrandom( int [] buf) {
super (0l);
setseed(buf);
}
// initializes mt[n] with a simple integer seed. this method is
// required as part of the mersenne twister algorithm but need
// not be made public.
private final void setseed( int seed) {
// annoying runtime check for initialisation of internal data
// caused by java.util.random invoking setseed() during init.
// this is unavoidable because no fields in our instance will
// have been initialised at this point, not even if the code
// were placed at the declaration of the member variable.
if (mt == null ) mt = new int [n];
// ---- begin mersenne twister algorithm ----
mt[ 0 ] = seed;
for (mti = 1 ; mti < n; mti++) {
mt[mti] = (magic_factor1 * (mt[mti- 1 ] ^ (mt[mti- 1 ] >>> 30 )) + mti);
}
// ---- end mersenne twister algorithm ----
}
/**
* this method resets the state of this instance using the 64
* bits of seed data provided. note that if the same seed data
* is passed to two different instances of mtrandom (both of
* which share the same compatibility state) then the sequence
* of numbers generated by both instances will be identical.
* <p>
* if this instance was initialised in 'compatibility' mode then
* this method will only use the lower 32 bits of any seed value
* passed in and will match the behaviour of the original c code
* exactly with respect to state initialisation.
*
* @param seed the 64 bit value used to initialise the random
* number generator state.
*/
public final synchronized void setseed( long seed) {
if (compat) {
setseed(( int )seed);
} else {
// annoying runtime check for initialisation of internal data
// caused by java.util.random invoking setseed() during init.
// this is unavoidable because no fields in our instance will
// have been initialised at this point, not even if the code
// were placed at the declaration of the member variable.
if (ibuf == null ) ibuf = new int [ 2 ];
ibuf[ 0 ] = ( int )seed;
ibuf[ 1 ] = ( int )(seed >>> 32 );
setseed(ibuf);
}
}
/**
* this method resets the state of this instance using the byte
* array of seed data provided. note that calling this method
* is equivalent to calling "setseed(pack(buf))" and in particular
* will result in a new integer array being generated during the
* call. if you wish to retain this seed data to allow the pseudo
* random sequence to be restarted then it would be more efficient
* to use the "pack()" method to convert it into an integer array
* first and then use that to re-seed the instance. the behaviour
* of the class will be the same in both cases but it will be more
* efficient.
*
* @param buf the non-empty byte array of seed information.
* @throws nullpointerexception if the buffer is null.
* @throws illegalargumentexception if the buffer has zero length.
*/
public final void setseed( byte [] buf) {
setseed(pack(buf));
}
/**
* this method resets the state of this instance using the integer
* array of seed data provided. this is the canonical way of
* resetting the pseudo random number sequence.
*
* @param buf the non-empty integer array of seed information.
* @throws nullpointerexception if the buffer is null.
* @throws illegalargumentexception if the buffer has zero length.
*/
public final synchronized void setseed( int [] buf) {
int length = buf.length;
if (length == 0 ) throw new illegalargumentexception( "seed buffer may not be empty" );
// ---- begin mersenne twister algorithm ----
int i = 1 , j = 0 , k = (n > length ? n : length);
setseed(magic_seed);
for (; k > 0 ; k--) {
mt[i] = (mt[i] ^ ((mt[i- 1 ] ^ (mt[i- 1 ] >>> 30 )) * magic_factor2)) + buf[j] + j;
i++; j++;
if (i >= n) { mt[ 0 ] = mt[n- 1 ]; i = 1 ; }
if (j >= length) j = 0 ;
}
for (k = n- 1 ; k > 0 ; k--) {
mt[i] = (mt[i] ^ ((mt[i- 1 ] ^ (mt[i- 1 ] >>> 30 )) * magic_factor3)) - i;
i++;
if (i >= n) { mt[ 0 ] = mt[n- 1 ]; i = 1 ; }
}
mt[ 0 ] = upper_mask; // msb is 1; assuring non-zero initial array
// ---- end mersenne twister algorithm ----
}
/**
* this method forms the basis for generating a pseudo random number
* sequence from this class. if given a value of 32, this method
* behaves identically to the genrand_int32 function in the original
* c code and ensures that using the standard nextint() function
* (inherited from random) we are able to replicate behaviour exactly.
* <p>
* note that where the number of bits requested is not equal to 32
* then bits will simply be masked out from the top of the returned
* integer value. that is to say that:
* <pre>
* mt.setseed(12345);
* int foo = mt.nextint(16) + (mt.nextint(16) << 16);</pre>
* will not give the same result as
* <pre>
* mt.setseed(12345);
* int foo = mt.nextint(32);</pre>
*
* @param bits the number of significant bits desired in the output.
* @return the next value in the pseudo random sequence with the
* specified number of bits in the lower part of the integer.
*/
protected final synchronized int next( int bits) {
// ---- begin mersenne twister algorithm ----
int y, kk;
if (mti >= n) { // generate n words at one time
// in the original c implementation, mti is checked here
// to determine if initialisation has occurred; if not
// it initialises this instance with default_seed (5489).
// this is no longer necessary as initialisation of the
// java instance must result in initialisation occurring
// use the constructor mtrandom(true) to enable backwards
// compatible behaviour.
for (kk = 0 ; kk < n-m; kk++) {
y = (mt[kk] & upper_mask) | (mt[kk+ 1 ] & lower_mask);
mt[kk] = mt[kk+m] ^ (y >>> 1 ) ^ magic[y & 0x1 ];
}
for (;kk < n- 1 ; kk++) {
y = (mt[kk] & upper_mask) | (mt[kk+ 1 ] & lower_mask);
mt[kk] = mt[kk+(m-n)] ^ (y >>> 1 ) ^ magic[y & 0x1 ];
}
y = (mt[n- 1 ] & upper_mask) | (mt[ 0 ] & lower_mask);
mt[n- 1 ] = mt[m- 1 ] ^ (y >>> 1 ) ^ magic[y & 0x1 ];
mti = 0 ;
}
y = mt[mti++];
// tempering
y ^= (y >>> 11 );
y ^= (y << 7 ) & magic_mask1;
y ^= (y << 15 ) & magic_mask2;
y ^= (y >>> 18 );
// ---- end mersenne twister algorithm ----
return (y >>> ( 32 -bits));
}
// this is a fairly obscure little code section to pack a
// byte[] into an int[] in little endian ordering.
/**
* this simply utility method can be used in cases where a byte
* array of seed data is to be used to repeatedly re-seed the
* random number sequence. by packing the byte array into an
* integer array first, using this method, and then invoking
* setseed() with that; it removes the need to re-pack the byte
* array each time setseed() is called.
* <p>
* if the length of the byte array is not a multiple of 4 then
* it is implicitly padded with zeros as necessary. for example:
* <pre> byte[] { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 }</pre>
* becomes
* <pre> int[] { 0x04030201, 0x00000605 }</pre>
* <p>
* note that this method will not complain if the given byte array
* is empty and will produce an empty integer array, but the
* setseed() method will throw an exception if the empty integer
* array is passed to it.
*
* @param buf the non-null byte array to be packed.
* @return a non-null integer array of the packed bytes.
* @throws nullpointerexception if the given byte array is null.
*/
public static int [] pack( byte [] buf) {
int k, blen = buf.length, ilen = ((buf.length+ 3 ) >>> 2 );
int [] ibuf = new int [ilen];
for ( int n = 0 ; n < ilen; n++) {
int m = (n+ 1 ) << 2 ;
if (m > blen) m = blen;
for (k = buf[--m]& 0xff ; (m & 0x3 ) != 0 ; k = (k << 8 ) | buf[--m]& 0xff );
ibuf[n] = k;
}
return ibuf;
}
}
|
测试
测试代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
// mt19937的java实现
mtrandom mtrandom= new mtrandom();
map<integer,integer> map= new hashmap<>();
//循环次数
int times= 1000000 ;
long starttime=system.currenttimemillis();
for ( int i= 0 ;i<times;i++){
//使用map去重
map.put(mtrandom.next( 32 ), 0 );
}
//打印循环次数
system.out.println( "times:" +times);
//打印map的个数
system.out.println( "num:" +map.size());
//打印非重复比率
system.out.println( "proportion:" +map.size()/( double )times);
//花费的时间(单位为毫秒)
system.out.println( "time:" +(system.currenttimemillis()-starttime));
|
测试结果
times:1000000
num:999886
proportion:0.999886
time:374
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持服务器之家。
原文链接:https://segmentfault.com/a/1190000018196230