How do I use rand_r, and how do I use it in a thread-safe way?

Time: 2022-10-20 07:06:31

I am trying to learn how to use rand_r, and after reading this question I am still a little confused. Can someone please take a look and point out what I'm missing? To my understanding, rand_r takes a pointer to some value (a piece of memory holding an initial value) and uses it to generate a new number every time it is called. Each thread that calls rand_r should supply its own unique pointer (its own piece of memory) to get "actual random" numbers between different threads. That's why this:

unsigned int globalSeed;  // rand_r expects an unsigned int *

//thread 1
rand_r(&globalSeed);

//thread 2
rand_r(&globalSeed);

is the wrong way of using it. If I have

unsigned int seed1, seed2;  // rand_r expects an unsigned int *

//thread 1
rand_r(&seed1);

//thread 2
rand_r(&seed2);

would this be the right way to generate "true random" numbers between threads?


EDIT: additional questions after reading answers to the above part:

  1. If in thread 1 I need a random number between 1 and n, should I do (rand_r(&seed1) % (n-1)) + 1? Or is there another common way of doing this?

  2. Is it right or normal for the memory for the seed to be dynamically allocated?

1 answer

#1


15  

That's correct. What you're doing in the first case is bypassing the thread-safety nature of rand_r. With many non-thread-safe functions, persistent state is stored between calls to that function (such as the random seed here).

With the thread-safe variant, you actually provide a thread-specific piece of data (seed1 and seed2) to ensure the state is not shared between threads.
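
As a minimal sketch of the per-thread state described here, assuming POSIX threads are available: each worker owns its own seed, so no rand_r state is ever shared. The names struct worker, worker_main, run_two_workers and the seed values 1 and 2 are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

#define DRAWS 4

/* Hypothetical per-thread record: a private seed plus room for the results. */
struct worker {
    unsigned int seed;   /* rand_r state owned by exactly one thread */
    int draws[DRAWS];    /* numbers produced by that thread */
};

/* Thread entry point: only ever touches the worker passed to it. */
static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (int i = 0; i < DRAWS; i++)
        w->draws[i] = rand_r(&w->seed);
    return NULL;
}

/* Starts two threads with distinct seeds and waits for both.
   Returns 0 on success, -1 if a thread could not be created. */
int run_two_workers(struct worker *a, struct worker *b)
{
    pthread_t ta, tb;

    a->seed = 1u;   /* arbitrary illustrative seeds */
    b->seed = 2u;

    if (pthread_create(&ta, NULL, worker_main, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker_main, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

Because neither thread can reach the other's seed, the order in which the threads are scheduled has no effect on the numbers each one sees.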

Keep in mind that this doesn't make the numbers truly random; it just makes the sequences independent of each other. If you start both threads with the same seed, you'll get the same sequence in both threads.
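
The point about identical seeds can be checked with a small sketch like the following. The helper name same_seed_same_sequence is hypothetical; it just steps two separate seed variables side by side.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Returns 1 if two generators started from the same seed value
   agree on each of the first `count` draws, 0 otherwise. */
int same_seed_same_sequence(unsigned int seed, int count)
{
    unsigned int a = seed;   /* state for the first generator */
    unsigned int b = seed;   /* independent copy for the second */

    for (int i = 0; i < count; i++) {
        if (rand_r(&a) != rand_r(&b))
            return 0;
    }
    return 1;
}
```

If the threads should produce different sequences, give each one a different starting value, for example by mixing a per-thread index into the seed.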

By way of example, let's say you get a random sequence 2, 3, 5, 7, 11, 13, 17 given an initial seed of 0. With a shared seed, alternating calls to rand_r from two different threads would cause this:

thread 1                thread 2
           <---  2
                 3 --->
           <---  5
                 7 --->
           <--- 11
                13 --->
           <--- 17

and that's the best case - you may actually find that the shared state gets corrupted since the updates on it may not be atomic.

With non-shared state (with a and b representing the two different sources of the random numbers):

thread 1                thread 2
           <---  2a
                 2b --->
           <---  3a
                 3b --->
           <---  5a
                 5b --->
                 ::

Some thread-safe calls require you to provide the thread-specific state like this; others create thread-specific data under the covers (using a thread ID or similar information) so that you never need to worry about it, and you can use exactly the same source code in threaded and non-threaded environments. I prefer the latter myself, simply because it makes my life easier.


Additional stuff for edited question:

> If in thread 1 I need a random number between 1 and n, should I do '(rand_r(&seed1) % (n-1)) + 1', or is there another common way of doing this?

Assuming you want a value between 1 and n inclusive, use (rand_r(&seed1) % n) + 1. The modulo operation gives you a value from 0 to n-1 inclusive; adding 1 then shifts it into the desired range.
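
A minimal sketch of that expression wrapped in a function, plus an optional variant that is not part of the answer above: plain modulo slightly favours small results whenever n does not divide RAND_MAX + 1 evenly, and rejecting the leftover top values removes that bias. Both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Value in [1, n] using the modulo expression from the answer. */
int rand_1_to_n(unsigned int *seed, int n)
{
    assert(n >= 1);
    return (rand_r(seed) % n) + 1;
}

/* Same range, but draws again whenever the raw value falls in the
   incomplete final bucket, so every result is equally likely. */
int rand_1_to_n_unbiased(unsigned int *seed, int n)
{
    assert(n >= 1);
    unsigned int range = (unsigned int)RAND_MAX + 1u;         /* count of raw values */
    unsigned int limit = range - (range % (unsigned int)n);   /* largest multiple of n */
    unsigned int r;

    do {
        r = (unsigned int)rand_r(seed);
    } while (r >= limit);

    return (int)(r % (unsigned int)n) + 1;
}
```

For small n the bias of the simple version is tiny, so the one-line expression is usually good enough.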

> Is it right or normal if the memory for the seed is dynamically allocated?

The seed has to persist for as long as you're using it. You could dynamically allocate it in the thread, but you could also declare it in the thread's top-level function. In both cases, you'll need to communicate the address down to the lower levels somehow (unless your thread is just that one function, which is unlikely).

You could either pass it down through the function calls or set up a global array somehow where the lower levels can discover the correct seed address.

Alternatively, since you need a global array anyway, you can have a global array of seeds rather than seed addresses, which the lower levels could use to discover their seed.

You would probably (in both cases of using the global array) have a keyed structure containing the thread ID as the key and the seed to use. You would then have to write your own rand() routine which locates the correct seed and calls rand_r() with it.

This is why I prefer library routines which do this under the covers with thread-specific data.
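
One way to get that "under the covers" behaviour yourself, sketched under the assumption of a C11 compiler that supports _Thread_local: the compiler keeps one seed per thread, which stands in for the keyed global array. The names tls_seed, my_srand and my_rand are hypothetical, and the default seed of 1 is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* One seed per thread; thread-local storage does the lookup
   that the keyed array of thread IDs would otherwise do. */
static _Thread_local unsigned int tls_seed = 1u;

/* Sets the calling thread's seed only. */
void my_srand(unsigned int seed)
{
    tls_seed = seed;
}

/* Drop-in rand() replacement that uses the calling thread's seed. */
int my_rand(void)
{
    return rand_r(&tls_seed);
}
```

Callers at any depth can then use my_rand() without being handed a seed address, in threaded and non-threaded code alike.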
