I'm having trouble pinpointing the exact source of either a race condition or memory corruption. My attempts to solve the problem are shown after the code.
I have the following structure:
class A
{
protected:
    // Various variables used by B, C and D:
    // 1. a vector that is assigned a value in the B, C, D constructors and
    //    not modified while in the thread
    // 2. various ints
    // 3. a double array that is accessed by B, C, D
public:
    virtual ~A() = default;
    virtual bool isFinished() = 0; // needed because Worker calls it through A&
    virtual void execute() = 0;
};
class B : public A
{
public:
    B(...){}
    bool isFinished();
    void execute(); // execute does a very expensive loop (genetic algorithm)
};
class C : public A
{
public:
    C(...){}
    bool isFinished();
    void execute();
};
class D : public A
{
public:
    D(...){}
    bool isFinished();
    void execute();
};
class Container; // forward declaration, Worker holds a reference to it

class Worker
{
private:
    A& m_a;
    Container& m_parent;
public:
    // Worker needs a reference to the parent container to control a mutex
    // in the sync version of this code (not shown here)
    Worker(A& aa, Container& parent) : m_a(aa), m_parent(parent) {}
    void executeAsynchronous();
};

class Container
{
private:
    std::vector<Worker> wVec;
public:
    void addWorker(Worker w); // this does wVec.push_back(w)
    void start();
};
void Worker::executeAsynchronous()
{
    while (!m_a.isFinished())
        m_a.execute();
}
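void Container::start()
{
    // One thread per worker; a fixed array of 3 would overflow with more workers
    std::vector<std::thread> threads;
    threads.reserve(wVec.size());
    for (std::size_t i = 0; i < wVec.size(); i++) {
        threads.emplace_back(&Worker::executeAsynchronous,
                             std::ref(wVec[i]));
    }
    for (std::size_t i = 0; i < threads.size(); i++) {
        threads[i].join();
    }
}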
To run the code, I'd do:
Container container;
B b(...);
C c(...);
D d(...);
Worker worker1(b, container);
Worker worker2(c, container);
Worker worker3(d, container);
container.addWorker(worker1);
container.addWorker(worker2);
container.addWorker(worker3);
container.start();
The code is supposed to spawn threads to run execute() asynchronously. However, I have the following two problems:
- One thread is faster than 2, 3 or 4 threads AND gives better results (better optimisation from running the genetic algorithm in one thread). I have read that I could be limited by memory bandwidth, but where is that happening? How can I verify that this is the case?
- Two or more threads: the results become very bad. Something is getting corrupted or mangled along the way, but I cannot pinpoint it. I have printed with cout from various locations in the code, and each thread executes exactly one derived class's execute(), i.e., each thread runs the execute() of B, C or D and doesn't jump to or interfere with the others. The moment I put m_parent.mutex.lock() and m_parent.mutex.unlock() around m_a.execute(), effectively making the multi-threaded code single-threaded, the results become correct again.
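The locking described in the second problem can be sketched as follows. This is only an illustration, not the original sync version (which is not shown in the question): SharedState, addLocked and runWorkers are hypothetical names, and the counter stands in for whatever data execute() actually touches. With the lock held around every update the total comes out right, but the threads take turns inside the critical section, which is consistent with the locked version behaving like single-threaded code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state standing in for the data execute() touches.
struct SharedState
{
    std::mutex mutex;
    long total = 0;
};

// Each worker increments the shared total. The lock_guard serializes the
// update, so only one thread is inside the critical section at a time.
void addLocked(SharedState& state, int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(state.mutex);
        ++state.total;
    }
}

// Spawns threadCount workers, waits for all of them, returns the final total.
long runWorkers(int threadCount, int iterations)
{
    assert(threadCount > 0 && iterations >= 0);
    SharedState state;
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(threadCount));
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back(addLocked, std::ref(state), iterations);
    for (auto& th : threads)
        th.join();
    return state.total;
}
```

Calling runWorkers(4, 10000) should return 40000 every time. Removing the lock_guard turns the increment into a data race, and the total may then come out lower and vary between runs.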
I have attempted to:
1. Remove pointers in B, C and D that could become dangling after pushing the Workers back into the Container's vector. I now pass a copy to push_back.
2. Use emplace_back instead of push_back, but it made no difference.
3. Use vector.reserve() to avoid reallocation and loss of references, but it made no difference.
4. Use std::ref(), because I discovered that std::thread makes a copy and I want the element wVec[i] itself to be modified. Previously I was just passing wVec[i] to the thread.
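The copy behaviour mentioned in the last attempt can be reproduced in isolation. In this minimal sketch, Counter is a hypothetical stand-in for Worker, and runByCopy, runByReference and checkCopyVersusReference are illustrative names that do not appear in the code above. It assumes a standard library that accepts std::reference_wrapper as the object argument of a member function pointer, which current implementations do.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical stand-in for Worker: counts how many times it has run.
struct Counter
{
    int runs = 0;
    void run() { ++runs; }
};

// std::thread copies its arguments, so the thread increments its own copy.
int runByCopy()
{
    Counter c;
    std::thread t(&Counter::run, c);
    t.join();
    return c.runs; // the caller's object is unchanged
}

// std::ref passes a reference wrapper, so the thread increments c itself.
int runByReference()
{
    Counter c;
    std::thread t(&Counter::run, std::ref(c));
    t.join();
    return c.runs;
}

// Usage: the copy leaves the caller's object untouched, the reference does not.
void checkCopyVersusReference()
{
    assert(runByCopy() == 0);
    assert(runByReference() == 1);
}
```

Passing a pointer (&c) instead of std::ref(c) has the same effect. Either way, the referenced object must outlive the thread.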
Since steps 1-4 above made no difference, and since the code works perfectly when run single-threaded, I believe this isn't a case of something going out of scope. Also, there is no data exchange between the threads or with the container; I know std::vector isn't thread-safe.
I'd appreciate it if you'd take the time to help me figure this out.
EDIT1: As per Constantin Pan's comment, here is my RandomNumberGenerator class. It is a static class; I call it using RandomNumberGenerator::getDouble(a, b).
// rng.h
#include <random>

class RandomNumberGenerator
{
private:
    static std::mt19937 rng; // a single engine shared by every caller
public:
    static void initRNG();
    static int getInt(int min, int max);
    static double getDouble(double min, double max);
};

// rng.cpp
std::mt19937 RandomNumberGenerator::rng;

void RandomNumberGenerator::initRNG()
{
    rng.seed(std::random_device()());
}

int RandomNumberGenerator::getInt(int min, int max)
{
    // int matches the parameter and return types; the unsigned
    // result_type would wrap around for a negative min
    std::uniform_int_distribution<int> udist(min, max);
    return udist(rng);
}

double RandomNumberGenerator::getDouble(double min, double max)
{
    std::uniform_real_distribution<> udist(min, max);
    return udist(rng);
}
EDIT2: I have solved the corruption problem. It was a call to a non-thread-safe function that I had missed (the evaluation function). As for the slowness, the program is still slow when run in threads. I ran valgrind's callgrind and graphed the results using gprof2dot, and it appears M4rc's suggestion holds. There are a lot of STL container calls; I will try dynamically allocated arrays instead.
EDIT3: Looks like the RNG class was the culprit, as Constantin Pan pointed out. Profiled using gprof:
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
17.97      70.09    70.09  1734468     0.00     0.00  std::mersenne_twister_engine  // SYNC
18.33      64.98    64.98  1803194     0.00     0.00  std::mersenne_twister_engine  // ASYNC
 6.19      63.41     8.93  1185214     0.00     0.00  std::mersenne_twister_engine  // Single thread
EDIT4: The deque container was guilty too, as M4rc suggested:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
14.15      28.60    28.60 799662660     0.00     0.00  std::_Deque_iterator
1 Answer
Since there is a genetic algorithm involved, make sure that the random number generator is thread-safe. I have hit this (the slowdown and incorrect results) myself in the past with rand() from cstdlib.
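One common way to get a thread-safe generator without locking is to give each thread its own engine. The following is a minimal sketch under that assumption, not the asker's final fix: ThreadLocalRng and its private engine() helper are hypothetical names, and seeding each engine from std::random_device is an illustrative choice.

```cpp
#include <cassert>
#include <random>

// Hypothetical replacement for the shared static engine: every thread
// lazily constructs and seeds its own std::mt19937, so no lock is needed.
class ThreadLocalRng
{
public:
    // Uniform integer in the closed range [min, max].
    static int getInt(int min, int max)
    {
        assert(min <= max);
        std::uniform_int_distribution<int> udist(min, max);
        return udist(engine());
    }

    // Uniform double in the half-open range [min, max).
    static double getDouble(double min, double max)
    {
        assert(min < max);
        std::uniform_real_distribution<double> udist(min, max);
        return udist(engine());
    }

private:
    // One engine per thread, seeded on first use in that thread.
    static std::mt19937& engine()
    {
        thread_local std::mt19937 rng{std::random_device{}()};
        return rng;
    }
};
```

Calls such as ThreadLocalRng::getDouble(0.0, 1.0) can then be made from any worker thread without a mutex. Because no engine state is shared, the threads neither race on it nor contend for the same cache lines.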