Given that:
1) The C++03 standard does not address the existence of threads in any way
2) The C++03 standard leaves it up to implementations to decide whether std::string should use Copy-on-Write semantics in its copy-constructor
3) Copy-on-Write semantics often lead to unpredictable behavior in a multi-threaded program
I come to the following, seemingly controversial, conclusion:
You simply cannot safely and portably use std::string in a multi-threaded program.
Obviously, no STL data structure is thread-safe. But at least with std::vector, for example, you can simply use mutexes to protect access to the vector. With a std::string implementation that uses COW, you can't even reliably do that without editing the reference-counting semantics deep within the vendor implementation.
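To make the std::vector case concrete, here is a minimal sketch of the "just wrap it in a mutex" approach. It assumes C++11's std::mutex for brevity, which postdates the C++03 setting of this question; under C++03 you would substitute whatever lock your platform provides (a pthread mutex, for instance). The class name GuardedIntVector and its members are made up for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical wrapper: every access to the vector goes through one mutex.
class GuardedIntVector {
public:
    void push_back(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_.push_back(value);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

    // Returns a copy so the caller never holds a reference into shared state.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_;
    }

private:
    mutable std::mutex mutex_;  // mutable so const readers can lock it
    std::vector<int> data_;
};
```

This works for a vector because the vector's state lives entirely inside the object being guarded. With a COW string, two distinct string objects may share one buffer and one reference count, so a per-object lock like this one does not cover all of the shared state.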
Real-world example:
In my company, we have a multi-threaded application which has been thoroughly unit-tested and run through Valgrind countless times. The application ran for months with no problems whatsoever. One day, I recompiled the application with another version of gcc, and all of a sudden I started getting random segfaults all the time. Valgrind now reports invalid memory accesses deep within libstdc++, in the std::string copy constructor.
So what is the solution? Well, of course, I could typedef std::vector<char> as a string class, but really, that sucks. I could also wait for C++0x, which I pray will require implementors to forgo COW. Or (shudder) I could use a custom string class. I personally always rail against developers who implement their own classes when a preexisting library will do fine, but honestly, I need a string class which I can be sure is not using COW semantics, and std::string simply doesn't guarantee that.
Am I right that std::string simply cannot be used reliably at all in portable, multi-threaded programs? And what is a good workaround?
8 Answers
#1
4
Given that the standard doesn't say a word about memory models and is completely thread-unaware, I'd say you can't safely assume that every implementation will be non-COW. So no, you can't.
Apart from that, if you know your tools, most implementations use non-COW strings to allow multi-threading.
#2
11
You cannot safely and portably do anything in a multi-threaded program. There is no such thing as a portable multi-threaded C++ program, precisely because threads throw everything C++ says about order of operations, and the results of modifying any variable, out the window.
There's also nothing in the standard to guarantee that vector can be used in the way you say. It would be legal to provide a C++ implementation with a threading extension in which, say, any use of a vector outside the thread in which it was initialized results in undefined behavior. The instant you start a second thread, you aren't using standard C++ any more, and you must look to your compiler vendor for what is safe and what is not.
If your vendor provides a threading extension, and also provides a std::string with COW that (therefore) cannot be made thread-safe, then I think for the time being your argument is with your vendor, or with the threading extension, not with the C++ standard. For example, arguably POSIX should have barred COW strings in programs which use pthreads.
You could possibly make it safe by having a single mutex, which you take during any string mutation whatsoever and during any read of a string that is the result of a copy. But you'd probably get crippling contention on that mutex.
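A rough sketch of that single-mutex idea follows, assuming C++11's std::mutex stands in for whatever lock the vendor's threading extension provides. The helper names (string_mutex, guarded_append, guarded_copy, guarded_char_at) are invented for this example and are not part of any library.

```cpp
#include <mutex>
#include <string>

// Hypothetical process-wide lock shared by every string operation that could
// touch a (possibly shared) COW buffer or its reference count.
inline std::mutex& string_mutex() {
    static std::mutex m;
    return m;
}

// Mutation: take the lock before writing to any string.
void guarded_append(std::string& target, const std::string& suffix) {
    std::lock_guard<std::mutex> lock(string_mutex());
    target += suffix;
}

// Copying may bump a shared reference count, so it is locked as well.
std::string guarded_copy(const std::string& source) {
    std::lock_guard<std::mutex> lock(string_mutex());
    return std::string(source);
}

// Reading a string that may be the result of a copy also takes the lock.
char guarded_char_at(const std::string& s, std::string::size_type index) {
    std::lock_guard<std::mutex> lock(string_mutex());
    return s.at(index);
}
```

Every string operation in the program funnels through one lock here, which is exactly where the contention would come from.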
#3
8
You are right. This will be fixed in C++0x. For now you have to rely on your implementation's documentation. For example, recent libstdc++ versions (GCC) let you use string objects as if no string object shared its buffer with another one. C++0x forces a library implementation to protect the user from "hidden sharing".
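If the documentation is unclear, one informal way to see whether a given library shares buffers on copy is to compare the data pointers of a string and its copy. This is only a diagnostic sketch: the helper name copy_shares_buffer is made up, nothing in the standard promises this probe is meaningful, and a result of false says nothing about thread safety in general.

```cpp
#include <string>

// Returns true if copying `original` yields a string whose buffer is the very
// same memory as the original's, which is what a COW implementation does.
// Both strings are accessed through const paths so that the probe itself
// cannot trigger an "unshare" in a COW implementation.
bool copy_shares_buffer(const std::string& original) {
    const std::string copy(original);
    return copy.data() == original.data();
}
```

Pass a reasonably long string (say, 64 characters) so that a small-string optimization does not hide the answer.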
#4
4
A more correct way to look at it would be "You cannot safely and portably use C++ in a multithreaded environment". There is no guarantee that other data structures will behave sensibly either, or that the runtime won't blow up your computer. The standard doesn't guarantee anything about threads.
So to do anything with threads in C++, you have to rely on implementation-defined guarantees. And then you can safely use std::string, because each implementation tells you whether or not it is safe to use in a threaded environment.
You lost all hope of true portability the moment you spawned a second thread. std::string isn't "less portable" than the rest of the language/library.
#5
2
You can use STLport. It provides non-COW strings, and it has the same behavior on different platforms.
This article presents a comparison of STL strings with copy-on-write and non-copy-on-write algorithms, based on STLport strings, ropes and GNU libstdc++ implementations.
At a company where I work, I have some experience running the same server application built with and without STLport on HP-UX 11.31. The application was compiled with gcc 4.3.1 at optimization level O2. When I run the program built with STLport, it processes requests 25% faster than the same program built without STLport (which uses gcc's own STL library).
I profiled both versions and found that the version without STLport spends much more time in pthread_mutex_unlock() (2.5%) than the version with STLport (1%). In the version without STLport, pthread_mutex_unlock() itself is called from one of the std::string functions.
However, when, after profiling, I changed assignments to strings in the most often called functions in this way:
string_var = string_var.c_str(); // added .c_str()
there was a significant improvement in the performance of the version without STLport.
#6
0
I regulate the string access:
- make std::string members private
- return const std::string& for getters
- setters modify the member
This has always worked fine for me and is correct data hiding.
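As a sketch, the three rules might look like the class below. The class name Account and its member names are hypothetical. Note that this sketch contains no locking of its own, since the rules as stated don't mention any; if a setter and a getter can run on different threads at the same time, the caller still has to synchronize them.

```cpp
#include <string>

// Hypothetical class illustrating the three rules above.
class Account {
public:
    explicit Account(const std::string& name) : name_(name) {}

    // Getter hands out a const reference, so callers cannot mutate the member.
    const std::string& name() const { return name_; }

    // The setter is the only place where the member is modified.
    void set_name(const std::string& name) { name_ = name; }

private:
    std::string name_;  // private: no direct access from outside the class
};
```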
#7
0
In MSVC, std::string is no longer a reference-counted shared pointer to a container. They chose to copy the entire contents by value in every copy constructor and assignment operator, to avoid multithreading problems.
#8
0
If you want to disable COW semantics, you could force your strings to make copies:
// instead of:
string newString = oldString;
// do this:
string newString = oldString.c_str();
As pointed out, if your strings may contain embedded nulls, you should use the iterator constructor instead, because constructing from c_str() stops at the first null:
string newString(oldString.begin(), oldString.end());
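The difference between the two forced-copy forms is easy to check with a string that contains an embedded null. The helper names below are made up for this example, and the test string "ab\0cd" (five characters) is just an illustration.

```cpp
#include <string>

// Copy via c_str(): the const char* constructor stops at the first '\0'.
std::string copy_via_c_str(const std::string& s) {
    return std::string(s.c_str());
}

// Copy via iterators: copies exactly s.size() characters, nulls included.
std::string copy_via_iterators(const std::string& s) {
    return std::string(s.begin(), s.end());
}
```

For std::string("ab\0cd", 5), copy_via_c_str returns a 2-character string while copy_via_iterators returns all 5 characters.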