We have a multi-threaded desktop application in C++ (MFC). Currently developers use either CString or std::string, probably depending on their mood. So we'd like to choose a single implementation (possibly something other than those two).
MFC's CString is based on the copy-on-write (COW) idiom, and some people would claim this is unacceptable in a multithreaded environment (and would probably refer to this article). I am not convinced by such claims, as atomic counters seem to be quite fast, and this overhead is at least partly compensated by a reduction in memory re-allocations.
I learned that the std::string implementation depends on the compiler: it is not COW in MSVC, but it is (or was) in gcc. As far as I understand, the new C++0x standard is going to fix this by requiring a non-COW implementation and resolving some other issues, such as contiguous buffer requirements. So std::string does not look well defined at this point...
A quick example of what I don't like about std::string: there is no way to return a string from a function without excessive re-allocations (the copy constructor runs if I return by value, and there is no access to the internal buffer to optimize that, so "return by reference", e.g. std::string& Result, doesn't help). I can do this with CString by either returning by value (no copy due to COW) or passing by reference and accessing the buffer directly. Again, C++0x comes to the rescue with its rvalue references, but we are not going to have C++0x in the near future.
Which string class should we use? Can COW really become an issue? Are there other commonly used efficient implementations of strings? Thanks.
EDIT: We don't use unicode at the moment, and it is unlikely that we will need it. However, if there is something easily supporting unicode (not at the cost of ICU...), that would be a plus.
7 answers
#1
16
I would use std::string:
- It promotes decoupling from MFC
- It interacts better with existing C++ libraries
The "return by value" issue is mostly a non-issue. Compilers are very good at performing Return Value Optimization (RVO) which actually eliminates the copy in most cases when returning by value. If it doesn't, you can usually tweak the function.
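To make the RVO point concrete, here is a minimal sketch that counts copy-constructor calls when returning by value. The `Tracer` type and the two `make_*` functions are hypothetical names invented for this illustration. The unnamed case is elided by mainstream compilers (and the elision is mandatory from C++17 on); the named case (NRVO) is optional, so whether it is elided depends on your compiler and build settings.

```cpp
#include <string>

// Hypothetical helper type: counts how often it is copy-constructed.
struct Tracer {
    static int copies;
    std::string text;

    explicit Tracer(const std::string& t) : text(t) {}
    Tracer(const Tracer& other) : text(other.text) { ++copies; }
};

int Tracer::copies = 0;

// Returns an unnamed temporary: the classic RVO case.
Tracer make_unnamed() {
    return Tracer("built in place");
}

// Returns a named local: the NRVO case, which compilers may or may not elide.
Tracer make_named() {
    Tracer result("named local");
    result.text += "!";
    return result;
}
```

Calling `make_unnamed()` and then reading `Tracer::copies` on your own toolchain is a quick way to check the claim before redesigning any interfaces around it.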
COW has been rejected for a reason: it doesn't scale well, and the hoped-for speed increase has never really been measured (see Herb Sutter's article). Atomic operations are not as cheap as they appear. With single-processor, single-core machines it was easy, but multi-core is now a commodity and multi-processor machines are widely available (for servers). In such architectures there are multiple caches that need to be synchronized, and the more distributed the architecture, the more costly the atomic operations.
Does CString implement the Small String Optimization (SSO)? It's a simple trick that lets a string avoid allocating any heap memory for small strings (usually a few characters). It is very useful because most strings turn out to be small: how many strings in your application are less than 8 characters long?
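If you want to check whether your own std::string uses SSO, a rough probe is to test whether the character buffer lives inside the string object itself. This is only a sketch: `stored_inline` is a hypothetical helper name, and the result for short strings depends entirely on the standard library you build against.

```cpp
#include <functional>
#include <string>

// Returns true when the character data sits inside the std::string object
// itself, which is what the small string optimization looks like from outside.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    // std::less gives a well-defined total order even for unrelated pointers.
    std::less<const char*> before;
    return !before(s.data(), object_begin) && before(s.data(), object_end);
}
```

Running it on a few typical strings from your application (identifiers, short labels, file names) gives a feel for how many of them would avoid the heap entirely.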
So, unless you show me a real benchmark that clearly demonstrates a net gain from using CString, I'd prefer sticking with the standard: it's standard, and likely better optimized.
#2
5
Actually, the answer may be "it depends". But if you are using MFC, IMHO, CString would be the better choice. You can also use CString with STL containers, though that leads to another question: should I use STL containers or MFC containers with CString? Using CString also gives your application some agility, for example in Unicode conversions.
EDIT: Moreover, if you make Win32 API calls, CString conversions will be easier.
EDIT: CString has GetBuffer() and related methods that allow you to modify the buffer directly.
EDIT: I have used CString in our SQLite wrapper, and formatting a CString is easier.
bool RS::getString(int idx, CString& a_value) {
    // bla bla
    // Note: _T("%s") expects a TCHAR string, so each branch is only correct
    // when the build's character width matches the column encoding.
    if (getDB()->getEncoding() == IDatabase::UTF8) {
        a_value.Format(_T("%s"), sqlite3_column_text(getCommand()->getStatement(), idx));
    } else {
        a_value.Format(_T("%s"), sqlite3_column_text16(getCommand()->getStatement(), idx));
    }
    return true;
}
#3
1
I don't know of any other common string implementations; they all suffer from the same language limitations in C++03. Either they offer something specific (the ICU components are great for Unicode, for instance), or they're really old like CString, or std::string trumps them.
However, you can use the same technique that the MSVC9 SP1 STL uses: "swaptimization", which is the most hilariously named optimization ever.
#include <string>  // also declares the std::swap overload for strings

void func(std::string& ref) {
    std::string retval;
    // ...
    std::swap(ref, retval); // No copying done here.
}
If you rolled a custom string class that didn't allocate anything in its default constructor (or checked that your STL implementation doesn't), then swaptimizing it would guarantee no redundant allocations. For example, my MSVC STL uses SSO and doesn't allocate any heap memory by default, so by swaptimizing the above, I get no redundant allocations.
You could also improve performance substantially by just not using expensive heap allocation. There are allocators designed for temporary allocations, and you can replace the allocator used in your favourite STL implementation with a custom one. You can get things like object pools from Boost or roll your own memory arena. You can get tenfold better performance compared to a normal new allocation.
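As a rough illustration of the custom-allocator idea, here is a minimal bump-pointer arena plugged into std::basic_string. Everything named here (`Arena`, `scratch_arena`, `ArenaAllocator`, `ScratchString`, the 4096-byte size) is made up for the sketch, and it relies on C++11 features (`thread_local`, `alignas`) that a C++03 toolchain won't have. Memory is only reclaimed by `reset()`, so this suits short-lived scratch strings and nothing else.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical fixed-size arena: hands out memory by bumping an offset.
struct Arena {
    alignas(std::max_align_t) unsigned char buffer[4096];
    std::size_t used = 0;

    void* take(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > sizeof(buffer) - used) throw std::bad_alloc();
        void* p = buffer + used;
        used += rounded;
        return p;
    }

    // Invalidates every string still living in the arena.
    void reset() { used = 0; }
};

// One arena per thread, so no locking is needed.
inline Arena& scratch_arena() {
    static thread_local Arena arena;
    return arena;
}

// Minimal C++11 allocator that draws from the current thread's arena.
template <class T>
struct ArenaAllocator {
    typedef T value_type;

    ArenaAllocator() {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(scratch_arena().take(n * sizeof(T)));
    }
    void deallocate(T*, std::size_t) {}  // freed in bulk by Arena::reset()
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

typedef std::basic_string<char, std::char_traits<char>, ArenaAllocator<char> > ScratchString;
```

Note that `ScratchString` is a different type from std::string, so passing it to APIs that expect std::string means a conversion. Whether the speedup is anywhere near tenfold is something to measure on your own workload.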
#4
1
I would suggest making a "per DLL" decision. If you have DLLs that depend heavily on MFC (for example, your GUI layer), where you need a lot of MFC calls with CString parameters, use CString. If you have DLLs where the only thing you would use from MFC is the CString class, use std::string instead. Of course, you will need conversion functions between the two classes, but I suspect you have already solved that issue.
#5
1
I say always go for std::string. As mentioned, RVO and NRVO will make returning by value cheap, and when you do eventually switch to C++0x, you get a nice performance boost from move semantics without doing anything. If you want to take any code and use it in a non-ATL/MFC project, you can't use CString, but std::string will be there, so you'll have a much easier time. Finally, you mentioned in a comment that you use STL containers instead of MFC containers (good move). Why not stay consistent and use the STL string too?
#6
0
I would advise using std::basic_string as your general string template base unless there is a good reason to do otherwise. I say basic_string because if you are handling 16-bit characters you would use wstring.
If you are going to use TCHAR you should probably define tstring as basic_string<TCHAR>, and you may also wish to implement a traits class for it so that it uses functions like _tcslen.
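A minimal sketch of such a tstring follows. The `tstring` and `length_of` names are only illustrative, and the non-Windows branch is a stand-in I added so the snippet compiles anywhere; in the MFC application TCHAR comes from <tchar.h>. The default std::char_traits already handles both char and wchar_t, so a custom traits class is optional.

```cpp
#include <string>

#ifdef _WIN32
#include <tchar.h>   // provides TCHAR (char or wchar_t depending on _UNICODE)
#else
typedef char TCHAR;  // stand-in for non-Windows builds; an assumption of this sketch
#endif

// String type that follows the project's character-width setting.
typedef std::basic_string<TCHAR> tstring;

// Hypothetical helper: length of a null-terminated TCHAR string via tstring.
tstring::size_type length_of(const TCHAR* text) {
    return tstring(text).size();
}
```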
#7
-2
std::string is reference counted in some implementations (older versions of GCC's libstdc++, for example), so pass-by-value is still a cheap operation there (and even more so with the rvalue reference stuff in C++0x). The COW is triggered only for strings that have multiple references pointing to them, i.e.:
std::string foo("foo");
std::string bar(foo);
foo[0] = 'm';
will go through the COW path. As the COW happens inside operator[], you can force a string to use a private buffer by calling its (non-const) operator[]() or begin() methods.
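For readers who haven't seen the mechanism, here is a toy copy-on-write string showing where the reference count is touched and where the copy happens. It is not how CString or any real std::string is implemented: `CowString` is a hypothetical class, it leans on C++11's std::shared_ptr for the counter, and the `use_count()` check is not a safe design when several threads share one object.

```cpp
#include <cstddef>
#include <memory>
#include <string>

class CowString {
public:
    explicit CowString(const std::string& text)
        : data_(std::make_shared<std::string>(text)) {}

    // The implicit copy constructor shares the buffer and only bumps the
    // reference count (an atomic operation in threaded builds).

    char get(std::size_t i) const { return (*data_)[i]; }

    // Writing detaches first, so other owners never see the change.
    void set(std::size_t i, char c) {
        detach();
        (*data_)[i] = c;
    }

    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

    std::string str() const { return *data_; }

private:
    // The "copy" in copy-on-write: clone the buffer if anyone else holds it.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};
```

Copying a `CowString` and then calling `set` on one of the copies mirrors the foo/bar snippet above: the two objects share a buffer until the first write.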