Why are std::vector::data and std::string::data different?

Time: 2022-06-25 16:35:13

Vector's new method data() provides a const and non-const version.
However string's data() method only provides a const version.

I think they changed the wording about std::string so that the chars are now required to be contiguous (like std::vector).

Was std::string::data just missed? Or is there a good reason to only allow const access to a string's underlying characters?

note: std::vector::data has another nice feature, it's not undefined behavior to call data() on an empty vector. Whereas &vec.front() is undefined behavior if it's empty.
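As a minimal sketch of that note (the helper names `sum_raw` and `sum_vector` are made up for illustration), passing `v.data()` and `v.size()` to a pointer-based routine stays well-defined even when the vector is empty, whereas `&v.front()` would not be:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums n ints behind a raw pointer, the way a
// C-style API might consume a vector's storage.
inline long sum_raw(const int* p, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// Safe for any vector, including an empty one. For an empty vector
// data() may or may not return a null pointer, but calling it is defined.
// Writing &v.front() here instead would be undefined behavior when empty.
inline long sum_vector(const std::vector<int>& v) {
    return sum_raw(v.data(), v.size());
}
```

With an empty vector the loop in `sum_raw` never dereferences the pointer, so the result is simply 0.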

4 answers

#1


29  

In C++98/03 there was good reason to not have a non-const data() due to the fact that string was often implemented as COW. A non-const data() would have required a copy to be made if the refcount was greater than 1. While possible, this was not seen as desirable in C++98/03.
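To illustrate the cost described above, here is a toy copy-on-write class. `CowString` and all of its members are hypothetical and do not correspond to any real standard library implementation; the sketch only shows why a non-const data() has to unshare the buffer when the reference count is greater than 1:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy class, single-threaded, for illustration only.
class CowString {
public:
    explicit CowString(const std::string& s)
        : buf_(std::make_shared<std::vector<char>>(s.begin(), s.end())) {}

    std::size_t size() const { return buf_->size(); }

    // Read-only access never needs a copy.
    const char* data() const { return buf_->data(); }

    // Mutable access must detach from other owners first, because the
    // caller may write through the returned pointer at any later time.
    char* data() {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::vector<char>>(*buf_);
        }
        return buf_->data();
    }

    // True while both objects still point at the same buffer.
    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

private:
    std::shared_ptr<std::vector<char>> buf_;
};
```

Copying a `CowString` is cheap, but the first call to the non-const `data()` on a shared object pays for a full copy, even if the caller never writes anything.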

In Oct. 2005 the committee voted in LWG 464 which added the const and non-const data() to vector, and added const and non-const at() to map. At that time, string had not been changed so as to outlaw COW. But later, by C++11, a COW string is no longer conforming. The string spec was also tightened up in C++11 such that it is required to be contiguous, and there's always a terminating null exposed by operator[](size()). In C++03, the terminating null was only guaranteed by the const overload of operator[].
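The C++11 guarantees mentioned above can be checked with a short sketch (the function names are made up for illustration): `operator[](size())` yields a null character for both the const and non-const overloads, and data() and c_str() point at the same null-terminated buffer.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Const view: s[s.size()] is '\0', data() and c_str() agree, and the
// C-string length can be at most size() (embedded nulls shorten it).
inline bool has_cxx11_terminator(const std::string& s) {
    return s[s.size()] == '\0'
        && s.data() == s.c_str()
        && std::strlen(s.c_str()) <= s.size();
}

// Non-const view: since C++11 this overload also exposes the terminator.
// The terminator is only read here, never modified.
inline bool nonconst_index_sees_terminator(std::string& s) {
    return s[s.size()] == '\0';
}
```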

So in short a non-const data() looks a lot more reasonable for a C++11 string. To the best of my knowledge, it was never proposed.

Update

charT* data() noexcept;

was added to basic_string in the C++1z working draft N4582 by David Sankel's P0272R1 at the Jacksonville meeting in Feb. 2016.
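A small sketch of what the new overload enables, assuming a hypothetical C-style routine `upcase_buffer` that edits a buffer in place. The `__cplusplus` check is only there so the example also builds before C++17, where `&s[0]` is the usual workaround:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical C-style routine that edits a character buffer in place.
inline void upcase_buffer(char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        buf[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(buf[i])));
    }
}

inline void upcase(std::string& s) {
#if __cplusplus >= 201703L
    char* p = s.data();  // non-const overload added by P0272R1
#else
    char* p = &s[0];     // pre-C++17 workaround; valid for empty strings since C++11
#endif
    upcase_buffer(p, s.size());
}
```

Only the characters in the range [0, size()) are written, so the string's length and terminator are left alone.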

Nice job David!

#2


2  

Historically, string's data() has had no non-const overload because one would prevent several common optimizations, like copy-on-write (COW). COW is now, IIANM, far less common, because it behaves badly with multithreaded programs.

BTW, yes they are now required to be contiguous:

[string.require].5: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
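The quoted identity can be written out directly. This is just a sketch that restates the requirement as a loop (the name `is_contiguous` is made up); on a conforming C++11 implementation it returns true for every string:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Checks &*(s.begin() + n) == &*s.begin() + n for every n in [0, s.size()).
// For an empty string the loop body never runs, so begin() is never
// dereferenced.
inline bool is_contiguous(const std::string& s) {
    for (std::size_t n = 0; n < s.size(); ++n) {
        std::string::const_iterator it =
            s.begin() + static_cast<std::string::difference_type>(n);
        if (&*it != &*s.begin() + n) return false;
    }
    return true;
}
```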

Another reason might be to avoid code such as:

std::string ret;
strcpy(ret.data(), "whatthe...");

Or any other function that writes into a preallocated char array.
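The snippet above overruns the buffer because `ret` is empty when strcpy writes into it. For contrast, here is a sketch of a version that sizes the string first (the function name `copy_from_c` is made up for illustration). It uses memcpy for exactly `len` characters so that the string's own terminator is never written by hand:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies a C string into a std::string through a raw pointer, resizing
// first so every write lands inside storage the string owns.
inline std::string copy_from_c(const char* src) {
    const std::size_t len = std::strlen(src);
    std::string ret;
    ret.resize(len);  // size() is now len and ret[len] is '\0'
    if (len != 0) {
        std::memcpy(&ret[0], src, len);  // writes exactly len characters
    }
    return ret;
}
```

In real code `std::string ret(src);` does the same job; the raw-pointer form only matters when a C API insists on filling a caller-provided buffer.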

#3


1  

Although I'm not that well-versed in the standard, it might be because std::string doesn't need to contain null-terminated data (but it can) and doesn't need to contain an explicit length field (but it can). So changing the underlying data, e.g. adding a '\0' in the middle, might get the string's length field out of sync with the actual char data and thus leave the object in an invalid state.

#4


0  

@Christian Rau

Ever since the committee STL-ized the original Plauger string class (around 1995, I think), turning it into a Sequence and templatifying it, std::string has been std::vector plus string-related stuff (conversion from/to 0-terminated strings, concatenation, ...), plus some oddities, like a COW that's actually "Copy on Write and on non-const begin()/end()/operator[]".

But ultimately a std::string is really a std::vector under another name, with a slightly different focus and intent. So:

  • just like std::vector, std::string has either a size data member or both start and end data members;
  • just like std::vector, std::string does not care about the value of its elements, embedded NUL or others.
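
A quick sketch of the second point, with made-up helper names: an embedded NUL is an ordinary element, so size() keeps reporting the real length while a C API that relies on the terminator sees a shorter string. Writing a '\0' into the middle does not put the object into an invalid state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds the 5-character string "ab\0cd". The explicit length argument
// keeps the embedded null as a regular element.
inline std::string with_embedded_nul() {
    return std::string("ab\0cd", 5);
}

// Length as a C API sees it: everything up to the first null character.
inline std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

After `s[1] = '\0'` on such a string, size() is still 5 and the trailing characters are still there; only `c_length` changes.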

std::string is not a C string with syntax sugar, utility functions and some encapsulation, just like std::vector<T> is not T[] with syntax sugar, utility functions and some encapsulation.

