How does StringBuilder work?
What does it do internally? Does it use unsafe code? And why is it so fast (compared to the + operator)?
4 answers
#1
61
When you use the + operator to build up a string:
string s = "01";
s += "02";
s += "03";
s += "04";
then on the first concatenation we make a new string of length four and copy "01" and "02" into it -- four characters are copied. On the second concatenation we make a new string of length six and copy "0102" and "03" into it -- six characters are copied. On the third concat, we make a string of length eight and copy "010203" and "04" into it -- eight characters are copied. So far a total of 4 + 6 + 8 = 18 characters have been copied for this eight-character string. Keep going.
...
s += "99";
On the 98th concat we make a string of length 198 and copy "010203...98" and "99" into it. That gives us a total of 4 + 6 + 8 + ... + 198 = 9,898 characters copied, which is a lot of work to make this 198-character string.
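To make the arithmetic concrete, here is a minimal sketch that replays the concatenations and tallies the characters copied. CopiedChars is a hypothetical helper written for this illustration, not a framework API, and it assumes each += copies every character of the resulting string exactly once.

```csharp
using System;

// Hypothetical helper: builds "01", "0102", ... with += and counts
// how many characters end up being copied along the way.
static long CopiedChars(int lastPiece)
{
    string s = "01";
    long copied = 0;
    for (int i = 2; i <= lastPiece; i++)
    {
        string piece = i.ToString("00"); // "02", "03", ..., two digits each
        s += piece;                      // allocates a brand-new string
        copied += s.Length;              // all of its characters were copied in
    }
    return copied;
}

Console.WriteLine(CopiedChars(4));   // 18
Console.WriteLine(CopiedChars(99));  // 9898
```

The total grows roughly with the square of the final length, which is why the cost only becomes noticeable once there are many concatenations.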
A string builder doesn't do all that copying. Rather, it maintains a mutable array that is hoped to be larger than the final string, and stuffs new things into the array as necessary.
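For comparison, the same 198-character string can be built with a StringBuilder. This is only a usage sketch; BuildWithBuilder is a hypothetical helper name, and the initial capacity of 198 is the "guess" the answer mentions, chosen here because the final length happens to be known.

```csharp
using System;
using System.Text;

// Hypothetical helper: appends "01".."99" into one mutable buffer.
static string BuildWithBuilder()
{
    var sb = new StringBuilder(198);     // capacity guess: the final length
    for (int i = 1; i <= 99; i++)
    {
        sb.Append(i.ToString("00"));     // writes into the existing buffer
    }
    return sb.ToString();                // one final string at the end
}

string built = BuildWithBuilder();
Console.WriteLine(built.Length);         // 198
```

Each piece is written once into the builder, and the full contents become a string only when ToString() is called.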
What happens when the guess is wrong and the array gets full? There are two strategies. In versions of the framework before .NET 4.0, the string builder reallocated and copied the array when it got full, doubling its size. In the implementation introduced with .NET 4.0, the string builder maintains a linked list of relatively small arrays, and appends a new array onto the end of the list when the old one gets full.
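The two growth strategies can be sketched roughly as follows. This is a toy illustration and not the framework's actual code: AppendDoubling, AppendChunked and ChunksToString are hypothetical helpers, a List<char[]> stands in for the linked list, and the chunk size of 16 is an arbitrary assumption.

```csharp
using System;
using System.Collections.Generic;

// Strategy 1 (sketch): one array, doubled and copied when it fills up.
static char[] AppendDoubling(char[] buffer, ref int length, string text)
{
    if (length + text.Length > buffer.Length)
    {
        int newSize = Math.Max(buffer.Length * 2, length + text.Length);
        Array.Resize(ref buffer, newSize);   // allocates and copies old contents
    }
    text.CopyTo(0, buffer, length, text.Length);
    length += text.Length;
    return buffer;
}

// Strategy 2 (sketch): a list of small arrays; a full chunk is left alone
// and a fresh one is added after it, so old data is never copied on growth.
static void AppendChunked(List<char[]> chunks, ref int usedInLast, string text, int chunkSize)
{
    foreach (char c in text)
    {
        if (chunks.Count == 0 || usedInLast == chunks[chunks.Count - 1].Length)
        {
            chunks.Add(new char[chunkSize]);
            usedInLast = 0;
        }
        chunks[chunks.Count - 1][usedInLast++] = c;
    }
}

// Stitches the chunks together once, when the final string is requested.
static string ChunksToString(List<char[]> chunks, int usedInLast)
{
    int total = 0;
    for (int i = 0; i < chunks.Count; i++)
        total += (i == chunks.Count - 1) ? usedInLast : chunks[i].Length;

    var joined = new char[total];
    int pos = 0;
    for (int i = 0; i < chunks.Count; i++)
    {
        int n = (i == chunks.Count - 1) ? usedInLast : chunks[i].Length;
        Array.Copy(chunks[i], 0, joined, pos, n);
        pos += n;
    }
    return new string(joined);
}

char[] buf = new char[4];
int len = 0;
var chunks = new List<char[]>();
int used = 0;
for (int i = 1; i <= 99; i++)
{
    string piece = i.ToString("00");
    buf = AppendDoubling(buf, ref len, piece);
    AppendChunked(chunks, ref used, piece, 16);
}
string a = new string(buf, 0, len);
string b = ChunksToString(chunks, used);
Console.WriteLine(a == b);        // True
Console.WriteLine(buf.Length);    // 256
Console.WriteLine(chunks.Count);  // 13
```

The doubling version recopies existing data on each resize but keeps everything contiguous; the chunked version never recopies on append, at the price of a stitching pass when the string is finally produced.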
Also, as you have conjectured, the string builder can do tricks with "unsafe" code to improve its performance. For example, the code which writes the new data into the array can already have checked that the array write is going to be within bounds. By turning off the safety system it can avoid the per-write check that the jitter might otherwise insert to verify that every write to the array is safe. The string builder does a number of these sorts of tricks to do things like ensuring that buffers are reused rather than reallocated, ensuring that unnecessary safety checks are avoided, and so on. I recommend against these sorts of shenanigans unless you are really good at writing unsafe code correctly, and really do need to eke out every last bit of performance.
#2
14
StringBuilder's implementation has changed between versions, I believe. Fundamentally though, it maintains a mutable structure of some form. I believe it used to use a string which was still being mutated (using internal methods) and would just make sure it would never be mutated after it was returned.
The reason StringBuilder is faster than using string concatenation in a loop is precisely because of the mutability - it doesn't require a new string to be constructed after each mutation, which would mean copying all the data within the string etc.
For just a single concatenation, it's actually slightly more efficient to use + than to use StringBuilder. It's only when you're performing multiple operations and you don't really need the intermediate results that StringBuilder shines.
See my article on StringBuilder for more information.
#3
3
The Microsoft CLR does do some operations with internal calls (not quite the same as unsafe code). The biggest performance benefit over a bunch of + concatenated strings is that StringBuilder writes to a char[] and doesn't create as many intermediate strings. When you call ToString(), it builds a completed, immutable string from your contents.
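A small sketch of what "a completed, immutable string" means in practice: a string taken from ToString() is unaffected by later appends to the same builder. The variable names are only for illustration.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();
sb.Append("01").Append("02");

string snapshot = sb.ToString();   // string built from the current contents
sb.Append("03");                   // changes the builder, not the snapshot

Console.WriteLine(snapshot);       // 0102
Console.WriteLine(sb.ToString());  // 010203
```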
#4
1
The StringBuilder uses a string buffer that can be altered, compared to a regular String that can't be. When you call the ToString method of the StringBuilder it will just freeze the string buffer and convert it into a regular string, so it doesn't have to copy all the data one extra time.
As the StringBuilder can alter the string buffer, it doesn't have to create a new string value for each and every change to the string data. When you use the + operator, the compiler turns that into a String.Concat call that creates a new string object. This seemingly innocent piece of code:
str += ",";
compiles into this:
str = String.Concat(str, ",");
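As a quick check of that equivalence, the two forms can be run side by side. This sketch only shows that they produce the same text and leave the source string untouched; it does not inspect the compiled output, and the variable names are made up for the example.

```csharp
using System;

string original = "a";

string viaOperator = original;
viaOperator += ",";                               // the += form

string viaConcat = String.Concat(original, ",");  // the explicit call

Console.WriteLine(viaOperator == viaConcat);      // True
Console.WriteLine(original);                      // a
```

Either way a new string object holds the result, and the string that `original` refers to is never modified.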