Should .NET strings really be considered immutable?

Time: 2021-07-20 16:51:01

Consider the following code:

unsafe
{
    // string.Copy allocates a fresh instance, so the interned literal itself is left alone
    string foo = string.Copy("This can't change");

    fixed (char* ptr = foo)
    {
        char* pFoo = ptr;
        pFoo[8] = pFoo[9] = ' ';
    }

    Console.WriteLine(foo); // "This can   change"
}

This creates a pointer to the first character of foo, copies it into a second pointer variable that can be written through, and overwrites the characters at positions 8 and 9 with ' '.

Notice I never actually reassigned foo; instead, I changed its value by modifying its state, or mutating the string. Therefore, .NET strings are mutable.

This works so well, in fact, that the following code:

unsafe
{
    string bar = "Watch this";

    fixed (char* p = bar)
    {
        char* pBar = p;
        pBar[0] = 'C';
    }

    string baz = "Watch this";
    Console.WriteLine(baz); // Unrelated, right?
}

will print "Catch this" due to string literal interning.
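
For anyone unsure what interning means here: identical literals compiled into the same assembly end up as a single string object, so bar and baz above are two names for one object. A minimal pointer-free sketch of that sharing (the class and method names are made up for illustration):

```csharp
using System;

static class InternDemo
{
    // Two occurrences of the same literal in one assembly.
    public static bool LiteralsShareObject()
    {
        string a = "Watch this";
        string b = "Watch this";
        return object.ReferenceEquals(a, b);
    }

    // A string assembled at run time is a distinct object with equal content.
    public static bool RuntimeCopySharesObject()
    {
        string a = "Watch this";
        string c = new string(a.ToCharArray());
        return object.ReferenceEquals(a, c);
    }

    static void Main()
    {
        Console.WriteLine(LiteralsShareObject());     // True
        Console.WriteLine(RuntimeCopySharesObject()); // False
    }
}
```

Because both literals are one object, a write through a pointer to either is visible through the other.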

This has plenty of practical uses. For example, this:

string GetForInputData(byte[] inputData)
{
    // allocate a mutable buffer...
    char[] buffer = new char[inputData.Length];

    // fill the buffer with input data

    // ...and a string to return
    return new string(buffer);
}

gets replaced by:

unsafe string GetForInputData(byte[] inputData)
{
    // allocate a string to return
    string result = new string('\0', inputData.Length);

    fixed (char* ptr = result)
    {
        // fill the result with input data
    }

    return result; // return it
}

This avoids allocating and copying the intermediate char[] buffer, which could save significant memory and time if you work in a speed-critical field (e.g. encodings).
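
For what it's worth, newer runtimes (.NET Core 2.1 and later) offer string.Create, which hands you a writable span over the new string's buffer before the string is returned to anyone, so the same single-allocation fill works without pointers. A minimal sketch, assuming each input byte is simply widened to one char (the question leaves the actual fill logic out, so that part and the class name are made up for illustration):

```csharp
using System;

static class ByteWidening
{
    // Allocates the result string once and fills it in place.
    // The "widen each byte to a char" step is an assumption standing in
    // for whatever the real fill logic would be.
    public static string GetForInputData(byte[] inputData) =>
        string.Create(inputData.Length, inputData, (span, bytes) =>
        {
            for (int i = 0; i < bytes.Length; i++)
                span[i] = (char)bytes[i];
        });

    static void Main()
    {
        Console.WriteLine(GetForInputData(new byte[] { 72, 105 })); // Hi
    }
}
```

The span is only writable inside the callback, so the string is never observed half-built and no other reference can see it change.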

I guess you could say that this doesn't count because it "uses a hack" to make pointers mutable, but then again it was the C# language designers who supported assigning a string to a pointer in the first place. (In fact, this is done all the time internally in String and StringBuilder, so technically you could make your own StringBuilder with this.)

So, should .NET strings really be considered immutable?

2 answers

#1


6  

§ 18.6 of the C# language specification (The fixed statement) specifically addresses the case of modifying a string through a fixed pointer, and indicates that doing so can result in undefined behavior:

Modifying objects of managed type through fixed pointers can result in undefined behavior. For example, because strings are immutable, it is the programmer’s responsibility to ensure that the characters referenced by a pointer to a fixed string are not modified.
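
If the goal is just the edited text, the defined-behavior route is to edit a copy and build a new string from it. A rough sketch using the question's first example (BlankOut is a hypothetical helper name, not a framework method):

```csharp
using System;

static class SafeEdit
{
    // Hypothetical helper: returns a new string with the characters in
    // [start, start + count) replaced by spaces; the input is left untouched.
    public static string BlankOut(string text, int start, int count)
    {
        char[] chars = text.ToCharArray();
        for (int i = start; i < start + count; i++)
            chars[i] = ' ';
        return new string(chars);
    }

    static void Main()
    {
        string foo = "This can't change";
        string edited = BlankOut(foo, 8, 2);

        Console.WriteLine(foo);    // This can't change
        Console.WriteLine(edited); // This can   change
    }
}
```

This costs one extra allocation compared with writing through the pointer, but foo and anything else sharing its reference stay intact.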

#2


1  

I just had to experiment with this to confirm whether identical string literals point to the same memory location.

The results are:

string foo = "Fix value?"; //New address: 0x02b215f8
string foo2 = "Fix value?"; //Points to same address: 0x02b215f8
string fooCopy = string.Copy(foo); //New address: 0x021b2888

fixed (char* p = foo)
{
    p[9] = '!';
}

Console.WriteLine(foo);
Console.WriteLine(foo2);
Console.WriteLine(fooCopy);

//References are equal: both variables refer to the same memory address
Console.WriteLine(string.ReferenceEquals(foo, foo2)); //true

//References are not equal: string.Copy created another string at a new memory address
Console.WriteLine(string.ReferenceEquals(foo, fooCopy)); //false

We see that foo is initialized from a string literal located at memory address 0x02b215f8 on my PC. Assigning the same string literal to foo2 references the same memory address, while creating a copy of that literal allocates a new string. Further testing via string.ReferenceEquals() confirms that foo and foo2 are the same reference, while foo and fooCopy are different references.

It is interesting to see how a string literal can be manipulated in memory, and how that affects every other variable that references it. This is something to be careful about, given that this behavior exists.
