Why doesn't boxing in .NET cache primitive value types, unlike Java?

Date: 2021-04-09 16:31:13

Consider:

int a = 42;

// Reference equality on two boxed ints with the same value
Console.WriteLine( (object)a == (object)a ); // False

// Same thing - listed only for clarity
Console.WriteLine(ReferenceEquals(a, a));  // False

Clearly, each boxing instruction allocates a separate instance of a boxed Int32, which is why reference-equality between them fails. This page appears to indicate that this is specified behaviour:

The box instruction converts the 'raw' (unboxed) value type into an object reference (type O). This is accomplished by creating a new object and copying the data from the value type into the newly allocated object.
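
The behaviour the quoted passage describes can be observed from C#. The following is a minimal sketch (the class and method names are illustrative, not from the original post): boxing the same value twice yields two distinct objects whose contents still compare equal.

```csharp
using System;

public static class BoxingIdentityDemo
{
    // Boxes the same int twice and reports whether both boxes are the same object.
    public static bool BoxesShareIdentity(int value)
    {
        object first = value;   // box #1: new object, value copied in
        object second = value;  // box #2: another new object
        return object.ReferenceEquals(first, second);
    }

    // Boxes the same int twice and compares the boxed contents instead.
    public static bool BoxesHaveEqualValue(int value)
    {
        object first = value;
        object second = value;
        return first.Equals(second); // Int32.Equals compares the wrapped values
    }

    public static void Main()
    {
        Console.WriteLine(BoxesShareIdentity(42));  // False
        Console.WriteLine(BoxesHaveEqualValue(42)); // True
    }
}
```

Value equality is unaffected by the separate allocations; only identity comparisons see the difference.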

But why does this have to be the case? Is there any compelling reason why the CLR does not choose to hold a "cache" of boxed Int32s, or even stronger, common values for all primitive value-types (which are all immutable)? I know Java has something like this.

In the days of no-generics, wouldn't it have helped out a lot with reducing the memory requirements as well as GC workload for a large ArrayList consisting mainly of small integers? I'm also sure that there exist several modern .NET applications that do use generics, but for whatever reason (reflection, interface assignments etc.), run up large boxing-allocations that could be massively reduced with (what appears to be) a simple optimization.

So what's the reason? Some performance implication I haven't considered (I doubt if testing that the item is in the cache etc. will result in a net performance loss, but what do I know)? Implementation difficulties? Issues with unsafe code? Breaking backwards compatibility (I can't think of any good reason why a well-written program should rely on the existing behaviour)? Or something else?

EDIT: What I was really suggesting was a static cache of "commonly-occurring" primitives, much like what Java does. For an example implementation, see Jon Skeet's answer. I understand that doing this for arbitrary, possibly mutable, value-types or dynamically "memoizing" instances at run-time is a completely different matter.

EDIT: Changed title for clarity.

6 answers

#1


11  

One reason which I find compelling is consistency. As you say, Java does cache boxed values in a certain range... which means it's all too easy to write code which works for a while:

// Passes in all my tests. Shame it fails if they're > 127...
if (value1 == value2) {
    // Do something
}

I've been bitten by this - admittedly in a test rather than production code, fortunately, but it's still nasty to have something which changes behaviour significantly outside a given range.

Don't forget that any conditional behaviour also incurs a cost on all boxing operations - so in cases where it wouldn't use the cache, you'd actually find that it was slower (because it would first have to check whether or not to use the cache).

If you really want to write your own caching box operation, of course, you can do so:

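public static class Int32Extensions
{
    // Pre-boxed instances for every value in [-128, 127], created once.
    private static readonly object[] BoxedIntegers = CreateCache();

    private static object[] CreateCache()
    {
        object[] ret = new object[256];
        for (int i = -128; i < 128; i++)
        {
            ret[i + 128] = i; // boxes i and stores the box at offset i + 128
        }
        return ret;
    }

    // Extension methods must be static.
    public static object Box(this int i)
    {
        // Values in range reuse the cached box; anything else is boxed afresh.
        return (i >= -128 && i < 128) ? BoxedIntegers[i + 128] : (object) i;
    }
}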

Then use it like this:

object y = 100.Box();
object z = 100.Box();

if (y == z)
{
    // Cache is working
}
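
The range-dependent pitfall described earlier in this answer can be reproduced with such a cache. This is a self-contained sketch under assumptions: `SmallIntBoxCache` and its members are hypothetical names, and the range [-128, 127] is chosen only to mirror the example above.

```csharp
using System;

public static class SmallIntBoxCache
{
    private const int Min = -128;
    private const int Max = 127;
    private static readonly object[] Cache = CreateCache();

    private static object[] CreateCache()
    {
        var boxes = new object[Max - Min + 1];
        for (int i = Min; i <= Max; i++)
        {
            boxes[i - Min] = i; // box each value once, up front
        }
        return boxes;
    }

    // Returns the shared box for small values and a fresh box otherwise.
    public static object Box(int value) =>
        value >= Min && value <= Max ? Cache[value - Min] : (object)value;

    public static void Main()
    {
        // Inside the cached range, identity comparison happens to succeed...
        Console.WriteLine(object.ReferenceEquals(Box(100), Box(100)));   // True
        // ...and outside it the same comparison silently fails.
        Console.WriteLine(object.ReferenceEquals(Box(1000), Box(1000))); // False
        // Value comparison behaves the same everywhere.
        Console.WriteLine(object.Equals(Box(1000), Box(1000)));          // True
    }
}
```

Code that compares boxes by reference would pass tests that only use small values, which is the inconsistency this answer objects to.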

#2


3  

I can't claim to be able to read minds, but here are a couple of factors:

1) Caching the value types can make for unpredictability - a reference comparison of two boxed values that are equal could be true or false depending on cache hits and implementation. Ouch!

2) The lifetime of a boxed value type is most likely short - so how long do you hold the value in cache? Now you either have a lot of cached values that will no longer be used, or you need to make the GC implementation more complicated to track the lifetime of cached value types.

With these downsides, what is the potential win? A smaller memory footprint in an application that does a lot of long-lived boxing of equal value types. Since this win is something that is going to affect a small number of applications and can be worked around by changing code, I'm going to agree with the C# spec writers' decisions here.

#3


3  

Boxed value objects are not necessarily immutable. It is possible to change the value in a boxed value type, such as through an interface.

So if boxing a value type always returned the same instance based on the same original value, it would create references which may not be appropriate (for example, two different value type instances which happen to have the same value end up with the same reference even though they should not).

public interface IBoxed
{
    int X { get; set; }
    int Y { get; set; }
}

public struct BoxMe : IBoxed
{
    public int X { get; set; }

    public int Y { get; set; }
}

public static void Test()
{
    BoxMe original = new BoxMe()
                        {
                            X = 1,
                            Y = 2
                        };

    object boxed1 = (object) original;
    object boxed2 = (object) original;

    ((IBoxed) boxed1).X = 3;
    ((IBoxed) boxed1).Y = 4;

    Console.WriteLine("original.X = " + original.X);
    Console.WriteLine("original.Y = " + original.Y);
    Console.WriteLine("boxed1.X = " + ((IBoxed)boxed1).X);
    Console.WriteLine("boxed1.Y = " + ((IBoxed)boxed1).Y);
    Console.WriteLine("boxed2.X = " + ((IBoxed)boxed2).X);
    Console.WriteLine("boxed2.Y = " + ((IBoxed)boxed2).Y);
}

Produces this output:

original.X = 1
original.Y = 2
boxed1.X = 3
boxed1.Y = 4
boxed2.X = 1
boxed2.Y = 2

If boxing didn't create a new instance, then boxed1 and boxed2 would refer to the same object and so always show the same values, which would be inappropriate if they were created from different original value type instances.
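
To make the hazard concrete, here is a small sketch of what two holders would observe if a cache handed both of them the same box. The `ICounter`, `Counter`, and `SharedBoxDemo` names are illustrative assumptions; the sharing is simulated by copying the reference, since the CLR itself never shares boxes this way.

```csharp
using System;

public interface ICounter
{
    int Count { get; set; }
}

public struct Counter : ICounter
{
    public int Count { get; set; }
}

public static class SharedBoxDemo
{
    // Returns what the second holder sees after the first holder mutates
    // a box that both of them share.
    public static int ObserveThroughSharedBox()
    {
        object holderA = new Counter { Count = 1 }; // boxed once
        object holderB = holderA;                   // same box, as a cache would hand out

        ((ICounter)holderA).Count = 99;             // mutates the box in place
        return ((ICounter)holderB).Count;
    }

    public static void Main()
    {
        Console.WriteLine(ObserveThroughSharedBox()); // 99
    }
}
```

The second holder never touched its value, yet it changed. That is why sharing is only safe for value types that cannot be mutated through the box.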

#4


1  

There's an easy explanation for this: un/boxing is fast, and it needed to be back in the .NET 1.x days. After the JIT compiler generates the machine code for it, only a handful of CPU instructions remain, all inline without method calls (not counting corner cases like nullable types and large structs).

The effort of looking up a cached value would greatly diminish the speed of this code.

#5


0  

I wouldn't think a run-time-filled cache would be a good idea, but I would think it might be reasonable on 64-bit systems to define ~8 billion of the roughly 18 quintillion (2^64) possible object-reference values as being integer or float literals, and on any system to pre-box all primitive literals. Testing whether the upper 31 bits of a reference hold some particular value should probably be cheaper than a memory reference.
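
The bit test this answer has in mind could look roughly like the sketch below. Everything here is hypothetical: references are modelled as plain `ulong` values, the 31-bit marker `Tag` is an arbitrary choice, and nothing in the CLR works this way. The upper 31 bits mark a value as an immediate, leaving 33 payload bits (about 8.6 billion values), of which this sketch uses the low 32 for an int.

```csharp
using System;

public static class TaggedReferenceSketch
{
    private const int PayloadBits = 33;      // 32 value bits + 1 spare bit
    private const ulong Tag = 0x7FFFFFFFUL;  // hypothetical 31-bit marker pattern

    // Packs an int into a reference-sized word carrying the marker.
    public static ulong EncodeInt(int value) =>
        (Tag << PayloadBits) | unchecked((uint)value);

    // A single shift and compare decides whether the word is an immediate.
    public static bool IsImmediate(ulong reference) =>
        (reference >> PayloadBits) == Tag;

    // Recovers the int from the low 32 bits of an immediate.
    public static int DecodeInt(ulong reference) =>
        unchecked((int)(uint)(reference & 0xFFFFFFFFUL));

    public static void Main()
    {
        ulong encoded = EncodeInt(-5);
        Console.WriteLine(IsImmediate(encoded));               // True
        Console.WriteLine(DecodeInt(encoded));                 // -5
        Console.WriteLine(IsImmediate(0x00007FF612345678UL));  // False
    }
}
```

The check needs no memory access, which is the property the answer relies on. The cost is that every dereference would first have to rule out the immediate case.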

#6


0  

Adding to the answers already listed is the fact that in .NET, at least with the normal garbage collector, object references are internally stored as direct pointers. This means that when a garbage collection is performed the system has to update every single reference to every object that gets moved, but it also means that "main-line" operation can be very fast. If object references were sometimes direct pointers and sometimes something else, this would require extra code every time an object is dereferenced. Since object dereferencing is one of the most common operations during the execution of a .NET program, even a 5% slowdown here would be devastating unless it was matched by an awesome speedup.

It's possible, for example, that a "64-bit compact" model, in which each object reference was a 32-bit index into an object table, might offer better performance than the existing model in which each reference is a 64-bit direct pointer. Dereferencing operations would require an extra table lookup, which would be bad, but object references would be smaller, thus allowing more of them to be stored in the cache at once. In some circumstances, that could be a major performance win (maybe often enough to be worthwhile, maybe not). It's unclear, though, that allowing an object reference to sometimes be a direct memory pointer and sometimes be something else would really offer much advantage.
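
As a rough model of the object-table idea, the sketch below replaces references with 32-bit handles that index into a table. `ObjectTable`, `Add`, and `Resolve` are hypothetical names, and managed code can only approximate what a runtime would do internally. The point is simply where the extra lookup appears.

```csharp
using System;
using System.Collections.Generic;

public sealed class ObjectTable
{
    private readonly List<object> slots = new List<object>();

    // Stores the object and returns a 32-bit handle that stands in for a reference.
    public uint Add(object target)
    {
        slots.Add(target);
        return (uint)(slots.Count - 1);
    }

    // Every dereference pays for one extra indexed load through the table.
    public object Resolve(uint handle) => slots[(int)handle];
}

public static class ObjectTableDemo
{
    public static void Main()
    {
        var table = new ObjectTable();
        uint first = table.Add("hello");
        uint second = table.Add(42);

        Console.WriteLine(table.Resolve(first));  // hello
        Console.WriteLine(table.Resolve(second)); // 42
    }
}
```

In this model a moving collector would update only the table slot, not every handle. The price is the indirection on each access that the answer describes.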
