Read introduction in C# - how to protect against it?

Time: 2021-06-15 21:01:42

An article in MSDN Magazine discusses the notion of Read Introduction and gives a code sample which can be broken by it.

public class ReadIntro {
  private Object _obj = new Object();
  void PrintObj() {
    Object obj = _obj;
    if (obj != null) {
      Console.WriteLine(obj.ToString()); // May throw a NullReferenceException
    }
  }
  void Uninitialize() {
    _obj = null;
  }
}

Notice this "May throw a NullReferenceException" comment - I never knew this was possible.

So my question is: how can I protect against read introduction?

I would also be really grateful for an explanation of exactly when the compiler decides to introduce reads, because the article doesn't cover that.

3 Answers

#1 (17 votes)

Let me try to clarify this complicated question by breaking it down.

What is "read introduction"?

"Read introduction" is an optimization whereby the code:

public static Foo foo; // I can be changed on another thread!
void DoBar() {
  Foo fooLocal = foo;
  if (fooLocal != null) fooLocal.Bar();
}

is optimized by eliminating the local variable. The compiler can reason that if there is only one thread then foo and fooLocal are the same thing. The compiler is explicitly permitted to make any optimization that would be invisible on a single thread, even if it becomes visible in a multithreaded scenario. The compiler is therefore permitted to rewrite this as:

void DoBar() {
  if (foo != null) foo.Bar();
}

And now there is a race condition. If foo turns from non-null to null after the check then it is possible that foo is read a second time, and the second time it could be null, which would then crash. From the perspective of the person diagnosing the crash dump this would be completely mysterious.

Can this actually happen?

As the article you linked to called out:

Note that you won’t be able to reproduce the NullReferenceException using this code sample in the .NET Framework 4.5 on x86-x64. Read introduction is very difficult to reproduce in the .NET Framework 4.5, but it does nevertheless occur in certain special circumstances.

x86/x64 chips have a "strong" memory model and the jit compilers are not aggressive in this area; they will not do this optimization.

If you happen to be running your code on a weak memory model processor, like an ARM chip, then all bets are off.

When you say "the compiler", which compiler do you mean?

I mean the jit compiler. The C# compiler never introduces reads in this manner. (It is permitted to, but in practice it never does.)

Isn't it a bad practice to be sharing memory between threads without memory barriers?

Yes. Something should be done here to introduce a memory barrier because the value of foo could already be a stale cached value in the processor cache. My preference for introducing a memory barrier is to use a lock. You could also make the field volatile, or use VolatileRead, or use one of the Interlocked methods. All of those introduce a memory barrier. (volatile introduces only a "half fence" FYI.)

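To make those options concrete, here is a minimal sketch that applies two of them to the ReadIntro class from the question: a lock, and Volatile.Read from .NET 4.5's System.Threading.Volatile (the older Thread.VolatileRead plays the same role). The ReadIntroSafe and _gate names and the method names are invented for the illustration; treat this as a sketch of the options listed above, not the one canonical fix.

using System;
using System.Threading;

public class ReadIntroSafe {
  private readonly Object _gate = new Object();
  private Object _obj = new Object();

  // Option 1: a lock. Reader and writer take the same lock, so the read of
  // _obj cannot race with Uninitialize at all.
  void PrintObjLocked() {
    lock (_gate) {
      if (_obj != null) {
        Console.WriteLine(_obj.ToString());
      }
    }
  }

  // Option 2: Volatile.Read captures the field with a single fenced read;
  // the jitter will not turn the local copy back into re-reads of the field.
  void PrintObjVolatileRead() {
    Object obj = Volatile.Read(ref _obj);
    if (obj != null) {
      Console.WriteLine(obj.ToString());
    }
  }

  void Uninitialize() {
    lock (_gate) {
      _obj = null;
    }
  }
}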

Just because there's a memory barrier does not necessarily mean that read introduction optimizations are not performed. However, the jitter is far less aggressive about pursuing optimizations that affect code that contains a memory barrier.

Are there other dangers to this pattern?

Sure! Let's suppose there are no read introductions. You still have a race condition. What if another thread sets foo to null after the check, and also modifies global state that Bar is going to consume? Now you have two threads, one of which believes that foo is not null and the global state is OK for a call to Bar, and another thread which believes the opposite, and you're running Bar. This is a recipe for disaster.

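Here is a rough, illustrative sketch of that second kind of race; the SharedState and Demo types and the data field are invented purely for this illustration and are not from the article.

using System;

class Foo {
  public void Bar() {
    // Bar assumes the shared state is still valid when it runs.
    Console.WriteLine(SharedState.data.Length);
  }
}

static class SharedState {
  public static Foo foo = new Foo();
  public static int[] data = new int[10];
}

static class Demo {
  // Runs on thread A.
  static void DoBar() {
    Foo local = SharedState.foo;   // the local copy defeats read introduction...
    if (local != null) {
      local.Bar();                 // ...but Bar can still observe torn-down state
    }
  }

  // Runs on thread B.
  static void TearDown() {
    SharedState.data = null;       // invalidate the state Bar depends on
    SharedState.foo = null;        // then clear the reference
  }
}

If thread B runs between thread A's null check and the call to Bar, then Bar dereferences a null data array even though foo looked fine - exactly the disaster described above.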

So what's the best practice here?

First, do not share memory across threads. This whole idea that there are two threads of control inside the main line of your program is just crazy to begin with. It never should have been a thing in the first place. Use threads as lightweight processes; give them an independent task to perform that does not interact with the memory of the main line of the program at all, and just use them to farm out computationally intensive work.

Second, if you are going to share memory across threads then use locks to serialize access to that memory. Locks are cheap if they are not contended, and if you have contention, then fix that problem. Low-lock and no-lock solutions are notoriously difficult to get right.

Third, if you are going to share memory across threads then every single method you call that involves that shared memory must either be robust in the face of race conditions, or the races must be eliminated. That is a heavy burden to bear, and that is why you shouldn't go there in the first place.

My point is: read introductions are scary but frankly they are the least of your worries if you are writing code that blithely shares memory across threads. There are a thousand and one other things to worry about first.

#2 (7 votes)

You can't really "protect" against read introduction, as it's a compiler optimization (except, of course, by using Debug builds with no optimization). It's pretty well documented that the optimizer will maintain the single-threaded semantics of the function, which, as the article notes, can cause issues in multi-threaded situations.

That said, I'm confused by his example. In Jeffrey Richter's book CLR via C# (v3 in this case), in the Events section he covers this pattern and notes that the example snippet you have above, in THEORY, wouldn't work. But it was a pattern recommended by Microsoft early in .NET's existence, so the JIT compiler people he spoke to said that they would have to make sure that sort of snippet never breaks. (It's always possible they may decide it's worth breaking for some reason, though - I imagine Eric Lippert could shed light on that.)

Finally, unlike the article, Jeffrey offers the "proper" way to handle this in multi-threaded situations (I've modified his example with your sample code):

// CompareExchange compares _obj to null and, only if it is already null, writes
// null back - so it never actually changes the field. What it does do is return
// the current value of _obj via a single atomic read with a full fence, which
// the jitter will not split into multiple reads of the field.
Object temp = Interlocked.CompareExchange(ref _obj, null, null);
if (temp != null)
{
    Console.WriteLine(temp.ToString());
}

#3 (1 vote)

I only skimmed the article, but it seems that what the author is looking for is that you need to declare the _obj member as volatile.

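For reference, a minimal sketch of that change to the question's class might look like the following; note that, as the first answer points out, volatile gives you only a half fence and does nothing about the wider race conditions discussed there.

using System;

public class ReadIntro {
  // volatile tells the jitter the field can change on another thread, so it
  // will not replace the local copy below with a second read of the field.
  private volatile Object _obj = new Object();

  void PrintObj() {
    Object obj = _obj;    // a single volatile (acquire) read, captured locally
    if (obj != null) {
      Console.WriteLine(obj.ToString());
    }
  }

  void Uninitialize() {
    _obj = null;          // volatile write (release)
  }
}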
