How can I simulate this application hang scenario?

I have a Windows Forms app that launches several threads to do different kinds of work. Occasionally, ALL threads (including the UI thread) freeze, and my app becomes unresponsive. I suspect a Garbage Collector-related issue, since the GC temporarily freezes all managed threads. To verify that only managed threads are frozen, I spin up an unmanaged one that writes a timestamp to a "heartbeat" file every second, and it is not affected (i.e. it keeps running):

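using System;
using System.Runtime.InteropServices;

public delegate void ThreadProc();

// Native heartbeat routine exported by UnmanagedTest.dll.
[DllImport("UnmanagedTest.dll", EntryPoint = "MyUnmanagedFunction")]
public static extern void MyUnmanagedFunction();

// Win32 CreateThread, so the thread is not created through System.Threading.
[DllImport("kernel32")]
public static extern IntPtr CreateThread(
    IntPtr lpThreadAttributes,
    uint dwStackSize,
    IntPtr lpStartAddress,
    IntPtr lpParameter,
    uint dwCreationFlags,
    out uint dwThreadId);

// Start the heartbeat thread. Keep 'proc' referenced for as long as the
// thread runs so the delegate behind the function pointer is not collected.
uint threadId;
ThreadProc proc = new ThreadProc(MyUnmanagedFunction);
IntPtr functionPointer = Marshal.GetFunctionPointerForDelegate(proc);
IntPtr threadHandle = CreateThread(IntPtr.Zero, 0, functionPointer, IntPtr.Zero, 0, out threadId);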

My question is: how can I simulate this situation, where all managed threads are suspended but unmanaged ones keep running?

My first stab:

private void button1_Click(object sender, EventArgs e) {
    Thread t = new Thread(new ThreadStart(delegate {
        new Hanger();
        GC.Collect(2, GCCollectionMode.Forced);
    }));
    t.Start();
}

class Hanger {
    private int[] m_Integers = new int[10000000];

    public Hanger() { }

    ~Hanger() {
        Console.WriteLine("About to hang...");

        // This doesn't reproduce the desired behavior
        //while (true) ;

        // Neither does this
        //Thread.Sleep(System.Threading.Timeout.Infinite);
    }
}

Thanks in advance!!

4 Solutions

#1

Finalizers are executed concurrently with "normal" thread execution. We usually say that the GC runs the finalizers, but it is more accurate to say that the GC detects which instances have finalizers that should be run and stores them in a dedicated queue. A (hidden) thread fetches the instances from the queue and runs the finalizers. This asynchrony is needed, for example, because finalizers may themselves allocate memory and potentially trigger a GC. There are other good reasons why finalizers are necessarily asynchronous.
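
To see the separate finalizer thread for yourself, here is a minimal console sketch. The names Tracked and FinalizerThreadDemo are made up for illustration; it only assumes that GC.Collect followed by GC.WaitForPendingFinalizers gives the queued finalizer a chance to run, which is the usual behavior but not something the runtime promises for every object.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical demo type: records which thread ran its finalizer.
class Tracked
{
    public static int FinalizerThreadId = -1;

    ~Tracked()
    {
        FinalizerThreadId = Thread.CurrentThread.ManagedThreadId;
    }
}

static class FinalizerThreadDemo
{
    // Allocate in a separate, non-inlined method so that no local on the
    // caller's stack keeps the instance reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AllocateAndDrop()
    {
        new Tracked();
    }

    // Returns true when the finalizer ran on a thread other than the caller's.
    public static bool RanOnDifferentThread()
    {
        int callerId = Thread.CurrentThread.ManagedThreadId;
        AllocateAndDrop();
        GC.Collect();                   // finds the unreachable instance and queues its finalizer
        GC.WaitForPendingFinalizers();  // blocks until the finalizer thread has drained the queue
        return Tracked.FinalizerThreadId != -1
            && Tracked.FinalizerThreadId != callerId;
    }

    static void Main()
    {
        Console.WriteLine("Finalizer ran on a different thread: " + RanOnDifferentThread());
    }
}
```

The point of the sketch is only that the finalizer's thread ID differs from the ID of the thread that triggered the collection, which is why blocking inside a finalizer stalls that one thread rather than the whole process.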

The bottom line is that you cannot alter, from ~Hanger(), what the VM does during a GC pause, because the thread that actually runs ~Hanger() is also paused at that time.

#2

I realize that this does not answer your question, but I suspect a deadlock in your code rather than a strange GC issue.

I suggest checking your code for deadlocks, especially indirect cases such as Control.Invoke calls made when updating the UI from background threads. Make sure you are not holding a lock when calling Invoke; this can cause unexpected deadlocks (as if any deadlock were expected :))
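
A rough sketch of the pattern I mean, assuming a form with a shared lock. StatusForm, _sync and the method names are invented for illustration and are not taken from your code.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form used only to illustrate the pattern.
class StatusForm : Form
{
    private readonly object _sync = new object();
    private readonly Label _status = new Label();
    private string _latest = "";

    public StatusForm() { Controls.Add(_status); }

    // Called on the UI thread, e.g. from a timer or a button handler.
    public string ReadLatest()
    {
        lock (_sync) { return _latest; }
    }

    // Risky when called from a background thread: Control.Invoke blocks until
    // the UI thread runs the delegate. If the UI thread is meanwhile waiting
    // for _sync (for example inside ReadLatest), neither thread can proceed.
    public void ReportRisky(string text)
    {
        lock (_sync)
        {
            _latest = text;
            Invoke((Action)(() => _status.Text = text));
        }
    }

    // Safer: update shared state under the lock, release it, then post the
    // UI update without blocking. The window handle must already exist.
    public void ReportSafer(string text)
    {
        lock (_sync) { _latest = text; }
        BeginInvoke((Action)(() => _status.Text = text));
    }
}
```

Whether BeginInvoke or a plain Invoke outside the lock is the better fit depends on whether the background thread needs to wait for the UI update; the part that matters for deadlocks is that the lock is released before the call.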

#3

The issue DID in fact stem from the Garbage Collector. After many days of debugging and analyzing memory dumps with WinDbg, we realized that there was a deadlock, but one induced by the GC collecting concurrently. Changing the GC to collect non-concurrently fixed our problem.
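
For anyone wanting to try the same thing: one documented way to turn off concurrent collection in a .NET Framework application is the gcConcurrent element in the application's configuration file. This is a sketch of that general mechanism, so adapt it to your own app.config.

```xml
<configuration>
  <runtime>
    <!-- Disable concurrent (background) garbage collection -->
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
```

The setting is read when the runtime starts, so the application has to be restarted for it to take effect.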

#4

Supporting Marek's answer, this seems much like a design problem with the model of concurrency you are using. Being a design problem, this is something you cannot effectively solve by testing.

My advice is to carefully consider the model of concurrency you are employing, and correct the design accordingly. Start by looking into the necessary conditions for a deadlock, e.g.:

What mutual exclusions do you have?

Which additional resources do your processes (which are already holding some resources) require?

Which resources need to be explicitly released by the process using them?

Taking these into account, if you have circular resource allocation structures, you are probably looking at a deadlock.
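
As a small illustration of such a circular structure, here is a sketch with two threads that take the same two locks in opposite order. LockOrderDemo and its members are hypothetical names. Monitor.TryEnter with a timeout is used instead of a plain lock so the demo reports the circular wait rather than hanging forever.

```csharp
using System;
using System.Threading;

static class LockOrderDemo
{
    static readonly object LockA = new object();
    static readonly object LockB = new object();

    // Takes 'first', waits until the other thread holds its own first lock,
    // then tries 'second' with a timeout. Returns true if 'second' was acquired.
    static bool TakeBoth(object first, object second, Barrier barrier)
    {
        bool gotSecond = false;
        lock (first)
        {
            barrier.SignalAndWait();  // both threads now hold their first lock
            if (Monitor.TryEnter(second, TimeSpan.FromMilliseconds(200)))
            {
                gotSecond = true;
                Monitor.Exit(second);
            }
            barrier.SignalAndWait();  // keep 'first' held until both attempts are over
        }
        return gotSecond;
    }

    // Runs two threads that take the locks in opposite order and reports
    // whether either of them managed to get its second lock.
    public static bool OppositeOrderSucceeds()
    {
        bool t1Ok = false, t2Ok = false;
        using (var barrier = new Barrier(2))
        {
            var t1 = new Thread(() => t1Ok = TakeBoth(LockA, LockB, barrier));
            var t2 = new Thread(() => t2Ok = TakeBoth(LockB, LockA, barrier));
            t1.Start();
            t2.Start();
            t1.Join();
            t2.Join();
        }
        return t1Ok || t2Ok;
    }

    static void Main()
    {
        Console.WriteLine("Second lock acquired with opposite ordering: " + OppositeOrderSucceeds());
    }
}
```

With plain lock statements in place of TryEnter, the same two threads would wait on each other indefinitely. Acquiring locks in one agreed order everywhere removes the cycle.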
