Why can't multithreading in C# reach 100% CPU?

Time: 2022-02-16 00:27:07

I'm working on a program that processes many requests, none of which uses more than 50% of the CPU (I'm currently on a dual-core machine). So I created a thread for each request, and the whole process is faster: processing 9 requests takes 02min08s on a single thread, while with 3 threads working simultaneously the time drops to 01min37s. But it still doesn't use 100% of the CPU, only around 50%.

How can I let my program use the processor's full capability?

EDIT: The application isn't I/O-bound or memory-bound; both stay at reasonable levels all the time.

I think it has something to do with the 'dual core' thing.

There is a locked method invocation that every request uses, but it is really fast, so I don't think this is the problem.

The most CPU-costly part of my code is the call of a DLL via COM (the same external method is called from all threads). This DLL isn't memory-bound or I/O-bound either; it is an AI recognition component. I'm doing OCR of paychecks, one paycheck per request.

EDIT2

It is very probable that the STA COM method is my problem. I have contacted the component owners to get it solved.

13 Answers

#1


24  

Do you have significant locking within your application? If the threads are waiting for each other a lot, that could easily explain it.

Other than that (and the other answers given), it's very hard to guess, really. A profiler is your friend...

EDIT: Okay, given the comments below, I think we're onto something:

The more cpu-costly part of my code is the call of a dll via COM (the same external method is called from all threads).

Is the COM method running in an STA by any chance? If so, it'll only use one thread, serializing calls. I strongly suspect that's the key to it. It's similar to having a lock around that method call (not quite the same, admittedly).
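
The lock comparison can be made concrete with a small sketch. It does not use COM at all: a plain `lock` stands in for the STA's single thread, `Thread.SpinWait` stands in for the OCR work, and all names (`SerializedCallDemo`, `MaxOverlappingCalls`) are invented for illustration. It only shows that when every call goes through one gate, no two calls ever overlap, so extra threads cannot add CPU usage.

```csharp
using System;
using System.Threading;

static class SerializedCallDemo
{
    // Starts threadCount threads that each make callsPerThread calls to a stand-in
    // "component" method, and returns the largest number of calls that overlapped.
    public static int MaxOverlappingCalls(int threadCount, int callsPerThread, bool serialize)
    {
        object gate = new object();   // plays the role of the single STA thread
        object stats = new object();  // protects the two counters below
        int inProgress = 0, maxSeen = 0;

        void Call()
        {
            lock (stats) { inProgress++; if (inProgress > maxSeen) maxSeen = inProgress; }
            Thread.SpinWait(2_000_000); // burn CPU, standing in for the OCR work
            lock (stats) { inProgress--; }
        }

        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int c = 0; c < callsPerThread; c++)
                {
                    if (serialize) { lock (gate) { Call(); } }
                    else { Call(); }
                }
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        return maxSeen;
    }
}
```

With `serialize: true` the result is always 1, however many threads are started. With `serialize: false` it will usually be higher on a multi-core machine, which is the situation the question was hoping for.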

#2


17  

The problem is the COM object.

Most COM objects run in the context of a 'single-threaded apartment'. (You may have seen an [STAThread] attribute on the Main method of a .NET application from time to time?)

Effectively this means that all dispatches to that object are handled by a single thread. Throwing more cores at the problem just gives you more resources that can sit around and wait or do other things in .NET.

You might want to take a look at this article from Joe Duffy (the head parallel .NET guy at Microsoft) on the topic.

http://www.bluebytesoftware.com/blog/PermaLink,guid,8c2fed10-75b2-416b-aabc-c18ce8fe2ed4.aspx

In practice, if you have to do a bunch of things against a single COM object like this, you are hosed, because .NET will just serialize the calls internally behind your back. If you can create multiple COM objects and use them, then you can resolve the issue, because each can be created and accessed from a distinct STA thread. This will work until you hit about 100 STA threads, then things will go wonky. For details, see the article.
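
A minimal sketch of the "one object per STA thread" idea, under some assumptions: the helper name `StaWorkers.RunEach` is invented here, the component is created through a caller-supplied factory so the sketch does not depend on any particular COM class, and the apartment state is only set on Windows, where apartments exist. Whether this actually helps depends on the component; the asker reports further down that theirs does not tolerate two instances in one process.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Threading;

static class StaWorkers
{
    // Runs work(component, item) for every item, each on its own thread with its own
    // component instance. createComponent is called on the worker thread itself, so on
    // Windows the instance is created in that thread's apartment.
    public static void RunEach<T>(IEnumerable<T> items, Func<object> createComponent, Action<object, T> work)
    {
        var threads = new List<Thread>();
        foreach (T item in items)
        {
            T current = item;
            var thread = new Thread(() => work(createComponent(), current));
            // Apartments exist only on Windows; the state must be set before Start().
            if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
                thread.SetApartmentState(ApartmentState.STA);
            threads.Add(thread);
            thread.Start();
        }
        foreach (Thread thread in threads) thread.Join();
    }
}
```

For a real component the factory might look like `() => Activator.CreateInstance(Type.GetTypeFromProgID("Hypothetical.OcrEngine"))`, where the ProgID is a placeholder, not the asker's actual component. The sketch starts one thread per item, so for large batches you would want to cap the number of threads, given the roughly 100 STA thread limit mentioned above.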

#3


13  

It is probably no longer the processor that is the bottleneck for completing your process. The bottleneck has likely moved to disk access, network access or memory access. You could also have a situation where your threads are competing for locks.

Only you know exactly what your threads are doing, so you need to look at them with the above in mind.

#4


4  

It depends on what your program does. The work carried out by your concurrent requests could be I/O-bound, limited by the speed of (e.g.) your hard disk, rather than CPU-bound; only CPU-bound work would push your CPU to 100%.

After the edit, it does sound like COM STA objects might be the culprit.

Do all threads call the same instance of the COM object? Would it be possible to make your worker threads STA threads, and create a separate instance of the COM object on each thread? In this way it might be possible to avoid the STA bottleneck.

To tell if a COM coclass is STA:

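using System;
using System.Diagnostics;

class Test
{
  static void Main() // No [STAThread] attribute, so this will be an MTA thread by default
  {
    int before = Process.GetCurrentProcess().Threads.Count;
    var o = new COMObjectClass(); // placeholder: substitute your own COM coclass
    int after = Process.GetCurrentProcess().Threads.Count;
    // Did a new thread pop into existence when the object was created?
    // If so, an STA thread was created for it to live in.
    // (Rough check only: the runtime can also start threads of its own.)
    Console.WriteLine("Threads before: {0}, after: {1}", before, after);
  }
}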

#5


2  

I think I had a similar problem. I was creating multiple threads in C# that ran C++ code through a COM interface. My dual-core CPU never reached 100%.

After reading this post, I almost gave up. Then I tried calling SetApartmentState(ApartmentState.STA) on my threads.

After changing only this, the CPU maxed out.

#6


0  

It sounds like your application's performance may not be 'bound' by the amount of CPU resources available. If you're processing requests over the network, the CPU(s) may be waiting for the data to arrive, or for the network device to transfer the data. Alternatively, if you need to look up data to fulfill the request, the CPU may be waiting for the disk.

#7


0  

Are you sure that your tasks require intensive processor activity? Is there any IO processing? This can be the reason for your 50% load.

Test: Try using only 2 threads and set the affinity of each thread to a different core. Then open Task Manager and watch the load on both cores.

#8


0  

This isn't an answer really, but have you checked perfmon to see what resources it is using and have you run profilers on the code to see where it is spending time?

How have you determined that IO or other non CPU resources are not the bottleneck?

Can you give a brief description of what the threads are doing?

#9


0  

If your process is running on CPU 0 and spawning all its threads there, the maximum it will ever reach on a dual-core machine is 50%. See if you have threads running on both cores or on just one. I would venture to guess you're isolated to a single core, or that one of your dependent resources is locked to a single core. If it hits exactly 50%, then a single core is very likely to be your bottleneck.
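
One quick way to check the "isolated to a single core" guess is to look at the process affinity mask. This is only a sketch with invented names (`AffinityCheck`, `CountAllowedProcessors`, `Report`); it tells you which cores the process is allowed to use, not which cores its threads actually end up on, so a profiler or Task Manager is still needed for the latter.

```csharp
using System;
using System.Diagnostics;

static class AffinityCheck
{
    // Counts the set bits in an affinity mask; each set bit is one logical
    // processor the process is allowed to run on.
    public static int CountAllowedProcessors(long mask)
    {
        int count = 0;
        for (ulong m = (ulong)mask; m != 0; m >>= 1)
            count += (int)(m & 1);
        return count;
    }

    // Prints how many logical processors exist and how many this process may use.
    // Process.ProcessorAffinity is supported on Windows and Linux.
    public static void Report()
    {
        long mask = Process.GetCurrentProcess().ProcessorAffinity.ToInt64();
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);
        Console.WriteLine("Allowed by affinity mask: " + CountAllowedProcessors(mask));
    }
}
```

Calling `AffinityCheck.Report()` at startup on a dual-core machine should show 2 for both numbers if the process is not pinned; a 1 on the second line would support the single-core theory.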

#10


0  

So you solved the problem of using a single COM object and now have an IO problem.

The increased run time for multiple threads is probably because of mixing random IO together, which will slow it all down.

If the data set will fit into RAM, try to see if you can prefetch it into cache. Perhaps just reading the data, or maybe memory mapping it together with a command to make it available.

This is why SQL databases will often choose a sequential table scan over an index scan on queries where you wouldn't expect it: it can be much faster to read all of the data in order than to read it in random chunks.

#11


0  

Maybe I'm misunderstanding something, but you said none of your requests (each in a separate thread) reaches 100% CPU.

What operating system are you using?

I seem to vaguely recall that in old versions of Windows (e.g., early XP and 2000), CPU utilization was reported as a share of both processors combined, so a single thread wasn't able to make it past 50% unless it was the idle process.

#12


0  

One more note: have you tried launching your code outside Visual Studio (regardless of Release/Debug settings)?

#13


0  

The problem is the COM object. It is STA, and I also can't have two instances running concurrently in the same process. When I create a new instance of the COM class, the existing one becomes unusable.

I've contacted the component developers, and they're looking into what they can do for me.

Thank you all ;)
