I have one doubt regarding the usage of TPL with the LongRunning state.
From MSDN:

> The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to most efficiently use all the processor cores that are available. Another benefit of the TPL is that you don't have to deal with thread creation and synchronization.
But if I set the `LongRunning` option, the TPL assigns a dedicated thread from outside the thread pool. In that case it will work somewhat like traditional threading (that's what I believe; please correct me if I'm wrong). So in such a scenario, will the TPL itself deal with thread creation and synchronization as mentioned above? Also, will it automatically/internally scale the degree of concurrency dynamically to most efficiently use all the processor cores, or does the developer need to write code to handle all that?
3 Answers
#1
There's always "traditional threading" somewhere down below.
Native threads are "heavy". If you executed one thread per task and then created really many tasks (and thus threads), you could starve or stall the process (or even the whole machine on some systems). This makes it infeasible to register many tiny operations and handle them that way, and that obstacle impacts your code architecture.
This is where the thread pool comes in. Instead of running one thread per job, a few pooled threads work on a shared task queue. This limits the number of threads to exactly the N of the pool, and you still get the benefit of background message processing.
With that idea sketched out, it's important to notice that the thread pool is (usually) limited to some N threads. This means that if you post many long-running tasks to the thread pool, you may starve it. The thread pool works best when it processes tiny, quick jobs.
This is why the TPL allows you to specify which jobs are "long": to give you the ability to relieve stress on the thread pool. With task-based operations, it's very important for the thread pool to keep running. Allow it to be starved, and all tasks will have to wait for some long operation to complete. That's totally not what it was all about!
I'm sure the TPL handles the creation and management of that separate thread dedicated to `LongRunning` jobs.
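To see that for yourself, here's a minimal sketch (assuming .NET 4.5+ and the default `TaskScheduler`) that checks whether the task's delegate runs on a pool thread:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LongRunningDemo
{
    static void Main()
    {
        // A task with default options normally runs on a thread-pool thread.
        bool? poolDefault = null;
        Task.Factory.StartNew(() =>
            poolDefault = Thread.CurrentThread.IsThreadPoolThread).Wait();

        // With LongRunning, the default scheduler creates a dedicated
        // (non-pool) thread, so the pool is not tied up by the long job.
        bool? poolLong = null;
        Task.Factory.StartNew(() =>
            poolLong = Thread.CurrentThread.IsThreadPoolThread,
            TaskCreationOptions.LongRunning).Wait();

        Console.WriteLine("default: {0}, LongRunning: {1}", poolDefault, poolLong);
        // Typically prints: default: True, LongRunning: False
    }
}
```

The TPL created, started, and joined that dedicated thread for us; we only dealt with the `Task`.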
As for the second question - actually, I don't know. "Choosing the most efficient" is a hard task in general, so I'm pretty safe in saying "no, it does not do that in all cases" :) I think the scaling of the DoP/DoC in the TPL is as simple as adjusting the thread pool size to the number of logical processors on the machine. Separate threads for `LongRunning` jobs will still be created beyond that limit, so the thread pool is safe. Including them in the DoP/DoC limit would starve the pool in just the same way, as it would decrease the number of available threads. I don't think the TPL does much more in terms of scaling. Maybe it schedules child tasks that operate on the same data onto the same threads to get some caching or NUMA boost... but I don't know; that's quite a far-fetched guess anyway.
I've just found an article you may find interesting: New and Improved CLR 4 Thread Pool Engine - I'm pretty sure the default `TaskScheduler` uses that pool. (via Jon Skeet: https://*.com/a/4534902/717732)
And another example: someone performed a few tests regarding thread pool sizes and the `LongRunning` flag: Threadpool thread starvation - a practical example
#2
> So in such a scenario, will TPL itself deal with thread creation and synchronization as mentioned above?
A `LongRunning` `Task` is nothing more than a `Thread` wrapped in a `Task`. This has the benefit that you can query its status, set continuations, wait for it, and have errors propagated. You can also use combinators such as `WaitAll`.

That's all there is to the `LongRunning` option.
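As a small illustration of that wrapping (a sketch, assuming the default scheduler), a `LongRunning` task still propagates exceptions and exposes its status just like any other task, which a raw `Thread` would not do for you:

```csharp
using System;
using System.Threading.Tasks;

class WrapDemo
{
    static void Main()
    {
        // Runs on a dedicated thread, but is still observable as a Task.
        var t = Task.Factory.StartNew(
            () => { throw new InvalidOperationException("boom"); },
            TaskCreationOptions.LongRunning);

        try { t.Wait(); }
        catch (AggregateException ex)
        {
            // The exception thrown on the dedicated thread is captured
            // by the Task and rethrown to the waiter.
            Console.WriteLine(ex.InnerException.Message); // boom
        }

        Console.WriteLine(t.Status); // Faulted
    }
}
```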
> Also, will it automatically/internally scale the degree of concurrency dynamically to most efficiently use all the processor cores, or does the developer need to write code to handle all that?
How would you "scale" a single thread/task? It is inherently unscalable. You need multiple independent units of work (such as tasks or data items) to use multiple processors.
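For example, to actually engage multiple cores you hand the TPL many independent data items. A minimal sketch using `Parallel.For`'s thread-local overload, which partitions the range across the available cores:

```csharp
using System;
using System.Threading.Tasks;

class ScaleDemo
{
    static void Main()
    {
        // One task cannot be spread across cores, but 1000 independent
        // iterations can. Each worker accumulates a local sum, and the
        // partial sums are combined once per worker at the end.
        long total = 0;
        object gate = new object();

        Parallel.For(0, 1000,
            () => 0L,                                   // per-worker local sum
            (i, state, local) => local + i,             // accumulate locally, no locking
            local => { lock (gate) total += local; });  // combine once per worker

        Console.WriteLine(total); // 0 + 1 + ... + 999 = 499500
    }
}
```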
#3
> The TPL scales the degree of concurrency dynamically to most efficiently use all the processor cores that are available. Another benefit of the TPL is that you don't have to deal with thread creation and synchronization.
This statement applies to the `Parallel.For` family of TPL APIs. It doesn't apply to `Task.Run` or `Task.Factory.StartNew`, where you have explicit control over the degree of parallelism.
For `Task.Run` (and `Task.Factory.StartNew` with default options, for that matter), there's no intelligent "scaling". It's just plain round-robin execution of work items, much like with `ThreadPool.QueueUserWorkItem`. This may actually engage all available pool threads (up to `ThreadPool.GetMaxThreads`), and then queue new tasks for deferred execution as busy pool threads become available. It may also be subject to the thread pool stuttering issue.
Using `Task.Factory.StartNew` with `LongRunning` is only different in that you may escape the thread pool stuttering issue, but in the end you may simply exhaust OS memory and other resources, as an OS thread is a very expensive resource.
In the case of `Parallel.For` etc., the TPL scheduler is much more intelligent. It doesn't waste threads on a one-thread-per-work-item basis. Rather, it has quite complicated imperative logic, taking into account the number of CPUs/cores and possibly some other runtime metrics.
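As a rough illustration of that control (a sketch, not the scheduler's internals), you can also cap the concurrency explicitly with `ParallelOptions.MaxDegreeOfParallelism` and verify that no more than that many iterations are ever in flight at once:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class DopDemo
{
    static void Main()
    {
        int current = 0;      // iterations currently executing
        int maxObserved = 0;  // high-water mark of concurrent iterations

        Parallel.For(0, 200,
            new ParallelOptions { MaxDegreeOfParallelism = 2 },
            i =>
            {
                int now = Interlocked.Increment(ref current);
                // Record the high-water mark with a CAS loop.
                int seen;
                do { seen = maxObserved; }
                while (now > seen &&
                       Interlocked.CompareExchange(ref maxObserved, now, seen) != seen);

                Thread.SpinWait(50000); // simulated CPU-bound work
                Interlocked.Decrement(ref current);
            });

        Console.WriteLine(maxObserved <= 2); // True: never more than 2 at once
    }
}
```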
Updated to address the comment - here's a simple example:
```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            int max = 50;
            int delay = 30; // ~30s per work item

            ThreadPool.SetMaxThreads(max, max);

            Console.WriteLine("starting, threads: {0}", Process.GetCurrentProcess().Threads.Count);

            var tasks = Enumerable.Range(0, max).Select(n => Task.Factory.StartNew(() =>
            {
                Console.WriteLine("task: {0}, threads: {1}, pool thread: {2}",
                    n, Process.GetCurrentProcess().Threads.Count, Thread.CurrentThread.IsThreadPoolThread);
                for (int i = 0; i < delay * 1000; i++)
                {
                    Thread.Sleep(1);
                }
            })).ToArray();

            Console.WriteLine("waiting, threads: {0}", Process.GetCurrentProcess().Threads.Count);
            Task.WaitAll(tasks);
            Console.WriteLine("done, threads: {0}", Process.GetCurrentProcess().Threads.Count);
            Console.ReadLine();
        }
    }
}
```
The output (Release build, no debugger attached, .NET 4.5, 4-core CPU):
```
starting, threads: 3
task: 0, threads: 11, pool thread: True
task: 2, threads: 11, pool thread: True
waiting, threads: 11
task: 1, threads: 11, pool thread: True
task: 3, threads: 11, pool thread: True
...
task: 48, threads: 56, pool thread: True
task: 49, threads: 57, pool thread: True
done, threads: 47
```
It confirms both the growing and the stuttering behavior of the `ThreadPool`, up to the `max` number of threads. New threads are created with ~500ms delays.
Now, if we add `TaskCreationOptions.LongRunning` to `Task.Factory.StartNew`, we eliminate the stuttering and are no longer limited by the `ThreadPool` size, but we'll still end up engaging up to the `max` number of new threads, one per task (depending on how long each work item takes to execute).
> Also, will it automatically/internally scale the degree of concurrency dynamically to most efficiently use all the processor cores, or does the developer need to write code to handle all that?
Thus, if the developer wants to use the TPL's `Task.Run` or `Task.Factory.StartNew` APIs, he or she does need to handle the level of parallelism manually. That's not too difficult, though, e.g., with `SemaphoreSlim`.
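A minimal sketch of that approach (the numbers here are arbitrary): a `SemaphoreSlim` initialized to 3 guarantees that no more than 3 of the 20 tasks ever run concurrently, since `Task.Run`-style APIs impose no such limit on their own:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottleDemo
{
    static void Main()
    {
        var gate = new SemaphoreSlim(3); // at most 3 tasks inside at once
        int current = 0;
        int maxObserved = 0;

        var tasks = Enumerable.Range(0, 20).Select(async i =>
        {
            await gate.WaitAsync();
            try
            {
                int now = Interlocked.Increment(ref current);
                // Record the high-water mark of concurrent workers.
                int seen;
                do { seen = maxObserved; }
                while (now > seen &&
                       Interlocked.CompareExchange(ref maxObserved, now, seen) != seen);

                await Task.Delay(50); // simulated work
                Interlocked.Decrement(ref current);
            }
            finally { gate.Release(); }
        }).ToArray();

        Task.WaitAll(tasks);
        Console.WriteLine(maxObserved <= 3); // True
    }
}
```

The same pattern works with `Task.Run` bodies; the semaphore, not the scheduler, is what bounds the parallelism.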