Implementing a parallel task queue in .NET

Time: 2021-11-18 13:48:43

An image speaks more than words, so here is basically what I want to achieve:
(I have also used a fruit analogy for the sake of genericity and simplicity.)

I've done this kind of thing many times in the past using different kinds of .NET classes (BackgroundWorker, ThreadPool, home-made solutions...).

I am asking here for advice and fresh ideas on how to do this efficiently.
This is a compute-intensive program: I receive millions of data items (similar in structure but not in content) that have to be queued and processed according to their content type. Hence, I want to avoid creating a parallel task for every single item (this overloads the CPU and is poor design, IMHO). That's why I got the idea of having only ONE thread running for EACH data TYPE, dedicated to processing it (knowing that the "Press Juice" method is generic and independent of the fruit being pressed).

Any ideas and implementation suggestions are welcome.
I am happy to give any further details.

3 Answers

#1


19  

TPL Dataflow seems like a very strong candidate for this.

Take a read of the intro here.

#2


7  

If all you really want is one thread (or a constant number of threads) for each type of fruit, then the simplest solution might be to use a BlockingCollection for each type of fruit. Your data bus will deliver the fruit to those collections, and your processing threads will take from them. But this means if there are no apples for now, the thread will be blocked, doing nothing.
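
The BlockingCollection idea could look roughly like the sketch below. It is only an illustration under assumptions: the `FruitType` enum, the `FruitQueues.PressAll` helper, the string payloads and the capacity of 1000 are all made up for the example and do not come from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical fruit types taken from the question's analogy.
public enum FruitType { Apple, Orange, Banana }

public static class FruitQueues
{
    // Routes each fruit to the queue for its type; one dedicated thread drains each queue.
    public static List<string> PressAll(IEnumerable<(FruitType Type, string Name)> incoming)
    {
        var juices = new ConcurrentQueue<string>();
        var queues = new Dictionary<FruitType, BlockingCollection<string>>();
        var workers = new List<Thread>();

        foreach (FruitType type in Enum.GetValues(typeof(FruitType)))
        {
            // Bounded, so a fast producer blocks instead of exhausting memory.
            var queue = new BlockingCollection<string>(boundedCapacity: 1000);
            queues[type] = queue;

            // GetConsumingEnumerable blocks while the queue is empty and ends
            // once CompleteAdding has been called and the queue is drained.
            var worker = new Thread(() =>
            {
                foreach (string fruit in queue.GetConsumingEnumerable())
                    juices.Enqueue(fruit + " juice");
            });
            worker.Start();
            workers.Add(worker);
        }

        // The "data bus": deliver each fruit to the collection for its type.
        foreach (var item in incoming)
            queues[item.Type].Add(item.Name);

        // Signal that nothing more will arrive, then wait for the workers to finish.
        foreach (var q in queues.Values) q.CompleteAdding();
        foreach (var w in workers) w.Join();
        return juices.ToList();
    }

    public static void Main()
    {
        var juices = PressAll(new[]
        {
            (FruitType.Apple, "apple #1"),
            (FruitType.Orange, "orange #1"),
            (FruitType.Apple, "apple #2"),
        });
        Console.WriteLine(string.Join(", ", juices));
    }
}
```

Note that the banana thread in this run exists for the whole call but never receives anything, which is exactly the idle, blocked thread described above.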

A more flexible and efficient approach would be to use TPL Dataflow. With that, you don't work with threads or tasks, you work with blocks. For example, your Thread C could be represented as a TransformBlock<Apple, AppleJuice>.
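
As a rough sketch of what such a block might look like: the `Apple` and `AppleJuice` classes, the `AppleLine` helper and the downstream "bottle" step are hypothetical stand-ins, since the question only names the types. Depending on the target framework, the System.Threading.Tasks.Dataflow NuGet package may be needed.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical domain types; the original only mentions their names.
public sealed class Apple { public int Id { get; set; } }
public sealed class AppleJuice { public int AppleId { get; set; } }

public static class AppleLine
{
    // Presses `count` apples and returns how many juices reached the end of the pipeline.
    public static async Task<int> PressAsync(int count)
    {
        // Stands in for "Thread C": the block owns its input queue and only
        // uses a thread-pool thread while there are apples waiting.
        var press = new TransformBlock<Apple, AppleJuice>(
            apple => new AppleJuice { AppleId = apple.Id });

        // Hypothetical downstream step; it runs one message at a time by default,
        // so the plain counter needs no locking.
        int bottled = 0;
        var bottle = new ActionBlock<AppleJuice>(juice => bottled++);

        // PropagateCompletion makes `bottle` complete once `press` is done.
        press.LinkTo(bottle, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < count; i++)
            press.Post(new Apple { Id = i });

        press.Complete();
        await bottle.Completion;
        return bottled;
    }

    public static async Task Main()
    {
        Console.WriteLine(await PressAsync(5));
    }
}
```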

By default, each block uses at most one thread, but they can be easily configured to use more of them (by setting MaxDegreeOfParallelism). Also, dataflow blocks work well with the new C# 5.0 async-await, which could be a big advantage.
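
A minimal sketch of both points, assuming string payloads, a parallelism of 4 and a `Task.Delay` as a placeholder for real asynchronous work (none of which come from the question; `ParallelPress` is a made-up name):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ParallelPress
{
    public static async Task<List<string>> PressAsync(IEnumerable<string> fruits)
    {
        var press = new TransformBlock<string, string>(
            async fruit =>
            {
                await Task.Delay(10); // placeholder for real awaited work (I/O, a service call)
                return fruit + " juice";
            },
            // Allow up to four fruits to be in progress in this block at once.
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        // The collector keeps the default parallelism of 1, so a plain List is fine here.
        var juices = new List<string>();
        var collect = new ActionBlock<string>(juice => juices.Add(juice));
        press.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var fruit in fruits)
            press.Post(fruit);

        press.Complete();
        await collect.Completion;
        return juices;
    }

    public static async Task Main()
    {
        var juices = await PressAsync(new[] { "apple", "orange", "banana" });
        Console.WriteLine(string.Join(", ", juices));
    }
}
```

Even with several items in flight, a TransformBlock hands its results downstream in input order unless configured otherwise, so raising MaxDegreeOfParallelism does not by itself reorder the output.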

There are also things you should be careful about. For example, TDF is by default optimized for throughput, not latency. So, if your thread pool is busy and you have lots of oranges incoming and only one apple, it's possible that the apple will be processed only after all of the oranges are. But this can also be fixed by configuring the blocks properly (by setting MaxMessagesPerTask).
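
One way to experiment with this setting is sketched below. To imitate a saturated thread pool it forces both blocks onto a one-at-a-time scheduler, which is purely an assumption of the demo; `FairPress`, the 200 oranges and the 1 ms of simulated work are likewise invented for illustration. With MaxMessagesPerTask = 1 each block gives up its task after a single message, so the apple block should get a turn between oranges; with the default (unbounded) value a busy orange block may keep its task until its queue is empty. The exact positions printed can vary from run to run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class FairPress
{
    // Returns the order in which the fruits were actually pressed.
    public static async Task<List<string>> PressAsync(int orangeCount, int maxMessagesPerTask)
    {
        // Demo assumption: a scheduler that runs one task at a time imitates a busy thread pool.
        var lane = new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;
        var options = new ExecutionDataflowBlockOptions
        {
            TaskScheduler = lane,
            MaxMessagesPerTask = maxMessagesPerTask
        };

        // Safe without locks only because the scheduler runs one delegate at a time.
        var order = new List<string>();
        var oranges = new ActionBlock<int>(i => { Thread.Sleep(1); order.Add("orange"); }, options);
        var apples = new ActionBlock<int>(i => order.Add("apple"), options);

        for (int i = 0; i < orangeCount; i++) oranges.Post(i);
        apples.Post(0); // the single apple arrives after all the oranges

        oranges.Complete();
        apples.Complete();
        await Task.WhenAll(oranges.Completion, apples.Completion);
        return order;
    }

    public static async Task Main()
    {
        var fair = await PressAsync(200, maxMessagesPerTask: 1);
        var greedy = await PressAsync(200, DataflowBlockOptions.Unbounded);
        Console.WriteLine("apple position, MaxMessagesPerTask = 1: " + fair.IndexOf("apple"));
        Console.WriteLine("apple position, unbounded: " + greedy.IndexOf("apple"));
    }
}
```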

#3


1  

I would caution against a "worker thread per data type" approach. This makes the assumption that actual input load will conform to the equivalence classes that are handy for developers. Do you know if bananas are 5x slower to juice than oranges? What happens if every Tuesday is "apple celebration day" and everyone juices more fruit than usual, and all of it is apples?

Running things in parallel is about performance, not about the domain. Don't model it after your domain; model it to provide the lowest average cycle time.
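
One possible reading of this advice, sketched with TPL Dataflow from the other answers: a single shared block presses every fruit regardless of type, sized to the machine rather than to the number of fruit types. The `SharedPress` name, the string payloads, the bounded capacity of 10000 and the choice of `Environment.ProcessorCount` are assumptions for the example, not something this answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class SharedPress
{
    // Presses every fruit through one shared block and returns how many were pressed.
    public static async Task<int> PressAllAsync(IEnumerable<string> fruits)
    {
        int pressed = 0;
        var press = new ActionBlock<string>(
            fruit =>
            {
                // The generic "Press Juice" work would go here; it does not care about the type.
                Interlocked.Increment(ref pressed); // several messages may run at once
            },
            new ExecutionDataflowBlockOptions
            {
                // Sized to the hardware, not to the number of fruit types.
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                // Bounded queue: SendAsync waits when the block is full (back-pressure).
                BoundedCapacity = 10000
            });

        foreach (var fruit in fruits)
            await press.SendAsync(fruit);

        press.Complete();
        await press.Completion;
        return pressed;
    }

    public static async Task Main()
    {
        // An "apple celebration day": nearly everything is apples, yet all workers stay busy.
        var fruits = new List<string> { "orange", "banana" };
        for (int i = 0; i < 1000; i++) fruits.Add("apple");
        Console.WriteLine(await PressAllAsync(fruits));
    }
}
```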
