Processing large amounts of data using multithreading

Time: 2021-09-10 15:49:15

I need to write a C# service (it could be a Windows service or a console app) that processes a large amount of data (100,000 records) stored in a database. Processing each record is a fairly complex operation, and I need to perform a lot of inserts and updates as part of it.

We are using NHibernate as the ORM.

One way is to load all the records and process them sequentially, which could turn out to be quite slow. I was looking at multithreading options and was thinking of having multiple threads process chunks of records simultaneously.

Could anyone give me some pointers on how I should approach this, considering that I'm using NHibernate? What are the possible gotchas, such as deadlocks?

Thanks a lot.

4 solutions

#1


2  

You should consider Task Parallel Library.

#2


2  

Assuming you are using .NET 4.0, you can use the Task Parallel Library (as has been mentioned) to do something like this:

Parallel.ForEach(sourceCollection, item => Process(item));

Your source collection would be an IEnumerable of the loaded records. The library will handle everything for you:

The source collection is partitioned and the work is scheduled on multiple threads based on the system environment. The more processors on the system, the faster the parallel method runs.

It may help to read a tutorial on using Parallel.ForEach(). Also, be aware of potential pitfalls.
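
To make the chunking idea from the question concrete, here is a minimal sketch that hands index ranges to Parallel.ForEach and caps the number of worker threads. The class name ChunkedProcessing, the method ProcessAll, the chunk size of 500, and the limit of 4 threads are all assumptions for illustration, and Process here is only a stand-in for the real per-record work. For database-heavy work the best thread count is something to measure rather than assume.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedProcessing
{
    // Hypothetical stand-in for the real per-record work.
    public static int Process(int recordId)
    {
        return recordId * 2;
    }

    // Processes the ids in chunks of chunkSize on at most maxThreads threads
    // and returns how many records were handled.
    public static int ProcessAll(int[] recordIds, int chunkSize, int maxThreads)
    {
        // Partitioner.Create rejects an empty range, so handle that case first.
        if (recordIds.Length == 0)
        {
            return 0;
        }

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        int processed = 0;

        // Yields Tuple<int, int> ranges of the form [from, to).
        var ranges = Partitioner.Create(0, recordIds.Length, chunkSize);

        Parallel.ForEach(ranges, options, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                Process(recordIds[i]);
                // The counter is shared between threads, so update it atomically.
                Interlocked.Increment(ref processed);
            }
        });

        return processed;
    }

    public static void Main()
    {
        int[] ids = Enumerable.Range(1, 100000).ToArray();
        int count = ProcessAll(ids, chunkSize: 500, maxThreads: 4);
        Console.WriteLine("Processed " + count + " records");
    }
}
```

Working on ranges instead of single items gives each thread a natural unit of work, which is handy later if every chunk should get its own database session and transaction.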

#3


0  

Sounds like PLINQ is the best solution (Chapter 5 in this article). But since each calculation works heavily with the database, you should create a separate session for each thread, because an NHibernate ISession is not safe to share between threads.
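
A rough sketch of the "separate session per thread" advice, using Parallel.ForEach over index ranges rather than PLINQ, so that each chunk opens and disposes its own session. The Record entity, the SessionPerChunk class, and the idea of flagging a record as processed are hypothetical placeholders. The ISessionFactory is assumed to be configured and mapped elsewhere.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using NHibernate;

// Hypothetical mapped entity; the real mapping is assumed to exist elsewhere.
public class Record
{
    public virtual int Id { get; set; }
    public virtual bool Processed { get; set; }
}

public static class SessionPerChunk
{
    public static void ProcessAll(ISessionFactory factory, IList<int> recordIds, int chunkSize)
    {
        // Partitioner.Create rejects an empty range.
        if (recordIds.Count == 0)
        {
            return;
        }

        var ranges = Partitioner.Create(0, recordIds.Count, chunkSize);

        Parallel.ForEach(ranges, range =>
        {
            // The factory may be shared between threads; a session may not,
            // so every chunk opens its own session and transaction.
            using (ISession session = factory.OpenSession())
            using (ITransaction tx = session.BeginTransaction())
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    Record record = session.Get<Record>(recordIds[i]);
                    if (record == null)
                    {
                        continue; // id no longer present in the database
                    }
                    record.Processed = true; // placeholder for the real work
                }
                tx.Commit(); // flushes the pending updates for this chunk
            }
        });
    }
}
```

Keeping the chunks disjoint means no two transactions update the same Record row, which should reduce lock contention. Deadlocks remain possible if the processing also touches rows shared between chunks.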

#4


0  

Use IStatelessSession if possible, and experiment with the adonet.batch_size configuration property.
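
As an illustration only, this is roughly what that could look like. The ProcessedResult entity, the StatelessBatchInsert class, and the batch size of 50 are assumptions, and the configuration is assumed to come from an existing hibernate.cfg.xml. Whether batching actually takes effect depends on the driver and on the id generator in use, so it is worth checking the generated SQL.

```csharp
using System.Collections.Generic;
using NHibernate;
using NHibernate.Cfg;

// Hypothetical mapped entity; the real mapping is assumed to exist elsewhere.
public class ProcessedResult
{
    public virtual int Id { get; set; }
    public virtual string Payload { get; set; }
}

public static class StatelessBatchInsert
{
    public static ISessionFactory BuildFactory()
    {
        var cfg = new Configuration();
        cfg.Configure(); // reads hibernate.cfg.xml, assumed to be present
        // 50 is an arbitrary starting point; experiment with other values.
        cfg.SetProperty("adonet.batch_size", "50");
        return cfg.BuildSessionFactory();
    }

    public static void InsertAll(ISessionFactory factory, IEnumerable<ProcessedResult> results)
    {
        // A stateless session has no first-level cache and no dirty checking,
        // so memory use stays flat over a large number of rows.
        using (IStatelessSession session = factory.OpenStatelessSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            foreach (ProcessedResult result in results)
            {
                session.Insert(result);
            }
            tx.Commit();
        }
    }
}
```

The trade-off is that a stateless session skips cascades and change tracking, so every insert and update has to be issued explicitly.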

Also, how performant does it need to be? I'm a fan of NH, but this is one scenario where stored procedures might be better.
