Which approach could improve memory usage when processing a large data set? (.NET)

Date: 2021-08-20 03:53:51

When I have to fetch gigabytes of data, store it in a collection, and then process it, I run out of memory. So instead of:

 public class Program
 {
     public IEnumerable<SomeClass> GetObjects()
     {
         var list = new List<SomeClass>();
         // TryGetNext is a placeholder for the "get implementation"
         while (TryGetNext(out SomeClass item))
         {
             list.Add(item);   // every item stays in memory until the method returns
         }
         return list;
     }

     public void ProcessObjects(IEnumerable<SomeClass> objects)
     {
         foreach (var item in objects)
         {
             // process implementation
         }
     }

     void Main()
     {
         var objects = GetObjects();
         ProcessObjects(objects);
     }
 }

I need to:

 public class Program
 {
     void ProcessObject(SomeClass item)
     {
         // process implementation
     }

     public void GetAndProcessObjects()
     {
         // TryGetNext is a placeholder for the "get implementation"
         while (TryGetNext(out SomeClass item))
         {
             ProcessObject(item);   // handle each item as soon as it is read
         }
     }

     void Main()
     {
         GetAndProcessObjects();
     }
 }

Is there a better way?

5 solutions

#1


You ought to leverage C#'s iterator blocks and use the yield return statement to do something like this:

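 public class Program
 {
     public IEnumerable<SomeClass> GetObjects()
     {
         // TryGetNext is a placeholder for the "get implementation"
         while (TryGetNext(out SomeClass item))
         {
             yield return item;   // hand each item to the caller as soon as it is read
         }
     }

     public void ProcessObjects(IEnumerable<SomeClass> objects)
     {
         foreach (var item in objects)
         {
             // process implementation
         }
     }

     void Main()
     {
         var objects = GetObjects();   // nothing is read yet
         ProcessObjects(objects);      // items are read one by one inside the foreach
     }
 }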

This would allow you to stream each object and not keep the entire sequence in memory - you would only need to keep one object in memory at a time.
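
To make the streaming behaviour concrete, here is a minimal runnable sketch. It is not the asker's code: the `StreamingDemo` class, the `Produced` counter, and the use of `int` in place of `SomeClass` are assumptions made purely for illustration. It shows that calling the iterator method produces nothing, and that items are only created while the `foreach` pulls them.

```csharp
using System;
using System.Collections.Generic;

public static class StreamingDemo
{
    // Counts how many items the producer has created so far (illustration only).
    public static int Produced;

    // Hypothetical stand-in for the "get implementation": yields items one at a time.
    public static IEnumerable<int> GetObjects(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Consumes the sequence lazily and returns the sum so the result can be checked.
    public static long ProcessObjects(IEnumerable<int> objects)
    {
        long sum = 0;
        foreach (var item in objects)
            sum += item;
        return sum;
    }

    public static void Main()
    {
        var objects = GetObjects(1000000);           // the iterator body has not run yet
        Console.WriteLine(Produced);                 // prints 0
        Console.WriteLine(ProcessObjects(objects));  // prints 499999500000
        Console.WriteLine(Produced);                 // prints 1000000
    }
}
```

Note that the saving only holds if the consumer does not store the items itself; calling `ToList()` on the sequence would bring everything back into memory.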

#2


Don't use a List, which requires all the data to be present in memory at once. Use IEnumerable<T> and produce the data on demand, or better, use IQueryable<T> and have the entire execution of the query deferred until the data are required.
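
As a rough illustration of deferred execution, the sketch below uses LINQ over a plain `IEnumerable<int>`; an `IQueryable<T>` backed by a real provider defers in a similar way but also translates the query for the data source, which is not shown here. The `DeferredDemo` class, `ReadRows`, and the `Reads` counter are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many rows were actually read from the source (illustration only).
    public static int Reads;

    // Hypothetical data source: a row is "read" only when the sequence is enumerated.
    public static IEnumerable<int> ReadRows()
    {
        for (int i = 1; i <= 10; i++)
        {
            Reads++;
            yield return i;
        }
    }

    // Builds the query; no rows are read until the result is enumerated.
    public static IEnumerable<int> BuildQuery()
    {
        return ReadRows().Where(x => x % 2 == 0).Select(x => x * 10);
    }

    public static void Main()
    {
        var query = BuildQuery();
        Console.WriteLine(Reads);                       // prints 0: nothing read yet

        var firstTwo = query.Take(2).ToList();          // enumeration happens here
        Console.WriteLine(string.Join(",", firstTwo));  // prints 20,40
        Console.WriteLine(Reads < 10);                  // prints True: enumeration stopped early
    }
}
```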

Alternatively, don't keep the data in memory at all, but rather save the data to a database for processing. When processing is complete, then query the database for the results.

#3


 public IEnumerable<SomeClass> GetObjects()
 {
     // GetIQueryableObjects() stands for whatever returns the deferred query
     foreach (var obj in GetIQueryableObjects())
         yield return obj;
 }

#4


You want to yield!

Defer the processing of your enumeration: build a method that returns an IEnumerable but produces only one record at a time using the yield return statement.

#5


The best approach in this case would be to get and process the data in chunks. You will have to find the right chunk size by trial and error. The code would look something like this:

 public class Program
 {
     public IEnumerable<SomeClass> GetObjects(int anchor, int chunkSize)
     {
         var list = new List<SomeClass>();
         // TryGetNext is a placeholder for the "get implementation"
         // for the given anchor and chunkSize
         while (TryGetNext(anchor, chunkSize, out SomeClass item))
         {
             list.Add(item);
         }
         return list;
     }

     public void ProcessObjects(IEnumerable<SomeClass> objects)
     {
         foreach (var item in objects)
         {
             // process implementation
         }
     }

     void Main()
     {
         int chunkSize = 5000;
         int totalSize = 0;   // replace with the total number of rows
         int anchor = 0;      // replace with the first row to process
         while (anchor < totalSize)
         {
             var objects = GetObjects(anchor, chunkSize);
             ProcessObjects(objects);
             anchor += chunkSize;
         }
     }
 }
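
Chunking can also be combined with a lazy sequence, so that only one chunk is held in memory at a time. The following is a minimal sketch under that assumption, not part of the original answer; `ChunkingDemo` and `InChunks` are hypothetical names. On .NET 6 and later, the built-in `Enumerable.Chunk` method does something similar.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkingDemo
{
    // Groups a lazy sequence into lists of at most chunkSize items.
    public static IEnumerable<List<T>> InChunks<T>(IEnumerable<T> source, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterate(source, chunkSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int chunkSize)
    {
        var chunk = new List<T>(chunkSize);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;                // hand over a full chunk
                chunk = new List<T>(chunkSize);    // start a fresh one
            }
        }
        if (chunk.Count > 0)
            yield return chunk;                    // final, possibly smaller chunk
    }

    public static void Main()
    {
        foreach (var chunk in InChunks(Enumerable.Range(1, 7), 3))
            Console.WriteLine(string.Join(",", chunk));
        // prints 1,2,3 then 4,5,6 then 7
    }
}
```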
