When I have to fetch gigabytes of data, store it in a collection, and then process it, I run out of memory. So instead of:
public class Program
{
    public IEnumerable<SomeClass> GetObjects()
    {
        var list = new List<SomeClass>();
        // TryGetNext is a hypothetical stand-in for the "get implementation".
        while (TryGetNext(out SomeClass item))
        {
            list.Add(item); // every item stays in memory until the list is released
        }
        return list;
    }
    public void ProcessObjects(IEnumerable<SomeClass> objects)
    {
        foreach (var item in objects)
        {
            // process implementation
        }
    }
    void Main()
    {
        var objects = GetObjects();
        ProcessObjects(objects);
    }
}
I need to:
public class Program
{
    void ProcessObject(SomeClass item)
    {
        // process implementation
    }
    public void GetAndProcessObjects()
    {
        // TryGetNext is a hypothetical stand-in for the "get implementation".
        while (TryGetNext(out SomeClass item))
        {
            ProcessObject(item); // process each item as soon as it is fetched
        }
    }
    void Main()
    {
        GetAndProcessObjects();
    }
}
Is there a better way?
5 Solutions
#1
You ought to leverage C#'s iterator blocks and use the yield return statement to do something like this:
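public class Program
{
    public IEnumerable<SomeClass> GetObjects()
    {
        // TryGetNext is a hypothetical stand-in for the "get implementation".
        while (TryGetNext(out SomeClass item))
        {
            yield return item; // hand one item to the caller, then pause here
        }
    }
    public void ProcessObjects(IEnumerable<SomeClass> objects)
    {
        foreach (var item in objects)
        {
            // process implementation
        }
    }
    void Main()
    {
        var objects = GetObjects(); // nothing is fetched yet
        ProcessObjects(objects);    // items are fetched as the foreach asks for them
    }
}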
This lets you stream the objects instead of keeping the entire sequence in memory: only the object currently being processed needs to be in memory, provided the consumer does not hold on to the earlier ones.
#2
Don't use a List, which requires all the data to be present in memory at once. Use IEnumerable<T> and produce the data on demand, or better, use IQueryable<T> and have the entire execution of the query deferred until the data are required.
Alternatively, don't keep the data in memory at all, but rather save the data to a database for processing. When processing is complete, then query the database for the results.
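To make "on demand" concrete, here is a minimal sketch of deferred execution with IEnumerable<T>. The DeferredDemo class, the Produced counter, and the integer data source are assumptions made up for this illustration; they stand in for the real "get implementation" and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many items the source has actually produced so far.
    public static int Produced;

    // Hypothetical data source: stands in for the real "get implementation".
    public static IEnumerable<int> GetObjects(int total)
    {
        for (int i = 0; i < total; i++)
        {
            Produced++;
            yield return i; // produced only when the consumer asks for it
        }
    }

    public static void Main()
    {
        // Building the query runs nothing yet.
        IEnumerable<int> query = GetObjects(1000000).Where(x => x % 2 == 0).Take(3);
        Console.WriteLine(Produced); // 0

        // Enumerating pulls items one at a time and stops after three matches.
        List<int> firstThree = query.ToList();
        Console.WriteLine(string.Join(",", firstThree)); // 0,2,4
        Console.WriteLine(Produced); // 5
    }
}
```

Although the source could produce a million items, only five are ever created, because each operator asks its source for the next item only when it needs one.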
#3
public IEnumerable<SomeClass> GetObjects()
{
    // GetIQueryableObjects is assumed to be a method returning IQueryable<SomeClass>.
    foreach (var obj in GetIQueryableObjects())
        yield return obj;
}
#4
You want to yield!
Defer the processing of your enumeration. Build a method that returns an IEnumerable<T> but produces only one record at a time using the yield return statement.
#5
The best approach in this case would be to get and process the data in chunks. You will have to find a suitable chunk size by trial and error. The code would be something like this:
public class Program
{
    public IEnumerable<SomeClass> GetObjects(int anchor, int chunkSize)
    {
        var list = new List<SomeClass>();
        // TryGetNext is a hypothetical stand-in for the
        // "get implementation for given anchor and chunkSize".
        while (TryGetNext(anchor, chunkSize, out SomeClass item))
        {
            list.Add(item);
        }
        return list;
    }
    public void ProcessObjects(IEnumerable<SomeClass> objects)
    {
        foreach (var item in objects)
        {
            // process implementation
        }
    }
    void Main()
    {
        int chunkSize = 5000;
        int totalSize = GetTotalRowCount(); // hypothetical helper: get total number of rows
        int anchor = GetFirstRowIndex();    // hypothetical helper: get first row to process as anchor
        while (anchor < totalSize)
        {
            var objects = GetObjects(anchor, chunkSize);
            ProcessObjects(objects);
            anchor += chunkSize;
        }
    }
}
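Chunking can also be combined with the streaming approach from the other answers. The sketch below is a variation, not the code above: it groups a lazily produced sequence into chunks, so the anchor bookkeeping disappears. ChunkingDemo, InChunks, and the integer data source are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingDemo
{
    // Hypothetical data source standing in for the real "get implementation".
    public static IEnumerable<int> GetObjects(int total)
    {
        for (int i = 0; i < total; i++)
            yield return i;
    }

    // Groups a streamed sequence into lists of at most chunkSize items.
    // Because this is an iterator, the argument check runs on first enumeration.
    public static IEnumerable<List<T>> InChunks<T>(IEnumerable<T> source, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunk = new List<T>(chunkSize);
        foreach (T item in source)
        {
            chunk.Add(item);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<T>(chunkSize); // fresh list, so the old one can be collected
            }
        }
        if (chunk.Count > 0)
            yield return chunk; // final, possibly smaller, chunk
    }

    public static void Main()
    {
        foreach (List<int> chunk in InChunks(GetObjects(12), 5))
            Console.WriteLine(chunk.Count + " items starting at " + chunk[0]);
        // prints:
        // 5 items starting at 0
        // 5 items starting at 5
        // 2 items starting at 10
    }
}
```

Only one chunk is held by the helper at a time, so memory use is bounded by the chunk size rather than the total row count. On .NET 6 and later, the built-in Enumerable.Chunk method offers similar behavior and returns arrays instead of lists.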