I have several fairly large XML files that represent data exported from a system, to be used by a third-party vendor. I was capping each XML file at 2,500 records because the files become huge and unmanageable otherwise. However, the vendor has asked me to combine all of these XML files into a single file. There are 78 of these XML files and they total over 700MB in size! Crazy, I know... so how would you go about combining these files to accommodate the vendor using C#? Hopefully there is a really efficient way to do this without reading in all of the files at once using LINQ :-)
2 solutions
#1
4
I'm going to go out on a limb here and assume that your xml looks something like:
<records>
<record>
<dataPoint1/>
<dataPoint2/>
</record>
</records>
If that's the case, I would open a file stream and write the <records> part, then sequentially open each XML file and write all lines (except the first and last) to disk. That way you don't have huge strings in memory and it should all be very, very quick to code and run.
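// Requires: using System.Collections.Generic; using System.IO;
public void ConsolidateFiles(List<string> files, string outputFile)
{
    // The using blocks make sure every stream is flushed and closed
    using (var output = new StreamWriter(File.Open(outputFile, FileMode.Create)))
    {
        output.WriteLine("<records>");
        foreach (var file in files)
        {
            using (var input = new StreamReader(File.Open(file, FileMode.Open, FileAccess.Read)))
            {
                string line;
                while ((line = input.ReadLine()) != null)
                {
                    // Skip the root tags (and an XML declaration, if present);
                    // this assumes each of them sits on its own line
                    if (!line.Contains("<?xml") &&
                        !line.Contains("<records>") &&
                        !line.Contains("</records>"))
                    {
                        // WriteLine keeps the line breaks that ReadLine strips
                        output.WriteLine(line);
                    }
                }
            }
        }
        output.WriteLine("</records>");
    }
}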
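If the exported files are not guaranteed to have the root tags on their own lines, the same idea can be done with the framework's streaming XML classes instead of line matching. The following is only a sketch under the same assumption as above (a <records> root whose direct children are the records); the class and method names XmlMerger and MergeRecords are made up for illustration. It copies one child element at a time with XmlWriter.WriteNode, so only the current record is held in memory.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XmlMerger
{
    // Hypothetical helper: streams the children of each file's root element
    // into a single <records> root in the output file.
    public static void MergeRecords(IEnumerable<string> files, string outputFile)
    {
        var settings = new XmlWriterSettings { Indent = true };
        using (XmlWriter writer = XmlWriter.Create(outputFile, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("records"); // assumed root name
            foreach (string file in files)
            {
                using (XmlReader reader = XmlReader.Create(file))
                {
                    reader.MoveToContent(); // skip declaration, land on the root
                    reader.Read();          // step to the first node inside the root
                    while (!reader.EOF)
                    {
                        if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
                        {
                            // Copies the whole element and advances the reader
                            // past it, so no extra Read() is needed here.
                            writer.WriteNode(reader, false);
                        }
                        else
                        {
                            reader.Read(); // whitespace, comments, closing root tag
                        }
                    }
                }
            }
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

It could be called with something like XmlMerger.MergeRecords(Directory.GetFiles(exportDir, "*.xml"), "combined.xml"), where exportDir is a placeholder for wherever the 78 files live. Unlike the line-based version, this fails loudly with an XmlException if one of the inputs is not well-formed.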
#2
2
Use DataSet.ReadXml(), DataSet.Merge(), and DataSet.WriteXml(). Let the framework do the work for you.
Something like this:
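// Requires: using System.Collections.Generic; using System.Data; using System.Xml;
public void Merge(List<string> xmlFiles, string outputFileName)
{
    DataSet complete = new DataSet();
    foreach (string xmlFile in xmlFiles)
    {
        // Dispose each reader so its file handle is released before the next file
        using (XmlTextReader reader = new XmlTextReader(xmlFile))
        {
            DataSet current = new DataSet();
            current.ReadXml(reader);
            complete.Merge(current);
        }
    }
    // The merged DataSet holds all records in memory until it is written out
    complete.WriteXml(outputFileName);
}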
For further description and examples, take a look at this article from Microsoft.