Folks,
Please, what's a good way of writing really big XML documents (up to, say, 500 MB) in C# .NET 3.5? I've had a bit of a search around, and can't seem to find anything which addresses this specific question.
My previous thread (What is the best way to parse (big) XML in C# Code?) covered reading XML documents of similar magnitude. With that solved, I need to think about how to write the updated features (http://www.opengeospatial.org/standards/sfa) to an "update.xml" document.
My ideas: Obviously one big DOM is out, considering the maximum size of the document to be produced. I'm using XSD.EXE to generate binding classes from the schema, which works nicely with the XmlSerializer class, but I think it builds a DOM "under the hood". Is this correct? I can't hold all the features (up to 50,000 of them) in memory at one time. I need to read a feature from the database, serialize it, and write it to file. So I'm thinking I should use the XmlSerializer to write a "doclet" for each individual feature to the file. I've got no idea (yet) if this is even possible/feasible.
What do you think?
Background: I'm porting an old VB6 MapInfo "client plugin" to C#. There is an existing J2EE "update service" (actually just a web-app) which this program (among others) must work with. I can't change the server unless absapositively necessary, especially if that involves changing the other clients. The server accepts an XML document with a schema which does not specify any namespaces, i.e. there is only the default namespace, and everything is in it.
My experience: I'm pretty much a C# and .NET newbie. I've been programming for about 10 years in various languages including Java, VB, C, and some C++.
Cheers all. Keith.
PS: It's dinner time, so I'll be AWOL for about half an hour.
4 Solutions
#1
16
For writing large XML, XmlWriter (directly) is your friend, but it is harder to use. The other option would be to use DOM/object-model approaches and combine them, which is probably doable if you seize control of the XmlWriterSettings and disable the XML marker (the <?xml ...?> declaration), and get rid of the namespace declarations...
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;

public class Foo {
    [XmlAttribute]
    public int Id { get; set; }
    public string Bar { get; set; }
}

static class Program {
    [STAThread]
    static void Main() {
        using (XmlWriter xw = XmlWriter.Create("out.xml")) {
            // open the root element by hand; each Foo is serialized inside it
            xw.WriteStartElement("xml");
            XmlSerializer ser = new XmlSerializer(typeof(Foo));
            // an empty prefix/namespace pair suppresses the xsi/xsd declarations
            XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
            ns.Add("", "");
            foreach (Foo foo in FooGenerator()) {
                ser.Serialize(xw, foo, ns);
            }
            xw.WriteEndElement();
        }
    }

    // streaming approach; only have the smallest amount of program
    // data in memory at once - in this case, only a single `Foo` is
    // ever in use at a time
    static IEnumerable<Foo> FooGenerator() {
        for (int i = 0; i < 40; i++) {
            yield return new Foo { Id = i, Bar = "Foo " + i };
        }
    }
}
#2
9
Use an XmlWriter:
[...] a writer that provides a fast, non-cached, forward-only means of generating streams or files containing XML data.
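As a rough illustration of that forward-only style, here is a minimal sketch that writes each element by hand with XmlWriter instead of going through XmlSerializer. The FeatureWriter class, the "features", "feature" and "name" element names, the "id" attribute and the ReadFeatures stand-in for the database are all made up for the example; the real names would come from the update schema.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class FeatureWriter
{
    // Writes one <feature> element per item straight to the output.
    // Nothing is buffered as a document, so memory use stays flat no
    // matter how many features are written.
    // NOTE: element/attribute names are placeholders, not the real schema.
    public static void WriteFeatures(TextWriter output,
                                     IEnumerable<KeyValuePair<int, string>> features)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;

        using (XmlWriter xw = XmlWriter.Create(output, settings))
        {
            xw.WriteStartDocument();
            xw.WriteStartElement("features");
            foreach (KeyValuePair<int, string> f in features)
            {
                xw.WriteStartElement("feature");
                xw.WriteAttributeString("id", XmlConvert.ToString(f.Key));
                // WriteElementString escapes <, > and & in the text for us
                xw.WriteElementString("name", f.Value);
                xw.WriteEndElement();
            }
            xw.WriteEndElement();
            xw.WriteEndDocument();
        }
    }

    // Hypothetical stand-in for reading features from the database,
    // yielding one at a time.
    static IEnumerable<KeyValuePair<int, string>> ReadFeatures()
    {
        for (int i = 0; i < 3; i++)
        {
            yield return new KeyValuePair<int, string>(i, "Feature " + i);
        }
    }

    static void Main()
    {
        using (StreamWriter file = new StreamWriter("update.xml"))
        {
            WriteFeatures(file, ReadFeatures());
        }
    }
}
```

Because no element is given a namespace, the output carries no namespace declarations, which seems to match what the question says the server expects.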
#3
1
Did you consider compressing it before writing it to disk? XML often compresses by a factor of 10 or more. It will probably take you less time to compress the file and write the compressed version than to write the whole 500 MB version uncompressed.
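If compression turns out to be an option, one way to do it without ever holding the uncompressed document is to put a GZipStream (in System.IO.Compression since .NET 2.0) between the XmlWriter and the file. The sketch below is only that: the CompressedXml class, the element names and the "update.xml.gz" file name are invented for the example, and whether the existing update service can accept gzipped input is not something this thread establishes.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

static class CompressedXml
{
    // Streams `count` placeholder <feature> elements through gzip into
    // `destination`. The XML is compressed as it is written, so neither
    // the raw nor the compressed document is held in memory as a whole.
    public static void Write(Stream destination, int count)
    {
        // Disposal order matters: the XmlWriter is disposed first and
        // flushes into the GZipStream, which then writes its trailer and
        // closes `destination`.
        using (GZipStream gzip = new GZipStream(destination, CompressionMode.Compress))
        using (XmlWriter xw = XmlWriter.Create(gzip))
        {
            xw.WriteStartElement("features");      // placeholder element name
            for (int i = 0; i < count; i++)
            {
                xw.WriteStartElement("feature");   // placeholder element name
                xw.WriteAttributeString("id", XmlConvert.ToString(i));
                xw.WriteEndElement();
            }
            xw.WriteEndElement();
        }
    }

    static void Main()
    {
        Write(File.Create("update.xml.gz"), 50000);
    }
}
```

The actual compression ratio depends heavily on the data and on the framework version, so it is worth measuring on a real feature set before relying on the 10x figure.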
#4
-1
Why not simply use a TextWriter to write the XML?