We are using an XML file to save our data, all of which is of type double. Because the data is very large (in the range of gigabytes), we convert it to bytes to save disk space and access time. We write the data in chunks of a few MB, and each new chunk may come from the same source as the previous one or from a different source. For each new source we create an element in the XML file so that the data from that source can be identified easily later. The problem we are facing is how to locate the element corresponding to a source in the XML file and then append that source's new data to the data already stored there. I am using LINQ to XML and could not find any solution using that approach. I also tried the XmlWriter class, but then the problem is how to find or reach the element to which I want to write the chunk.
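The question mentions converting the doubles to bytes before storing them. A minimal sketch of one way to do that, assuming raw IEEE 754 values in the machine's native byte order are acceptable (`DoubleBytes` is a hypothetical helper name). Each value becomes exactly 8 bytes with no loss of precision; files moved between machines of different endianness would need extra handling (for example checking `BitConverter.IsLittleEndian`).

```csharp
using System;

// Hypothetical helper: raw byte conversion for chunks of doubles.
public static class DoubleBytes
{
    // Copies the in-memory representation (8 bytes per double, native byte order).
    public static byte[] ToBytes(double[] values)
    {
        var bytes = new byte[values.Length * sizeof(double)];
        Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Inverse of ToBytes; the byte count must be a multiple of 8.
    public static double[] FromBytes(byte[] bytes)
    {
        if (bytes.Length % sizeof(double) != 0)
            throw new ArgumentException("Length must be a multiple of 8.", nameof(bytes));
        var values = new double[bytes.Length / sizeof(double)];
        Buffer.BlockCopy(bytes, 0, values, 0, bytes.Length);
        return values;
    }
}
```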
2 solutions
#1
XML is not a good format for writing large amounts of binary data (binary data must be stored as a Base64 string or some other string-safe encoding), and it is also not good for updating chunks of data in the middle of a large document. I'd recommend reconsidering your file format.
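To put a rough number on the Base64 cost mentioned above: standard padded Base64 (for example the output of `Convert.ToBase64String`) encodes every 3 input bytes as 4 ASCII characters, so a chunk of n bytes becomes L(n) characters:

```latex
L(n) = 4\left\lceil \frac{n}{3} \right\rceil,
\qquad
L(1\,048\,576) = 4 \cdot \left\lceil 349\,525.\overline{3} \right\rceil = 4 \cdot 349\,526 = 1\,398\,104
```

So a 1 MiB chunk (131,072 doubles at 8 bytes each) grows to about 1.33 MiB of text (one byte per character in a UTF-8 file) before any element tags are counted.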
If you have to go with XML:
- make sure your byte arrays are Base64 encoded when written to XML
- you have to copy the XML when you want to insert data in the middle. Consider using XmlReader and XmlWriter: copy the source XML up to the point where you want to add data, write the new data to the output writer, and then finish copying the remaining portion of the XML.
- avoid loading the whole XML document into memory, as that will likely cause problems with your GB range of data.
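A minimal sketch of the copy-and-insert approach described in the bullets, streaming from one file to another. It assumes a hypothetical layout where each source is a `<source id="A">` element directly under the root and each appended chunk gets its own `<chunk>` element; `XmlChunkAppender`, `AppendChunk`, `source` and `chunk` are illustrative names. Separate `<chunk>` elements are a design choice: concatenating Base64 strings would leave padding (`=`) from earlier chunks in the middle of the text, which the decoders reject. Every append still rewrites the whole file, which is exactly the cost warned about above; afterwards the output would replace the original (for example with `File.Replace`).

```csharp
using System;
using System.Xml;

// Hypothetical layout (assumption): <data><source id="A"><chunk>BASE64</chunk>...</source>...</data>
public static class XmlChunkAppender
{
    // Streams inputPath to outputPath, adding one new <chunk> for sourceId.
    public static void AppendChunk(string inputPath, string outputPath, string sourceId, byte[] chunk)
    {
        using XmlReader reader = XmlReader.Create(inputPath);
        using XmlWriter writer = XmlWriter.Create(outputPath);
        bool inTarget = false; // currently inside <source id="sourceId">
        bool written = false;  // new chunk already emitted

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    bool isTarget = reader.Depth == 1 && reader.LocalName == "source"
                                    && reader.GetAttribute("id") == sourceId;
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (reader.IsEmptyElement)
                    {
                        if (isTarget) { WriteChunk(writer, chunk); written = true; }
                        else if (reader.Depth == 0) { WriteSource(writer, sourceId, chunk); written = true; }
                        writer.WriteEndElement();
                    }
                    else if (isTarget)
                    {
                        inTarget = true;
                    }
                    break;
                case XmlNodeType.EndElement:
                    if (inTarget && reader.Depth == 1)
                    {
                        // closing </source> of the match: insert the new chunk just before it
                        WriteChunk(writer, chunk);
                        written = true;
                        inTarget = false;
                    }
                    else if (reader.Depth == 0 && !written)
                    {
                        // closing root tag and the source was never seen: add a new <source>
                        WriteSource(writer, sourceId, chunk);
                        written = true;
                    }
                    writer.WriteFullEndElement();
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                // XmlDeclaration is skipped on purpose: with default settings XmlWriter writes its own.
            }
        }
    }

    private static void WriteSource(XmlWriter writer, string sourceId, byte[] chunk)
    {
        writer.WriteStartElement("source");
        writer.WriteAttributeString("id", sourceId);
        WriteChunk(writer, chunk);
        writer.WriteEndElement();
    }

    private static void WriteChunk(XmlWriter writer, byte[] chunk)
    {
        writer.WriteStartElement("chunk");
        writer.WriteBase64(chunk, 0, chunk.Length); // Base64-encodes the bytes as text content
        writer.WriteEndElement();
    }
}
```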
#2
I'm sure I don't have the full picture here, but it is hard to understand why you don't use a database for this. Nevertheless, to follow up on Alexei's post, here is a contrived example of how you could use XmlReader and XmlWriter to accomplish what I think you want to do:
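using System.IO;
using System.Text;
using System.Xml;

//start with some dummy data
string bigData = "<bigdata><rec id='1'>1234</rec><rec id='2'>2468</rec></bigdata>";
string criterion = "2"; // id of the <rec> whose content gets extended
string append = "10";   // text appended to the matching <rec>
bool match = false;
StringBuilder sb = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(sb))
using (XmlReader reader = XmlReader.Create(new StringReader(bigData)))
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                if (reader.LocalName == "rec")
                {
                    // GetAttribute already returns a string (null if missing), so no ToString() is needed
                    match = reader.GetAttribute("id") == criterion;
                }
                // copy the start tag, keeping any prefix/namespace, then its attributes
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement)
                {
                    writer.WriteEndElement();
                }
                break;
            case XmlNodeType.Text: // do the append here
                writer.WriteString(match ? reader.Value + append : reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            //other cases based on node types (CDATA, comments, ...)
            case XmlNodeType.EndElement:
                if (reader.LocalName == "rec")
                {
                    match = false; // stop appending once the matching record is closed
                }
                writer.WriteFullEndElement();
                break;
        }
    }
    writer.Flush();
}
string x = sb.ToString(); //output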
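The loop above finds the record to change by checking its `id` attribute while streaming. The question also needs to pick one source's data out of the file again later, and the same forward-only idea can do that without loading the whole document. A minimal sketch, assuming a hypothetical layout where each source is a `<source id="A">` element holding one Base64 `<chunk>` element per appended block; `XmlChunkReader` and `ReadChunks` are illustrative names. It relies on `XmlReader.ReadToFollowing` and `XmlReader.ReadElementContentAsBase64`, so each chunk is decoded from the stream without building a DOM.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical layout (assumption): <data><source id="A"><chunk>BASE64</chunk>...</source>...</data>
public static class XmlChunkReader
{
    // Lazily yields the decoded bytes of each <chunk> under <source id="sourceId">.
    public static IEnumerable<byte[]> ReadChunks(TextReader input, string sourceId)
    {
        using XmlReader reader = XmlReader.Create(input);
        while (reader.ReadToFollowing("source"))
        {
            if (reader.GetAttribute("id") != sourceId || reader.IsEmptyElement)
                continue;
            int sourceDepth = reader.Depth;
            if (!reader.Read()) yield break; // step into the source's content
            while (!(reader.NodeType == XmlNodeType.EndElement && reader.Depth == sourceDepth))
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "chunk")
                    yield return ReadBase64Element(reader); // leaves the reader after </chunk>
                else if (!reader.Read())
                    yield break;
            }
            yield break; // assumes each source id appears only once
        }
    }

    // Decodes one element's Base64 content in fixed-size pieces.
    private static byte[] ReadBase64Element(XmlReader reader)
    {
        var buffer = new byte[64 * 1024];
        using var decoded = new MemoryStream();
        int count;
        while ((count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
            decoded.Write(buffer, 0, count);
        return decoded.ToArray();
    }
}
```

With a file, the input could be opened with `File.OpenText` and the chunks consumed in a `foreach` loop (converting each one back to doubles as needed) while the reader stays open.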