How can I filter deeply nested XML with LINQ (streaming only) while preserving the tree structure?

Date: 2021-08-31 14:28:10

I would like to know how to stream over a very large, deeply nested XML document using LINQ, filter nodes against some criteria as they stream past, and write the streamed output to a file while preserving the original structure of the XML.

This should happen without loading the entire document into memory.

Is this possible?

3 solutions

#1


LINQ to XML doesn't support reading in a streaming fashion directly, but I've had success in using an XmlReader, filtering based on that, and then passing it to XElement.Load when I've discovered the subtree I'm interested in. It assumes that the subtree is small enough to fit into memory. When Load returns, the reader will have been moved beyond that subtree, and you can keep going until you find the next relevant subtree, etc.
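
A minimal C# sketch of the approach described above, under a few assumptions. The helper name `StreamElements` and the element names in the usage note are hypothetical. The sketch also calls `XNode.ReadFrom` in place of `XElement.Load`, since `ReadFrom` is the call documented for building a node from a reader that is already positioned inside a document; that substitution is an editorial choice, not something the answer specifies. Each matching subtree is materialized on its own, so it still has to fit in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class SubtreeStreaming
{
    // Lazily yields every element called elementName as a small in-memory XElement.
    // Only one subtree is held at a time; the rest of the document is never loaded.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom consumes the whole subtree and leaves the reader on the
                    // node that follows it, so Read() must not be called again here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Because the result is an ordinary `IEnumerable<XElement>`, it can be queried with LINQ as usual, for example `SubtreeStreaming.StreamElements(File.OpenText("input.xml"), "item").Where(e => (int)e.Element("v") > 10)`, where the file name and the `item` and `v` element names are placeholders.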

See this MSDN blog post for more information and sample code.

(This is what I did with the Stack Overflow data dump, btw :)

#2


This paper contains the answer to my question:

http://homepages.cwi.nl/~ralf/api-streaming-xml/

Specifically it shows how to maintain tree structure of an original XML when filtering the results while streaming.
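
For comparison, here is one way to keep the tree structure while filtering in a single pass with the plain `XmlReader`/`XmlWriter` APIs. This is a sketch under stated assumptions and is not presented as the paper's own technique: the `StreamingFilter.Filter` name and the `shouldDrop` predicate are hypothetical, the predicate can only look at the current element's name and attributes (not its descendants), and comments, processing instructions, and the XML declaration are simply not copied.

```csharp
using System;
using System.Xml;

static class StreamingFilter
{
    // Copies reader to writer node by node. An element for which shouldDrop
    // returns true is skipped together with its whole subtree; everything else
    // is written in document order, so the nesting of kept nodes is preserved.
    public static void Filter(XmlReader reader, XmlWriter writer, Func<XmlReader, bool> shouldDrop)
    {
        reader.Read();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && shouldDrop(reader))
            {
                reader.Skip();   // jumps to the node after this subtree
                continue;        // that node has not been handled yet
            }

            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    bool isEmpty = reader.IsEmptyElement;
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, false);
                    if (isEmpty) writer.WriteEndElement();
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                // Other node types (comments, processing instructions, ...) are
                // intentionally ignored in this sketch.
            }
            reader.Read();
        }
        writer.Flush();
    }
}
```

A call might look like `StreamingFilter.Filter(reader, writer, r => r.LocalName == "item" && r.GetAttribute("keep") == "no")`, with `reader` and `writer` created by `XmlReader.Create` and `XmlWriter.Create` over the input and output files. Memory use stays roughly constant because nothing beyond the current node is buffered; the trade-off is that a parent whose children are all dropped is still written, as an empty element.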

#3


For XML streaming options, check out the XML Team's discussion of streaming with LINQ to XML, starting with http://blogs.msdn.com/xmlteam/archive/2007/03/05/streaming-with-linq-to-xml-part-1.aspx. Note that it is an early blog series, and some implementation details changed in the final release.
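
On the output side, the released LINQ to XML API includes `XStreamingElement`, which evaluates its content lazily while saving. The sketch below pairs it with a lazy `XmlReader`-based source so that neither input nor output is fully held in memory. It is an illustration under assumptions rather than code from the blog series: `CopyMatching`, `ReadItems`, and the parameter names are hypothetical, and the result is a flat list of matching elements under a new root, not the full original nesting.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class StreamingOutput
{
    // Lazily yields each element called itemName, one small subtree at a time.
    static IEnumerable<XElement> ReadItems(XmlReader reader, string itemName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == itemName)
                yield return (XElement)XNode.ReadFrom(reader); // advances past the subtree
            else
                reader.Read();
        }
    }

    // Writes <rootName> containing only the items accepted by keep.
    public static void CopyMatching(TextReader input, TextWriter output,
        string rootName, string itemName, Func<XElement, bool> keep)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            // The query is not executed here; XStreamingElement enumerates it
            // during Save, writing each kept item as soon as it is read.
            var root = new XStreamingElement(rootName, ReadItems(reader, itemName).Where(keep));
            root.Save(output);
        }
    }
}
```

The reader must stay open until `Save` returns, which is why the `using` block wraps both the query and the save.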
