
Time: 2023-01-19 16:55:12

I've got to download, process, and store an 8GB XML file from a secure web server. I could download the file using the WebRequest class, but this will take a VERY long time. Also, I know that the file is structured in such a way that it suits processing in discrete chunks.


How can I 'stream' this file such that I only get bite-size pieces which I can work on, without having to get the whole stream at one time?



I forgot to mention - we are hosted on Azure. An idea that comes to mind is to provision a worker role which just downloads large files and can take as long as it wants. How feasible would that be?

4 answers



8 GB is a large workload. To protect myself from rework and to scale effectively, I would decouple the XML file download from its processing.

While downloading as a stream, I would write some sort of stream identifier to persistent storage and schedule each atomic unit of work by placing a message with its relevant data on a queue. This would allow recovery if the download goes south for any reason, or if a unit of work fails and/or interferes with the download.
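A minimal sketch of that decoupling, written in Python rather than .NET and assuming the units of work have already been cut out of the download as strings. queue.Queue is only an in-memory stand-in for a durable queue such as an Azure Storage queue, and the checkpoint (a hypothetical JSON file holding how many units were enqueued) stands in for the "stream identifier" kept in persistent storage.

```python
import json
import os
import queue
import threading
from typing import Callable, Iterable, Optional


def load_checkpoint(path: str) -> int:
    """Return how many units were enqueued before the last stop (0 if none)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["units_enqueued"]


def save_checkpoint(path: str, units_enqueued: int) -> None:
    """Persist progress atomically: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"units_enqueued": units_enqueued}, f)
    os.replace(tmp, path)


def run_pipeline(units: Iterable[str], process: Callable[[str], None],
                 checkpoint_path: str, workers: int = 2) -> None:
    """Producer enqueues units as they arrive; worker threads process them independently."""
    # In-memory stand-in for a durable queue; a real one would survive a crash
    # and make failed units visible again for retry.
    work: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=100)
    already_enqueued = load_checkpoint(checkpoint_path)

    def worker() -> None:
        while True:
            item = work.get()
            try:
                if item is None:  # sentinel: no more work for this thread
                    return
                process(item)
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    for count, unit in enumerate(units, start=1):
        if count <= already_enqueued:
            continue  # enqueued before a restart; skip instead of redoing it
        work.put(unit)
        save_checkpoint(checkpoint_path, count)

    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
```

In the Azure setup from the question, the producer half could live in the download worker role while the processing runs in separate instances that read the durable queue.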




I'm using HttpWebRequest, calling BeginGetResponse and then GetResponseStream.


Then you can read the stream in chunks, as the data trickles in, via stream.BeginRead.


Here's a (much too complicated) example: http://stuff.seans.com/2009/01/05/using-httpwebrequest-for-asynchronous-downloads/
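A minimal synchronous sketch of reading a download piece by piece, using Python's urllib in place of HttpWebRequest and BeginRead, so it blocks instead of using callbacks. The 64 KB chunk size and the handle_chunk callback are illustrative choices, not values from the answer.

```python
import urllib.request
from typing import BinaryIO, Callable, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive pieces of a stream; a network read may return fewer bytes than asked."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" means end of stream
            return
        yield chunk


def download_in_chunks(url: str, handle_chunk: Callable[[bytes], None],
                       chunk_size: int = 64 * 1024) -> int:
    """Stream url into handle_chunk without holding the whole body in memory; return bytes read."""
    total = 0
    with urllib.request.urlopen(url) as response:
        for chunk in iter_chunks(response, chunk_size):
            handle_chunk(chunk)
            total += len(chunk)
    return total
```

Only one chunk is in memory at a time, so the total size of the file does not affect memory use.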




If you need to process the file sequentially, just open an XmlReader on the response stream and read the data as needed.
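Python's closest equivalent to an XmlReader over the response stream is xml.etree.ElementTree.iterparse, which also reads incrementally from any file-like object. A minimal sketch, assuming (hypothetically) that each discrete chunk of the file is a <record> element directly under the root, with simple child elements:

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def iter_records(stream: BinaryIO, tag: str = "record") -> Iterator[Dict[str, str]]:
    """Yield one dict per <tag> element without building the whole document tree."""
    events = ET.iterparse(stream, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            record = dict(elem.attrib)
            record.update({child.tag: (child.text or "") for child in elem})
            yield record
            root.clear()  # discard finished records so memory use stays flat
```

With a live download, the object returned by urllib.request.urlopen(url) can be passed as stream, so parsing proceeds as the bytes arrive.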


If you need random access to the file (i.e. to read from the middle), you may need to do more work to create a seekable stream (if the server supports the Range header in requests), or simply download the whole file as you do now.
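A minimal sketch of requesting part of the file with an HTTP Range header, in Python rather than .NET. A server that honours the header answers 206 Partial Content; one that ignores it sends the whole file with 200, which this sketch treats as an error. open_from_offset is a hypothetical helper name.

```python
import urllib.request
from typing import Optional


def range_header(start: int, end: Optional[int] = None) -> str:
    """Build a Range value for bytes start..end inclusive (end=None means 'to the end')."""
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    return f"bytes={start}-" if end is None else f"bytes={start}-{end}"


def open_from_offset(url: str, start: int, end: Optional[int] = None):
    """Open url positioned at byte start; fail if the server does not support ranges."""
    request = urllib.request.Request(url, headers={"Range": range_header(start, end)})
    response = urllib.request.urlopen(request)
    if response.status != 206:
        response.close()
        raise RuntimeError("server ignored Range (expected 206 Partial Content)")
    return response
```

The same kind of request also lets an interrupted download resume from the last byte offset that was saved.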


Please note that 8 GB is a large amount of data, and downloading it completely will take a lot of time irrespective of how you read it.
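To put a rough number on it: ideal transfer time is the size in bits divided by the bandwidth. The link speeds below are illustrative assumptions, not measurements. At 100 Mbit/s, 8 GB (taken as 8 × 10^9 bytes) is 64 × 10^9 bits, or 640 seconds, roughly 11 minutes, before any protocol overhead or retries.

```python
def download_seconds(size_bytes: int, mbit_per_s: float) -> float:
    """Ideal transfer time: bits to move divided by bits per second (no overhead, no retries)."""
    return size_bytes * 8 / (mbit_per_s * 1_000_000)


EIGHT_GB = 8 * 10**9  # decimal gigabytes

for rate in (10, 100, 1000):  # assumed link speeds in Mbit/s
    minutes = download_seconds(EIGHT_GB, rate) / 60
    print(f"{rate:>5} Mbit/s: {minutes:6.1f} min")
```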




You could upload the XML file to a block blob and download it from there. This blog post might help: http://blogs.msdn.com/b/kwill/archive/2011/05/30/asynchronous-parallel-block-blob-transfers-with-progress-change-notification.aspx
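Without relying on any particular Azure SDK call, the idea of a parallel block transfer can be sketched in Python: split the file into fixed-size blocks, fetch them concurrently, and write each one back at its offset. fetch_range is a hypothetical callable you would supply (for example, one issuing an HTTP GET with a Range header against the blob URL); the block size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, List, Tuple


def block_ranges(total_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges of at most block_size."""
    return [(start, min(start + block_size, total_size) - 1)
            for start in range(0, total_size, block_size)]


def parallel_download(total_size: int, block_size: int,
                      fetch_range: Callable[[int, int], bytes],
                      out: BinaryIO, workers: int = 4) -> None:
    """Fetch blocks concurrently and write each one at its own offset in out."""
    ranges = block_ranges(total_size, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so writes happen on this thread only
        results = pool.map(lambda r: fetch_range(r[0], r[1]), ranges)
        for (start, _end), data in zip(ranges, results):
            out.seek(start)
            out.write(data)
```

pool.map submits every block up front, so for an 8 GB file you would want to cap how many blocks are in flight at once.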

Hope this helps.
