I am attempting to download a large file from a public URL. It seemed to work fine at first, but roughly 1 in 10 machines times out. My initial attempt used WebClient.DownloadFileAsync, but because it would never complete I fell back to using WebRequest.Create and reading the response stream directly.
My first version using WebRequest.Create hit the same problem as WebClient.DownloadFileAsync: the operation times out and the file does not complete.
My next version added retries if the download times out. Here is where it gets weird: the download does eventually finish, with one retry to fetch the last 7092 bytes, so the file ends up with exactly the same size, BUT the file is corrupt and differs from the source file. I would expect the corruption to be in the last 7092 bytes, but that is not the case.
Using BeyondCompare I have found that two chunks of bytes are missing from the corrupt file, totalling exactly the missing 7092 bytes! The missing bytes are at offsets 1CA49FF0 and 1E31F380, far before the point where the download times out and is restarted.
What could possibly be going on here? Any hints on how to track this problem down further?
Here is the code in question.
public void DownloadFile(string sourceUri, string destinationPath)
{
    //roughly based on: http://*.com/questions/2269607/how-to-programmatically-download-a-large-file-in-c-sharp
    //not using WebClient.DownloadFileAsync as it seems to stall out on large files rarely for unknown reasons.
    using (var fileStream = File.Open(destinationPath, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        long totalBytesToReceive = 0;
        long totalBytesReceived = 0;
        int attemptCount = 0;
        bool isFinished = false;
        while (!isFinished)
        {
            attemptCount += 1;
            if (attemptCount > 10)
            {
                throw new InvalidOperationException("Too many attempts to download. Aborting.");
            }
            try
            {
                var request = (HttpWebRequest)WebRequest.Create(sourceUri);
                request.Proxy = null; //http://*.com/questions/754333/why-is-this-webrequest-code-slow/935728#935728
                _log.AddInformation("Request #{0}.", attemptCount);
                //continue downloading from last attempt.
                if (totalBytesReceived != 0)
                {
                    _log.AddInformation("Request resuming with range: {0} , {1}", totalBytesReceived, totalBytesToReceive);
                    request.AddRange(totalBytesReceived, totalBytesToReceive);
                }
                using (var response = request.GetResponse())
                {
                    _log.AddInformation("Received response. ContentLength={0} , ContentType={1}", response.ContentLength, response.ContentType);
                    if (totalBytesToReceive == 0)
                    {
                        totalBytesToReceive = response.ContentLength;
                    }
                    using (var responseStream = response.GetResponseStream())
                    {
                        _log.AddInformation("Beginning read of response stream.");
                        var buffer = new byte[4096];
                        int bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        while (bytesRead > 0)
                        {
                            fileStream.Write(buffer, 0, bytesRead);
                            totalBytesReceived += bytesRead;
                            bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        }
                        _log.AddInformation("Finished read of response stream.");
                    }
                }
                _log.AddInformation("Finished downloading file.");
                isFinished = true;
            }
            catch (Exception ex)
            {
                _log.AddInformation("Response raised exception ({0}). {1}", ex.GetType(), ex.Message);
            }
        }
    }
}
Here is the log output from the corrupt download:
Request #1.
Received response. ContentLength=939302925 , ContentType=application/zip
Beginning read of response stream.
Response raised exception (System.Net.WebException). The operation has timed out.
Request #2.
Request resuming with range: 939295833 , 939302925
Received response. ContentLength=7092 , ContentType=application/zip
Beginning read of response stream.
Finished read of response stream.
Finished downloading file.
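One way to narrow this down is to verify, before declaring success, that the bytes counted, the bytes expected, and the bytes actually on disk all agree. A minimal sketch of such a check as a helper (the helper name and its placement after the read loop are my additions, not part of the question's code):

```csharp
using System;
using System.IO;

static class DownloadVerifier
{
    // Sketch: after the read loop finishes, compare the byte count the loop
    // accumulated with the expected Content-Length and with the file stream's
    // actual position. If a Write was silently lost mid-download (as the
    // missing chunks at 1CA49FF0 / 1E31F380 suggest), the file position will
    // disagree and the corruption is caught here instead of at unzip time.
    public static void EnsureComplete(Stream fileStream, long totalBytesReceived, long totalBytesToReceive)
    {
        if (totalBytesReceived != totalBytesToReceive || fileStream.Position != totalBytesToReceive)
        {
            throw new InvalidOperationException(string.Format(
                "Download size mismatch: counted {0} bytes, expected {1}, file position {2}.",
                totalBytesReceived, totalBytesToReceive, fileStream.Position));
        }
    }
}
```

Calling this just before `isFinished = true;` would at least turn a silent corruption into a loud failure on the affected machines.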
4 Answers
#1
1
This is the method I usually use; it hasn't failed me so far for the same kind of download you need. Try using my code to change yours up a bit and see if that helps.
if (!Directory.Exists(localFolder))
{
    Directory.CreateDirectory(localFolder);
}
try
{
    HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(Path.Combine(uri, filename));
    httpRequest.Method = "GET";
    // if the URI doesn't exist, exception gets thrown here...
    using (HttpWebResponse httpResponse = (HttpWebResponse)httpRequest.GetResponse())
    {
        using (Stream responseStream = httpResponse.GetResponseStream())
        {
            using (FileStream localFileStream =
                new FileStream(Path.Combine(localFolder, filename), FileMode.Create))
            {
                var buffer = new byte[4096];
                long totalBytesRead = 0;
                int bytesRead;
                while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    totalBytesRead += bytesRead;
                    localFileStream.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}
catch (Exception ex)
{
    throw;
}
#2
0
You should change the timeout settings. There seem to be two possible timeout issues:
- Client-side timeout - try changing the timeouts in WebClient. I find that for large file downloads I sometimes need to do that.
- Server-side timeout - try changing the timeout on the server. You can confirm this is the problem by using another client, e.g. Postman.
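For the client side, HttpWebRequest exposes two relevant knobs. A minimal sketch (the URL is a placeholder, and the five-minute values are illustrative, not recommendations):

```csharp
using System;
using System.Net;

class TimeoutExample
{
    static void Main()
    {
        // Placeholder URL - substitute the real download source.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/large-file.zip");

        // Timeout covers GetResponse() (connecting and receiving the headers);
        // ReadWriteTimeout covers each individual Read() on the response stream.
        // The defaults (100 s and 5 min respectively) can be too short for a
        // ~900 MB download over a slow or flaky link.
        request.Timeout = (int)TimeSpan.FromMinutes(5).TotalMilliseconds;
        request.ReadWriteTimeout = (int)TimeSpan.FromMinutes(5).TotalMilliseconds;
    }
}
```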
#3
0
To me, the way you read the file by buffering looks suspect. Maybe the problem is that you do
while(bytesRead > 0)
What if, for some reason, the stream doesn't return any bytes at some point even though the download is not yet finished? Then the loop would exit and never come back. You should get the Content-Length, increment a variable totalBytesReceived by bytesRead, and finally change the loop to
while(totalBytesReceived < ContentLength)
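Put together, this suggestion amounts to a read loop like the following sketch (the helper name and the early-end exception are my additions; the answer itself only describes the loop condition):

```csharp
using System;
using System.IO;

static class LengthCheckedCopy
{
    // Sketch of the answer's idea: loop on the accumulated byte count rather
    // than on whether a single Read() returned zero, and fail loudly if the
    // stream ends short of the announced Content-Length.
    public static long Copy(Stream responseStream, Stream fileStream, long contentLength)
    {
        var buffer = new byte[4096];
        long totalBytesReceived = 0;
        while (totalBytesReceived < contentLength)
        {
            int bytesRead = responseStream.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0)
            {
                // Stream ended early - surface it instead of finishing silently.
                throw new EndOfStreamException(string.Format(
                    "Stream ended after {0} of {1} bytes.", totalBytesReceived, contentLength));
            }
            fileStream.Write(buffer, 0, bytesRead);
            totalBytesReceived += bytesRead;
        }
        return totalBytesReceived;
    }
}
```

Note this changes a short read from a silent success into an exception, which is exactly the behavior the question's retry loop needs to distinguish a stalled stream from a finished one.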
#4
0
Allocate a buffer size bigger than the expected file size.

byte[] byteBuffer = new byte[65536];

The idea is that if the file is 1 GiB in size, you allocate a 1 GiB buffer and then try to fill the whole buffer in one call. That call may return fewer bytes, but you've still allocated the whole buffer. Note that the maximum length of a single array in .NET is bounded by a 32-bit index, which means you cannot allocate a single buffer much larger than 2 GiB even if you recompile your program for 64-bit and actually have enough memory available.