Python FTP Multithreaded File Download: Downloading a File in Blocks with Multiple Threads
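Python's ftplib module handles FTP operations, most commonly downloading and uploading. Downloading a large file from an FTP server with Python can take a long time, so how can we speed it up? Enter multithreading. In this post I share my multithreaded download code. The main Python modules it uses are ftplib and threading.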
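For comparison, a plain single-connection download with ftplib looks roughly like the sketch below. `download_single` is a hypothetical helper name, not part of the original code. `retrbinary` streams the whole file over one data connection, and that single stream is what the multithreaded version splits into blocks.

```python
from ftplib import FTP


def download_single(ftp, remote_path, local_path, block_size=8192):
    """Download remote_path over one data connection (hypothetical helper).

    ftp is an already logged-in ftplib.FTP object.
    """
    with open(local_path, 'wb') as fp:
        # retrbinary switches to binary mode (TYPE I), sends RETR and
        # calls fp.write once for every chunk of up to block_size bytes
        ftp.retrbinary('RETR ' + remote_path, fp.write, blocksize=block_size)
```

With a real server (host and credentials are placeholders): `with FTP(host, user, passwd) as ftp: download_single(ftp, 'pub/big.iso', 'big.iso')`.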
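First, the overall approach:
1. Split the file into blocks. For example, to download one file with 20 threads, we ask the server for the file's size, divide its bytes evenly into 20 blocks (the last block also takes any remainder), and start one thread per block, each downloading its block in binary mode: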
```python
import threading


# Method of the downloader class; self.ftp is an already logged-in ftplib.FTP
def setupThreads(self, filePath, localFilePath, threadNumber=20):
    """
    Set up the threads that will be used to download the file.
    Return the list of threads on success, or None on failure.
    """
    try:
        # Some servers refuse SIZE in ASCII mode, so switch to binary first
        self.ftp.voidcmd('TYPE I')
        # ftplib sends SIZE and parses the "213 <bytes>" reply
        remoteFileSize = self.ftp.size(filePath)
        # Every block except the last gets this many bytes (integer division)
        blockSize = remoteFileSize // threadNumber
        threads = []
        for i in range(threadNumber - 1):
            beginPoint = blockSize * i
            subThread = threading.Thread(
                target=self.downloadFileMultiThreads,
                args=(i, filePath, localFilePath, beginPoint, blockSize))
            threads.append(subThread)

        # The last block starts after the first threadNumber - 1 blocks and
        # also takes the bytes left over by the integer division
        beginPoint = blockSize * (threadNumber - 1)
        lastBlockSize = remoteFileSize - beginPoint
        subThread = threading.Thread(
            target=self.downloadFileMultiThreads,
            args=(threadNumber - 1, filePath, localFilePath, beginPoint, lastBlockSize))
        threads.append(subThread)
        return threads
    except Exception as diag:
        self.recordLog(str(diag), 'error')
        return None
```
The downloadFileMultiThreads method that each thread runs is as follows:
```python
import os
from ftplib import FTP


# Method of the downloader class; uses its host/user/passwd/fixBlockSize attributes
def downloadFileMultiThreads(self, threadIndex, remoteFilePath, localFilePath,
                             beginPoint, blockSize, rest=None):
    """
    Sub-thread worker: download blockSize bytes of remoteFilePath, starting
    at byte offset beginPoint, into the part file localFilePath.part.<index>.
    rest is ignored (kept for existing callers); the offset is beginPoint.
    Return True if the whole block was received, else False.
    """
    myFtp = None
    try:
        # Each thread writes its block to its own temporary local file
        with open(localFilePath + '.part.' + str(threadIndex), 'wb') as fp:
            if blockSize <= 0:
                return True  # empty block: nothing to fetch
            # A separate connection per thread; one FTP object must not be
            # shared between threads
            myFtp = FTP(self.host, self.user, self.passwd)
            myFtp.cwd(os.path.dirname(remoteFilePath))
            myFtp.voidcmd('TYPE I')  # binary mode
            # With rest=beginPoint, ftplib sends "REST <offset>" immediately
            # before RETR, which is where RFC 959 expects it
            finishedSize = 0
            with myFtp.transfercmd('RETR ' + os.path.basename(remoteFilePath),
                                   rest=beginPoint) as connection:
                while finishedSize < blockSize:
                    # Never read past the end of this thread's block
                    readSize = min(self.fixBlockSize, blockSize - finishedSize)
                    data = connection.recv(readSize)
                    if not data:
                        break  # server closed the data connection early
                    fp.write(data)
                    finishedSize += len(data)
        return finishedSize == blockSize
    except Exception:
        return False
    finally:
        if myFtp is not None:
            # close() instead of quit(): after a RETR is cut short the server
            # may reply "transfer aborted" (4xx), which quit() would raise on
            myFtp.close()
```
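2. Start every thread returned by setupThreads, wait for all of them with join(), and then merge the downloaded blocks into one file. The merging process is covered in part two of this series: Python FTP Multithreaded File Download: Merging the Blocks Downloaded by Multiple Threads.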
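To try the whole pipeline (split, fetch each block in its own thread, join, merge) without an FTP server, here is a local sketch under stated assumptions. The helper names `split_ranges`, `fetch_range` and `parallel_copy` are hypothetical, seeking in a local file stands in for REST + RETR, and the plain concatenation at the end is only a stand-in for the merging covered in part two, not the author's implementation.

```python
import os
import threading


def split_ranges(total_size, thread_number):
    """Return (begin, size) pairs, splitting like setupThreads does:
    equal blocks of total_size // thread_number bytes, with the last
    block also taking the remainder. thread_number must be >= 1."""
    block = total_size // thread_number
    ranges = [(block * i, block) for i in range(thread_number - 1)]
    last_begin = block * (thread_number - 1)
    ranges.append((last_begin, total_size - last_begin))
    return ranges


def fetch_range(source_path, part_path, begin, size):
    """Stand-in for one download thread: copy `size` bytes starting at
    `begin`. Seeking a local file plays the role of REST + RETR."""
    with open(source_path, 'rb') as src, open(part_path, 'wb') as dst:
        src.seek(begin)
        dst.write(src.read(size))


def parallel_copy(source_path, target_path, thread_number=4):
    total = os.path.getsize(source_path)
    parts = []
    threads = []
    for i, (begin, size) in enumerate(split_ranges(total, thread_number)):
        part = '%s.part.%d' % (target_path, i)
        parts.append(part)
        t = threading.Thread(target=fetch_range,
                             args=(source_path, part, begin, size))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait until every block has been written
    # Stand-in merge: concatenate the parts in index order, then delete them
    with open(target_path, 'wb') as out:
        for part in parts:
            with open(part, 'rb') as fp:
                out.write(fp.read())
            os.remove(part)
```

For example, `split_ranges(103, 4)` gives three 25-byte blocks and a final block of 28 bytes starting at offset 75, mirroring how setupThreads hands the remainder to the last thread.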
Thanks for reading, and I hope this helps!
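Author: 薛定谔の喵
Source: http://www.cnblogs.com/berlin-sun/
The copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.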