I have a small utility that I use to download an MP3 from a website on a schedule, and then build/update a podcast XML file, which I've obviously added to iTunes.
The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3. I would prefer to have the entire utility written in Python.
I struggled, though, to find a way to actually download the file in Python, which is why I resorted to wget.
So, how do I download the file using Python?
19 Answers
#1
378
In Python 2, use urllib2 which comes with the standard library.
import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()
This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers. The documentation can be found here.
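For example, a minimal sketch of changing a header with a Request object (the User-Agent value here is just an illustration):

import urllib2

# Attach a custom header by building a Request object before opening it.
request = urllib2.Request('http://www.example.com/', headers={'User-Agent': 'my-downloader/1.0'})
response = urllib2.urlopen(request)
html = response.read()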
#2
915
One more, using urlretrieve:
import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
(For Python 3+, use 'import urllib.request' and urllib.request.urlretrieve.)
Yet another one, with a "progress bar":
import urllib2
url = "http://download.thinkbroadband.com/10MB.zip"
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)
file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()
#3
281
In 2012, use the Python requests library:
>>> import requests
>>>
>>> url = "http://download.thinkbroadband.com/10MB.zip"
>>> r = requests.get(url)
>>> print len(r.content)
10485760
You can run pip install requests to get it.
Requests has many advantages over the alternatives because the API is much simpler. This is especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and painful in this case.
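For example, a minimal sketch of an authenticated download (the URL and credentials are placeholders; requests treats this tuple as HTTP basic auth):

import requests

# Placeholder URL and credentials, sent as HTTP basic auth.
r = requests.get("http://www.example.com/protected/mp3.mp3", auth=("user", "password"))
with open("mp3.mp3", "wb") as f:
    f.write(r.content)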
Update (2015-12-30):
People have expressed admiration for the progress bar. It's cool, sure. There are several off-the-shelf solutions now, including tqdm:
from tqdm import tqdm
import requests
url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)
with open("10MB", "wb") as handle:
for data in tqdm(response.iter_content()):
handle.write(data)
This is essentially the implementation @kvance described 30 months ago.
#4
144
import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3','wb') as output:
    output.write(mp3file.read())
The wb in open('test.mp3','wb') opens a file (and erases any existing file) in binary mode so you can save data with it instead of just text.
#5
53
Python 3

- urllib.request.urlopen

  import urllib.request
  response = urllib.request.urlopen('http://www.example.com/')
  html = response.read()

- urllib.request.urlretrieve

  import urllib.request
  urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')

Python 2

- urllib2.urlopen (thanks Corey)

  import urllib2
  response = urllib2.urlopen('http://www.example.com/')
  html = response.read()

- urllib.urlretrieve (thanks PabloG)

  import urllib
  urllib.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')
#6
18
An improved version of the PabloG code for Python 2/3:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import (division, absolute_import, print_function, unicode_literals)

import sys, os, tempfile, logging

if sys.version_info >= (3,):
    import urllib.request as urllib2
    import urllib.parse as urlparse
else:
    import urllib2
    import urlparse


def download_file(url, dest=None):
    """
    Download and save a file specified by url to dest directory.
    """
    u = urllib2.urlopen(url)

    scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
    filename = os.path.basename(path)
    if not filename:
        filename = 'downloaded.file'
    if dest:
        filename = os.path.join(dest, filename)

    with open(filename, 'wb') as f:
        meta = u.info()
        meta_func = meta.getheaders if hasattr(meta, 'getheaders') else meta.get_all
        meta_length = meta_func("Content-Length")
        file_size = None
        if meta_length:
            file_size = int(meta_length[0])
        print("Downloading: {0} Bytes: {1}".format(url, file_size))

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)

            status = "{0:16}".format(file_size_dl)
            if file_size:
                status += " [{0:6.2f}%]".format(file_size_dl * 100 / file_size)
            status += chr(13)
            print(status, end="")
        print()

    return filename


if __name__ == "__main__":  # Only run if this file is called directly
    print("Testing with 10MB download")
    url = "http://download.thinkbroadband.com/10MB.zip"
    filename = download_file(url)
    print(filename)
#7
16
I wrote the wget library in pure Python just for this purpose. As of version 2.0, it is urlretrieve pumped up with these features.
#8
14
Use the wget module:
import wget
wget.download('url')
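A usage sketch with an explicit output name (the URL and filename are placeholders):

import wget

# out names the file the download is saved to.
wget.download("http://www.example.com/songs/mp3.mp3", out="mp3.mp3")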
#9
12
I agree with Corey: urllib2 is more complete than urllib and is likely the module to use if you want to do more complex things. But to make the answers more complete, urllib is a simpler module if you want just the basics:
import urllib
response = urllib.urlopen('http://www.example.com/sound.mp3')
mp3 = response.read()
This will work fine. Or, if you don't want to deal with the "response" object, you can call read() directly:
import urllib
mp3 = urllib.urlopen('http://www.example.com/sound.mp3').read()
#10
11
Following are the most commonly used calls for downloading files in Python:
- urllib.urlretrieve('url_to_file', file_name)
- urllib2.urlopen('url_to_file')
- requests.get(url)
- wget.download('url', file_name)
Note: urlopen and urlretrieve are found to perform relatively badly when downloading large files (size > 500 MB). requests.get stores the whole file in memory until the download is complete.
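To avoid holding the whole file in memory with requests, a minimal streaming sketch (the URL and filename are placeholders):

import requests

# stream=True keeps only one chunk in memory at a time.
r = requests.get("http://www.example.com/songs/mp3.mp3", stream=True)
with open("mp3.mp3", "wb") as f:
    for chunk in r.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)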
#11
7
A simple yet Python 2 & Python 3 compatible way comes with the six library:
from six.moves import urllib
urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
#12
6
You can get the progress feedback with urlretrieve as well:
import sys
import urllib

def report(blocknr, blocksize, size):
    current = blocknr * blocksize
    sys.stdout.write("\r{0:.2f}%".format(100.0 * current / size))

def downloadFile(url):
    print "\n", url
    fname = url.split('/')[-1]
    print fname
    urllib.urlretrieve(url, fname, report)
#13
5
If you have wget installed, you can use parallel_sync.
pip install parallel_sync
from parallel_sync import wget
urls = ['http://something.png', 'http://something.tar.gz', 'http://something.zip']
wget.download('/tmp', urls)
# or a single file:
wget.download('/tmp', urls[0], filenames='x.zip', extract=True)
Doc: https://pythonhosted.org/parallel_sync/pages/examples.html
This is pretty powerful. It can download files in parallel, retry upon failure, and it can even download files on a remote machine.
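For comparison, a minimal standard-library sketch of parallel downloads (not parallel_sync's API; Python 3, reusing the placeholder URLs from above):

from concurrent.futures import ThreadPoolExecutor
import urllib.request

urls = ['http://something.png', 'http://something.tar.gz', 'http://something.zip']

def fetch(url):
    # Save each file under the last path component of its URL.
    filename = url.split('/')[-1]
    urllib.request.urlretrieve(url, filename)
    return filename

# Threads work well here since the work is network-bound.
with ThreadPoolExecutor(max_workers=4) as pool:
    for name in pool.map(fetch, urls):
        print(name)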
#14
2
The source code can be:
import urllib
sock = urllib.urlopen("http://diveintopython.org/")
htmlSource = sock.read()
sock.close()
print htmlSource
#15
2
If speed matters to you, I made a small performance test for the modules urllib and wget, and regarding wget I tried once with a status bar and once without. I took three different 500MB files to test with (different files, to eliminate the chance that there is some caching going on under the hood). Tested on a Debian machine, with Python 2.
First, these are the results (they are similar in different runs):
$ python wget_test.py
urlretrive_test : starting
urlretrive_test : 6.56
==============
wget_no_bar_test : starting
wget_no_bar_test : 7.20
==============
wget_with_bar_test : starting
100% [......................................................................] 541335552 / 541335552
wget_with_bar_test : 50.49
==============
The way I performed the test is by using a "profile" decorator. This is the full code:
import wget
import urllib
import time
from functools import wraps

def profile(func):
    @wraps(func)
    def inner(*args):
        print func.__name__, ": starting"
        start = time.time()
        ret = func(*args)
        end = time.time()
        print func.__name__, ": {:.2f}".format(end - start)
        return ret
    return inner

url1 = 'http://host.com/500a.iso'
url2 = 'http://host.com/500b.iso'
url3 = 'http://host.com/500c.iso'

def do_nothing(*args):
    pass

@profile
def urlretrive_test(url):
    return urllib.urlretrieve(url)

@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)

@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')

urlretrive_test(url1)
print '=============='
time.sleep(1)

wget_no_bar_test(url2)
print '=============='
time.sleep(1)

wget_with_bar_test(url3)
print '=============='
time.sleep(1)
urllib seems to be the fastest.
#16
2
In Python 3 you can use the urllib3 and shutil libraries. Download them by using pip or pip3 (depending on whether python3 is the default or not):
pip3 install urllib3 shutil
Then run this code:
import urllib.request
import shutil
url = "http://www.somewebsite.com/something.pdf"
output_file = "save_this_name.pdf"
with urllib.request.urlopen(url) as response, open(output_file, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
Note that you download urllib3 but use urllib in the code.
#17
1
This may be a little late, but I saw PabloG's code and couldn't help adding an os.system('cls') to make it look AWESOME! Check it out:
import urllib2,os
url = "http://download.thinkbroadband.com/10MB.zip"
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)
os.system('cls')
file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()
If running in an environment other than Windows, you will have to use something other than 'cls'. On Mac OS X and Linux it should be 'clear'.
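A minimal sketch of picking the command by platform (assuming os.system is acceptable for this):

import os
import platform

# 'cls' clears the console on Windows; 'clear' works on Mac OS X and Linux.
os.system('cls' if platform.system() == 'Windows' else 'clear')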
#18
0
urlretrieve and requests.get are simple, but the reality is not. I have fetched data for a couple of sites, including text and images, and the above two probably solve most of the tasks. But for a more universal solution I suggest the use of urlopen. As it is included in the Python 3 standard library, your code can run on any machine that runs Python 3 without pre-installing site-packages.
import urllib.request

# Placeholder values for illustration:
url = "http://www.example.com/songs/mp3.mp3"
filename = "mp3.mp3"
buffer_size = 8192
headers = {'User-Agent': 'Mozilla/5.0'}  # some servers reject the default Python user agent

url_request = urllib.request.Request(url, headers=headers)
url_connect = urllib.request.urlopen(url_request)
len_content = url_connect.length

# remember to open the file in bytes mode
with open(filename, 'wb') as f:
    while True:
        buffer = url_connect.read(buffer_size)
        if not buffer:
            break
        # f.write returns an integer count of the bytes written
        data_wrote = f.write(buffer)

# you could probably use the with-open-as manner here as well
url_connect.close()
This answer provides a solution to HTTP 403 Forbidden when downloading a file over HTTP using Python. I have tried only the requests and urllib modules; other modules may provide something better, but this is the one I used to solve most of the problems.
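A similar sketch with requests, setting a browser-like User-Agent (the header value and URL are just illustrations):

import requests

# Some servers answer 403 to the default client user agent.
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get("http://www.example.com/songs/mp3.mp3", headers=headers)
with open("mp3.mp3", "wb") as f:
    f.write(r.content)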
#19
0
I wrote the following, which works in vanilla Python 2 or Python 3.
import sys

try:
    import urllib.request
    python3 = True
except ImportError:
    import urllib2
    python3 = False


def progress_callback_simple(downloaded, total):
    sys.stdout.write(
        "\r" +
        (len(str(total)) - len(str(downloaded))) * " " + str(downloaded) + "/%d" % total +
        " [%3.2f%%]" % (100.0 * float(downloaded) / float(total))
    )
    sys.stdout.flush()

def download(srcurl, dstfilepath, progress_callback=None, block_size=8192):
    def _download_helper(response, out_file, file_size):
        if progress_callback != None:
            progress_callback(0, file_size)
        if block_size == None:
            buffer = response.read()
            out_file.write(buffer)
            if progress_callback != None:
                progress_callback(file_size, file_size)
        else:
            file_size_dl = 0
            while True:
                buffer = response.read(block_size)
                if not buffer:
                    break
                file_size_dl += len(buffer)
                out_file.write(buffer)
                if progress_callback != None:
                    progress_callback(file_size_dl, file_size)
    with open(dstfilepath, "wb") as out_file:
        if python3:
            with urllib.request.urlopen(srcurl) as response:
                file_size = int(response.getheader("Content-Length"))
                _download_helper(response, out_file, file_size)
        else:
            response = urllib2.urlopen(srcurl)
            meta = response.info()
            file_size = int(meta.getheaders("Content-Length")[0])
            _download_helper(response, out_file, file_size)

import traceback
try:
    download(
        "https://geometrian.com/data/programming/projects/glLib/glLib%20Reloaded%200.5.9/0.5.9.zip",
        "output.zip",
        progress_callback_simple
    )
except:
    traceback.print_exc()
    input()
Notes:
- Supports a "progress bar" callback.
- The download is a 4 MB test .zip from my website.