I am trying to get the image from the following URL:
image_url = 'http://www.eatwell101.com/wp-content/uploads/2012/11/Potato-Pancakes-recipe.jpg?b14316'
When I navigate to it in a browser, it sure looks like an image. But I get an error when I try:
import urllib, cStringIO, PIL
from PIL import Image
img_file = cStringIO.StringIO(urllib.urlopen(image_url).read())
image = Image.open(img_file)
IOError: cannot identify image file
I have copied hundreds of images this way, so I'm not sure what's special here. Can I get this image?
3 solutions
#1
3
The problem is not with the image itself. The server is answering with a 403 error page instead of the image data:
>>> urllib.urlopen(image_url).read()
'\n<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html>\n <head>\n <title>403 You are banned from this site. Please contact via a different client configuration if you believe that this is a mistake.</title>\n </head>\n <body>\n <h1>Error 403 You are banned from this site. Please contact via a different client configuration if you believe that this is a mistake.</h1>\n <p>You are banned from this site. Please contact via a different client configuration if you believe that this is a mistake.</p>\n <h3>Guru Meditation:</h3>\n <p>XID: 1806024796</p>\n <hr>\n <p>Varnish cache server</p>\n </body>\n</html>\n'
Sending a User-Agent header solves the problem:
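import urllib2, cStringIO
from PIL import Image
# Send a browser-like User-Agent so the server does not answer with 403.
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open(image_url)
img_file = cStringIO.StringIO(response.read())
image = Image.open(img_file)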
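For readers on Python 3, where urllib2 and cStringIO no longer exist, the same idea can be sketched with urllib.request and io.BytesIO. This is only a minimal sketch: the helper names build_request and fetch_bytes, the USER_AGENT constant, and the 10-second timeout are my own assumptions, not part of the answer above. Note that urllib.request.urlopen raises urllib.error.HTTPError on a 403 rather than quietly returning the error page.

```python
import io
import urllib.request

# Assumed value, copied from the User-agent string used in the answer.
USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_bytes(url, timeout=10):
    """Download the URL and return its body wrapped in a file-like BytesIO."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return io.BytesIO(response.read())
```

The object returned by fetch_bytes(image_url) is file-like, so it can be passed to Image.open in the same way as img_file above.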
#2
4
When I open the URL with urllib, the response code is 403:
In [3]: f = urllib.urlopen('http://www.eatwell101.com/wp-content/uploads/2012/11/Potato-Pancakes-recipe.jpg')
In [9]: f.code
Out[9]: 403
The server is not returning an image.
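One way to catch this before PIL raises "cannot identify image file" is to look at the first bytes of the download. The sketch below is an illustration only: looks_like_image and IMAGE_SIGNATURES are hypothetical names, and it covers just the JPEG, PNG and GIF signatures, so other formats would be reported as "not an image".

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF
    b"GIF89a",              # GIF
)


def looks_like_image(data):
    """Return True if the bytes start with one of the known signatures."""
    return data.startswith(IMAGE_SIGNATURES)
```

An HTML error page such as a 403 body starts with markup rather than one of these signatures, so the check returns False for it.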
You could try specifying a User-Agent header to see if you can trick the server into thinking you are a browser.
Using the requests library (because it makes sending header information easier):
In [7]: f = requests.get('http://www.eatwell101.com/wp-content/uploads/2012/11/Potato-Pancakes-recipe.jpg', headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:16.0) Gecko/20100101 Firefox/16.0,gzip(gfe)'})
In [8]: f.status_code
Out[8]: 200
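Before handing the body to PIL it may be worth checking both the status code and the Content-Type header. A minimal sketch, assuming a helper named is_image_response (my own name, not something requests provides):

```python
def is_image_response(status_code, content_type):
    """Return True for a 200 response whose Content-Type is image/*."""
    if status_code != 200:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")
```

With the response object from the session above this would be called as is_image_response(f.status_code, f.headers.get('Content-Type')).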
#3
2
To get the image, you can first save it to disk and then load it with PIL. For example:
import urllib2
from PIL import Image
opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(), urllib2.HTTPCookieProcessor())
image_content = opener.open("http://www.eatwell101.com/wp-content/uploads/2012/11/Potato-Pancakes-recipe.jpg?b14316").read()
opener.close()
# Write the raw bytes to disk, then let PIL open the saved file.
save_path = r"/some/folder/to/save/image.jpg"
with open(save_path, 'wb') as f:
    f.write(image_content)
image = Image.open(save_path)
...