I am attempting to pretty print an HTML email which I have stored in a variable, but I keep getting an error from BS4 that says it is expecting a string.
Here is my code:
from bs4 import BeautifulSoup
import imaplib
import email

mail = imaplib.IMAP4_SSL('imap.gmail.com')
username = raw_input('USERNAME (email):')
password = raw_input('PASSWORD: ')
try:
    mail.login(username, password)
    print "Logged in as %r !" % username
except imaplib.IMAP4.error:
    # imaplib raises IMAP4.error when the login is rejected
    print "Log in failed."

mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox")  # connect to inbox.

result, data = mail.uid('search', None, '(FROM "tiffany@e.tiffany.com")')
latest_email_uid = data[0].split()[1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]

email_message = email.message_from_string(raw_email)
print email_message

html = email_message
soup = BeautifulSoup(html)
print soup.prettify()
Here is the printed HTML email I am working from: http://pastebin.com/qfAHwkdV
This is the error I am getting:
Traceback (most recent call last):
  File "tiff.py", line 34, in <module>
    soup = BeautifulSoup(html)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/bs4/__init__.py", line 169, in __init__
    self.builder.prepare_markup(markup, from_encoding))
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/bs4/builder/_htmlparser.py", line 139, in prepare_markup
    dammit = UnicodeDammit(markup, try_encodings, is_html=True)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/bs4/dammit.py", line 203, in __init__
    self._detectEncoding(markup, is_html)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/bs4/dammit.py", line 372, in _detectEncoding
    xml_encoding_match = xml_encoding_re.match(xml_data)
TypeError: expected string or buffer
Why am I unable to pass the HTML stored in a variable to BS4 for parsing?
Thanks
1 solution
#1
According to the documentation on .message_from_string, this does not return a string, but a message object. BeautifulSoup() expects a string (or buffer).
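To illustrate the point, here is a minimal sketch (written for Python 3, which is an assumption since the question uses Python 2; the sample message is made up) showing that the parsed result is a Message object and how a string can be obtained from it:

```python
import email
from email.message import Message

# Hypothetical one-part message, for illustration only.
raw = "Subject: demo\r\nContent-Type: text/html\r\n\r\n<p>Hello</p>\r\n"
msg = email.message_from_string(raw)

is_message = isinstance(msg, Message)  # the parser returns a Message object
is_string = isinstance(msg, str)       # ...which is not a string

flattened = msg.as_string()  # whole message as text: headers plus body
body = msg.get_payload()     # body only (a str for a non-multipart message)

print(is_message, is_string)
print(body)
```

Either `flattened` or `body` is a plain string, so either could be handed to a parser that expects one; `body` avoids feeding the mail headers in as markup.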
Perhaps do soup = BeautifulSoup(str(html)) or soup = BeautifulSoup(unicode(html))
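One caveat: converting the whole message to a string also includes the mail headers and, for multipart mail, every part. A possible alternative is to pull out only the text/html part first. The sketch below assumes Python 3 and uses a made-up sample message; `extract_html` is a hypothetical helper name, not part of any library:

```python
import email


def extract_html(raw_email):
    """Return the first text/html part of a raw message as text, or None."""
    msg = email.message_from_string(raw_email)
    for part in msg.walk():  # visits the message itself and every sub-part
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


# Hypothetical multipart message, for illustration only.
sample = (
    "From: sender@example.com\r\n"
    "Subject: demo\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/alternative; boundary="sep"\r\n'
    "\r\n"
    "--sep\r\n"
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "\r\n"
    "Hello\r\n"
    "--sep\r\n"
    'Content-Type: text/html; charset="utf-8"\r\n'
    "\r\n"
    "<html><body><p>Hello</p></body></html>\r\n"
    "--sep--\r\n"
)

html = extract_html(sample)
print(html)
```

The returned value is an ordinary string, so it should be safe to pass to BeautifulSoup(html) and then call prettify(). With imaplib on Python 3 the fetched data is bytes, so email.message_from_bytes would likely be the call to use instead of message_from_string.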