I'm using BeautifulSoup to extract some text from an HTML page, but I can't figure out how to print it properly to the screen (or to a file, for that matter).
Here's what my class containing the text looks like:
class Thread(object):
    def __init__(self, title, author, date, content=u""):
        self.title = title
        self.author = author
        self.date = date
        self.content = content
        self.replies = []

    def __unicode__(self):
        s = u""
        for k, v in self.__dict__.items():
            s += u"%s = %s " % (k, v)
        return s

    def __repr__(self):
        return repr(unicode(self))

    __str__ = __repr__
When I try to print an instance of Thread, here's what I see on the console:
~/python-tests $ python test.py
u'date = 21:01 03/02/11 content = author = \u05d3"\u05e8 \u05d9\u05d5\u05e0\u05d9 \u05e1\u05d8\u05d0\u05e0\u05e6\'\u05e1\u05e7\u05d5 replies = [] title = \u05de\u05d1\u05e0\u05d4 \u05d4\u05de\u05d1\u05d7\u05df '
Whatever I try, I cannot get the output I'd like (the above text should be Hebrew). My end goal is to serialize Thread to a file (using json or pickle) and be able to read it back.
I'm running this with Python 2.6.6 on Ubuntu 10.10.
2 answers
#1
17
To output a Unicode string to a file (or the console) you need to choose a text encoding. In Python 2 the default encoding is ASCII, but to support Hebrew characters you need a different encoding, such as UTF-8:
s = unicode(your_object).encode('utf8')
f.write(s)
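A minimal sketch of the approach above, extended to the asker's goal of a file round trip. It assumes Python 2.6+ or Python 3, where io.open accepts an encoding argument. The helper names save_thread and load_thread and the dictionary layout are hypothetical, not part of the original code.

```python
import io
import json
import os
import tempfile


def save_thread(path, fields):
    # ensure_ascii=False keeps the Hebrew characters readable in the file
    # instead of writing \uXXXX escapes.
    payload = json.dumps(fields, ensure_ascii=False)
    if isinstance(payload, bytes):
        # Python 2 can return a byte string when every value is plain ASCII.
        payload = payload.decode("utf-8")
    # io.open encodes the text as UTF-8 on write, so no manual .encode() call.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(payload)


def load_thread(path):
    # Decoding happens on read, so the caller gets Unicode text back.
    with io.open(path, "r", encoding="utf-8") as f:
        return json.loads(f.read())


if __name__ == "__main__":
    # Hypothetical sample data; the title is the Hebrew text from the question.
    thread = {
        u"title": u"\u05de\u05d1\u05e0\u05d4 \u05d4\u05de\u05d1\u05d7\u05df",
        u"replies": [],
    }
    path = os.path.join(tempfile.mkdtemp(), "thread.json")
    save_thread(path, thread)
    assert load_thread(path) == thread
```

Passing a plain dictionary rather than the Thread instance sidesteps the question of how to make the class itself JSON-serializable; something like the instance's __dict__ could be passed in, provided every value is a type json understands.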
#2
7
A nice alternative to @mark's answer is to set the environment variable PYTHONIOENCODING=UTF-8, which makes Python use UTF-8 for stdin, stdout, and stderr.
See also: Writing unicode strings via sys.stdout in Python.
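One way to check that the variable takes effect is to start a child interpreter with it set and ask what encoding its stdout reports. This is only a sketch: the function name child_stdout_encoding is hypothetical, and the exact spelling of the reported encoding name may vary between Python versions.

```python
import os
import subprocess
import sys


def child_stdout_encoding(value):
    # Copy the current environment and set the variable for the child only.
    env = dict(os.environ, PYTHONIOENCODING=value)
    code = "import sys; print(sys.stdout.encoding)"
    proc = subprocess.Popen([sys.executable, "-c", code],
                            env=env, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    # The encoding name is plain ASCII, so decoding it this way is safe.
    return out.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("UTF-8"))
```

Setting the variable in the shell before running the script (or exporting it in a shell profile) has the same effect without any change to the code.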