python 3.2 UnicodeEncodeError:“charmap”codec不能在位置9629中编码字符“\u2013”:字符映射到。

时间:2021-08-07 18:11:51

I'm trying to make a script that gets data out from an sqlite3 database, but I have run in to a problem.

我正在尝试创建一个从sqlite3数据库中获取数据的脚本,但是我遇到了一个问题。

The field in the database is of type text and the contains a html formated text. see the text below

数据库中的字段是文本类型,其中包含html格式文本。请参阅下面的文本

<html>
<head>
<title>Yahoo!</title>
</head>
<body>
<style type="text/css">
html {}
.yshortcuts {border-bottom:none !important;}
.ReadMsgBody {width:100%;}
.ExternalClass{width:100%;}
</style>
<table cellpadding="0" cellspacing="0" bgcolor="#ffffff">    
<tr>
<td width="550" valign="top" align="left">

    <table cellpadding="0" cellspacing="0" width="500">
        <tr>
            <td colspan="3"><img        src="http://mail.yimg.com/nq/assets/sharedmessages/v1/us/logo.gif" width="292" height="51" style="display:block;" border="0" alt="Yahoo! Mail"></td>
        </tr>
        <tr>
            <td rowspan="3" width="1" bgcolor="#c7c4ca"></td>
            <td width="498" height="1" bgcolor="#c7c4ca"></td>
            <td rowspan="3" width="1" bgcolor="#c7c4ca"></td>
        </tr>
        <tr>
            <td width="498" valign="top" align="left">
            <table cellpadding="0" cellspacing="0">
                <tr>
                    <td width="498" bgcolor="#61399d" align="left" valign="top">
                    <table cellspacing="0" cellpadding="0"><tr><td height="24"></td></tr></table>
                    <div style="font-family:Arial, Helvetica, sans-serif;font-size:23px;line-height:27px;margin-bottom:10px;color:#ffffff;margin-left:15px;"><span style="color:#ffffff;text-decoration:none;font-weight:bold;line-height:27px;">Välkommen till Yahoo! Mail.</span></div>
                    <div style="font-family:Arial, Helvetica, sans-serif;font-size:22px;line-height:26px;margin-bottom:1px;color:#ffffff;margin-left:15px;margin-bottom:7px;margin-right:15px;">Ansluta och dela går snabbt och enkelt och är tillgängligt överallt.</div>
                    </td>
                </tr>
                <tr>
                    <td><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/b1.gif" width="498" height="18" style="display:block;" border="0"></td>
                </tr>
            </table>
            <table cellpadding="0" cellspacing="0" width="498">
                <tr>
                    <td width="292" valign="top">
                    <table cellpadding="0" cellspacing="0">
                        <tr>
                            <td><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/grad.gif" width="292" height="9" style="display:block;"></td>
                        </tr>
                        <tr>
                            <td width="292" bgcolor="#ffffff" align="left" valign="top">
                            <table cellspacing="0" cellpadding="0"><tr><td height="11"></td></tr></table>
                            <div style="margin-left:15px;">                  
                                <div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:18px;color:#333333;margin-bottom:11px;font-weight:bold;">Det är lätt som en plätt att komma igång.</div>
                                <table cellpadding="0" cellspacing="0" width="267">
                                    <tr>
                                        <td width="16" align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">1. </div></td>
                                        <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;margin-bottom:9px;"><a rel="nofollow" target="_blank" href="http://us-mg999.mail.yahoo.com/neo/launch?action=contacts" style="text-decoration:underline;color:#61399d;"><span>Lägg till alla dina kontakter på en plats</span></a>.</div></td>
                                    </tr>
                                    <tr>
                                        <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">2. </div></td>
                                        <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;margin-bottom:9px;"><a rel="nofollow" target="_blank" href="http://mrd.mail.yahoo.com/themes" style="text-decoration:underline;color:#61399d;"><span>Anpassa din nya inkorg</span></a>.</div></td>
                                    </tr>
                                    <tr>
                                        <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">3. </div></td>
                                        <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;"><a rel="nofollow" target="_blank" href="http://se.overview.mail.yahoo.com/mobile" style="text-decoration:underline;color:#61399d;"><span>Anslut överallt på dina mobila enheter</span></a>.</div></td>
                                    </tr>
                                </table>

                            </div>
                            </td>
                        </tr>
                        <tr><td height="13"></td></tr>
                    </table>
                    </td>
                    <td width="196" valign="top">
                    <table cellpadding="0" cellspacing="0">
                        <tr>
                            <td width="1" bgcolor="#fbfbfd" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g1.gif" width="1" height="21" style="display:block;"></td>
                            <td width="1" bgcolor="#f5f6fa" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g2.gif" width="1" height="21" style="display:block;"></td>
                            <td width="1" bgcolor="#e8eaf1" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g3.gif" width="1" height="21" style="display:block;"></td>
                            <td width="1" bgcolor="#d4d4d4"></td>
                            <td width="186" bgcolor="#f0f0f0" align="left" valign="top">  
                            <table cellspacing="0" cellpadding="0"><tr><td height="3">   </td></tr></table>
                            <div style="margin-left:11px;">
                            <div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#333333;margin-bottom:9px;"><b>Info för dig:</b></div>
                            <div style="font-family:Arial, Helvetica, sans-serif;font-size:12px;color:#43494e;line-height:18px;margin-bottom:10px;">
                            Yahoo!-ID och e-postadress:<br />
                            <div style="font-family:Arial, Helvetica, sans-serif;font-size:12px;color:#43494e;line-height:18px;">
                            Håll ditt konto och inställningar aktuella. <br><a rel="nofollow" target="_blank" href="https://edit.yahoo.com/config/eval_profile" style="text-decoration:underline;color:#61399d;"><span>Mitt konto</span></a> 
                            </div>
                            </div>
                            <table cellspacing="0" cellpadding="0"><tr><td height="20"></td></tr></table>
                            </td>
                            <td width="1" bgcolor="#dbdbdb"></td>
                            <td width="1" bgcolor="#ced2de"></td>
                            <td width="1" bgcolor="#dbdfed"></td>
                            <td width="1" bgcolor="#e8ebf3"></td>
                            <td width="1" bgcolor="#f3f4f9"></td>
                            <td width="1" bgcolor="#fafbfc"></td>
                        </tr>
                        <tr>
                            <td colspan="11"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/b2.gif" width="196" height="8" style="display:block;" border="0"></td>
                        </tr>
                        <tr><td height="13"></td></tr>
                    </table>
                    </td>
                    <td width="10"></td>
                </tr>
            </table>
            </td>
        </tr>
        <tr>
            <td width="498" height="1" bgcolor="#c7c4ca"></td>
        </tr>
    </table>
    <table cellpadding="0" cellspacing="0" width="500">
        <tr>
            <td align="center" valign="top">
            <table cellspacing="0" cellpadding="0"><tr><td height="10"></td></tr></table>
                <div style="font-family:Arial, Helvetica, sans-serif;font-size:11px;line-height:18px;margin-bottom:10px;">
                <a rel="nofollow" target="_blank" href="http://info.yahoo.com/legal/se/yahoo/utos.html" style="text-decoration:underline;color:#61399d;">Yahoo! Villkor för användning</a>&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;<a rel="nofollow" target="_blank" href="http://info.yahoo.com/legal/se/yahoo/mail/atos.html" style="text-decoration:underline;color:#61399d;">Yahoo! Mail –Villkor för användning</a>&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;<a rel="nofollow" target="_blank" href="http://info.yahoo.com/privacy/se/yahoo/details.html" style="text-decoration:underline;color:#61399d;">Yahoo! Sekretesspolicy</a>
                </div>
            </td>
        </tr>
        <tr>
            <td align="left" valign="top">
                <div style="font-family:Arial, Helvetica, sans-serif;font-size:11px;line-height:14px;color:#545454;margin-left:16px;margin-right:14px;">Var god svara inte på detta meddelande. Detta är ett servicemeddelande som rör din användning av Yahoo! Mail. Om du vill veta mer om Yahoo!s användning av personlig information, inklusive användning av webb-beacons i HTML-baserad e-post, kan du läsa vår Yahoo! Sekretesspolicy. Yahoo!s adress är 701 First Avenue, Sunnyvale, CA 94089, USA.<br /><br />RefID: lp-1037111</div>
            </td>
        </tr>
    </table>





    </td>
</tr>
</table>
<img width="1" height="1" src="http://pclick.internal.yahoo.com/p/s=2143684696">
</body>
</html>`

and the python code that try to extract the data is as follows.

而试图提取数据的python代码如下所示。

>>> import sqlite3
>>> conn = sqlite3.connect('C:/temp/Mobils/export/com.yahoo.mobile.client.android.mail/databases/mail.db')
>>> c = conn.cursor()
>>> conn.row_factory=sqlite3.Row
>>> c.execute('select body from messages_1 where _id=7')
<sqlite3.Cursor object at 0x0000000001FB78F0>
>>> r = c.fetchone()
>>> r.keys()
['body']
>>> print(r['body'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python32\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>
>>>

Does anybody have any idea of how to print/write this to a file. Yes I know that this is printed to stdout, but I get the same UnicodeEncodeError when I try to write to a file. I tried both write method of a file object and print(r['body'], file=f).

谁知道怎么把这个打印到一个文件上。是的,我知道这是打印到stdout的,但是当我试图写入文件时,我得到了相同的UnicodeEncodeError。我尝试了文件对象和打印的两种写法(r['body'], file=f)。

3 个解决方案

#1


87  

When you open the file you want to write to, open it with a specific encoding that can handle all the characters.

当打开要写入的文件时,打开它,使用特定的编码来处理所有的字符。

with open('filename', 'w', encoding='utf-8') as f:
    print(r['body'], file=f)

#2


80  

Maybe a little late to reply. I happen to run into the same problem today. I find that on Windows you can change the console encoder to utf-8 or other encoder that can represent your data. Then you can print it to sys.stdout.

可能会晚一点回复。我今天碰巧遇到了同样的问题。我发现在Windows上,你可以将控制台编码器换成utf-8或其他可以表示你的数据的编码器。然后你可以把它打印到sys.stdout。

First, run following code in the console:

首先,在控制台中运行以下代码:

chcp 65001
set PYTHONIOENCODING=utf-8

Then, start python do anything you want.

然后,开始python做任何你想做的事情。

#3


28  

While Python 3 deals in Unicode, the Windows console or POSIX tty that you're running inside does not. So, whenever you print, or otherwise send Unicode strings to stdout, and it's attached to a console/tty, Python has to encode it.

当Python 3处理Unicode时,您在内部运行的Windows控制台或POSIX tty不会。因此,无论何时打印,或将Unicode字符串发送到stdout,并将其附加到控制台/tty, Python必须对其进行编码。

The error message indirectly tells you what character set Python was trying to use:

错误消息间接地告诉您Python试图使用的字符集:

  File "C:\Python32\lib\encodings\cp850.py", line 19, in encode

This means the charset is cp850.

这意味着charset是cp850。

You can test or yourself that this charset doesn't have the appropriate character just by doing '\u2013'.encode('cp850'). Or you can look up cp850 online (e.g., at Wikipedia).

您可以通过“\u2013”.encode(“cp850”)来测试或自己测试这个字符集没有合适的字符。或者你可以在网上查一下cp850(比如*)。

It's possible that Python is guessing wrong, and your console is really set for, say UTF-8. (In that case, just manually set sys.stdout.encoding='utf-8'.) It's also possible that you intended your console to be set for UTF-8 but did something wrong. (In that case, you probably want to follow up at superuser.com.)

可能是Python猜测错误,而您的控制台实际上是设置为UTF-8。(在这种情况下,只需手动设置sys.stdout.encoding='utf-8'。)也有可能是你的控制台设置为UTF-8,但做错了。(在这种情况下,你可能想要在superuser.com上跟进。)

But if nothing is wrong, you just can't print that character. You will have to manually encode it with one of the non-strict error-handlers. For example:

但是如果没有什么错误,你就是不能打印那个字符。您必须手动将其编码为非严格错误处理程序之一。例如:

>>> '\u2013'.encode('cp850')
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>
>>> '\u2013'.encode('cp850', errors='replace')
b'?'

So, how do you print a string that won't print on your console?

那么,如何打印在控制台上不打印的字符串呢?

You can replace every print function with something like this:

你可以用这样的方式来替换每个打印函数:

>>> print(r['body'].encode('cp850', errors='replace').decode('cp850'))
?

… but that's going to get pretty tedious pretty fast.

但这很快就会变得相当乏味。

The simple thing to do is to just set the error handler on sys.stdout:

简单的做法是在sys.stdout上设置错误处理程序:

>>> sys.stdout.errors = 'replace'
>>> print(r['body'])
?

For printing to a file, things are pretty much the same, except that you don't have to set f.errors after the fact, you can set it at construction time. Instead of this:

对于打印到文件,情况基本相同,只是不需要设置f。错误发生后,您可以在构建时设置它。而不是:

with open('path', 'w', encoding='cp850') as f:

Do this:

这样做:

with open('path', 'w', encoding='cp850', errors='replace') as f:

… Or, of course, if you can use UTF-8 files, just do that, as Mark Ransom's answer shows:

或者,当然,如果你可以使用UTF-8文件,就这么做,正如马克·兰塞姆的回答所示:

with open('path', 'w', encoding='utf-8') as f:

#1


87  

When you open the file you want to write to, open it with a specific encoding that can handle all the characters.

当打开要写入的文件时,打开它,使用特定的编码来处理所有的字符。

with open('filename', 'w', encoding='utf-8') as f:
    print(r['body'], file=f)

#2


80  

Maybe a little late to reply. I happen to run into the same problem today. I find that on Windows you can change the console encoder to utf-8 or other encoder that can represent your data. Then you can print it to sys.stdout.

可能会晚一点回复。我今天碰巧遇到了同样的问题。我发现在Windows上,你可以将控制台编码器换成utf-8或其他可以表示你的数据的编码器。然后你可以把它打印到sys.stdout。

First, run following code in the console:

首先,在控制台中运行以下代码:

chcp 65001
set PYTHONIOENCODING=utf-8

Then, start python do anything you want.

然后,开始python做任何你想做的事情。

#3


28  

While Python 3 deals in Unicode, the Windows console or POSIX tty that you're running inside does not. So, whenever you print, or otherwise send Unicode strings to stdout, and it's attached to a console/tty, Python has to encode it.

当Python 3处理Unicode时,您在内部运行的Windows控制台或POSIX tty不会。因此,无论何时打印,或将Unicode字符串发送到stdout,并将其附加到控制台/tty, Python必须对其进行编码。

The error message indirectly tells you what character set Python was trying to use:

错误消息间接地告诉您Python试图使用的字符集:

  File "C:\Python32\lib\encodings\cp850.py", line 19, in encode

This means the charset is cp850.

这意味着charset是cp850。

You can test or yourself that this charset doesn't have the appropriate character just by doing '\u2013'.encode('cp850'). Or you can look up cp850 online (e.g., at Wikipedia).

您可以通过“\u2013”.encode(“cp850”)来测试或自己测试这个字符集没有合适的字符。或者你可以在网上查一下cp850(比如*)。

It's possible that Python is guessing wrong, and your console is really set for, say UTF-8. (In that case, just manually set sys.stdout.encoding='utf-8'.) It's also possible that you intended your console to be set for UTF-8 but did something wrong. (In that case, you probably want to follow up at superuser.com.)

可能是Python猜测错误,而您的控制台实际上是设置为UTF-8。(在这种情况下,只需手动设置sys.stdout.encoding='utf-8'。)也有可能是你的控制台设置为UTF-8,但做错了。(在这种情况下,你可能想要在superuser.com上跟进。)

But if nothing is wrong, you just can't print that character. You will have to manually encode it with one of the non-strict error-handlers. For example:

但是如果没有什么错误,你就是不能打印那个字符。您必须手动将其编码为非严格错误处理程序之一。例如:

>>> '\u2013'.encode('cp850')
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>
>>> '\u2013'.encode('cp850', errors='replace')
b'?'

So, how do you print a string that won't print on your console?

那么,如何打印在控制台上不打印的字符串呢?

You can replace every print function with something like this:

你可以用这样的方式来替换每个打印函数:

>>> print(r['body'].encode('cp850', errors='replace').decode('cp850'))
?

… but that's going to get pretty tedious pretty fast.

但这很快就会变得相当乏味。

The simple thing to do is to just set the error handler on sys.stdout:

简单的做法是在sys.stdout上设置错误处理程序:

>>> sys.stdout.errors = 'replace'
>>> print(r['body'])
?

For printing to a file, things are pretty much the same, except that you don't have to set f.errors after the fact, you can set it at construction time. Instead of this:

对于打印到文件,情况基本相同,只是不需要设置f。错误发生后,您可以在构建时设置它。而不是:

with open('path', 'w', encoding='cp850') as f:

Do this:

这样做:

with open('path', 'w', encoding='cp850', errors='replace') as f:

… Or, of course, if you can use UTF-8 files, just do that, as Mark Ransom's answer shows:

或者,当然,如果你可以使用UTF-8文件,就这么做,正如马克·兰塞姆的回答所示:

with open('path', 'w', encoding='utf-8') as f: