I've got a database with a bunch of broken utf8 characters scattered across several tables. The list of characters isn't very extensive AFAIK (áéíúóÁÉÍÓÚÑñ)
我有一个数据库,其中包含分散在几个表中的一堆破坏的utf8字符。字符列表不是很广泛AFAIK(áéíúóÁÉÍÓÚÑñ)
Fixing a given table is very straightforward
修复给定的表格非常简单
update orderItem set itemName=replace(itemName,'á','á');
But I can't get a way of detecting the broken characters. If I do something like
但我无法找到一种检测破碎字符的方法。如果我做的事情
SELECT * FROM TABLE WHERE field LIKE "%Ã%";
I get nearly all the fields because of the collation (Ã=a). All broken characters so far start with an "Ã". The database is in spanish so this particular character isn't used
由于整理(Ã= a),我得到了几乎所有的字段。到目前为止,所有破碎的字符都以“Ô开头。数据库是西班牙语,因此不使用此特定字符
The list of broken chars I've got so far is
到目前为止我已经破碎的字符列表是
á = á
é = é
Ã- = í
ó = ó
ñ = ñ
á = Á
Any idea of how to make this SELECT to work as intended? (a binary search or something like that)
知道如何让这个SELECT按预期工作吗? (二进制搜索或类似的东西)
14 个解决方案
#1
6
How about a different approach, namely converting the column back and forth to get the correct character set? You can convert it to binary, then to utf-8 and then to iso-8859-1 or whatever else you're using. See the manual for the details.
如何采用不同的方法,即来回转换列以获得正确的字符集?您可以将其转换为二进制,然后转换为utf-8,然后转换为iso-8859-1或您正在使用的任何其他内容。有关详细信息,请参阅手册。
#2
46
I fixed with
我修好了
UPDATE wp_zcs9ck_posts_copy SET post_title =
CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);
Complete solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
完整的解决方案:http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
#3
30
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'á','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ä','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'é','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ó','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ú','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ñ','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'–','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'’','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'…','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'–','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'“','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‘','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'•','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
#4
11
The SELECT
statement you need is the following:
您需要的SELECT语句如下:
SELECT * FROM TABLE WHERE LENGTH(name) != CHAR_LENGTH(name);
This returns all rows which contain multi-byte characters.
这将返回包含多字节字符的所有行。
name
is assumed to be a field / the field where weird characters would be found. *
name被假定为一个字段/可以找到奇怪字符的字段。 *
#5
11
No text replacement is a universal solutions because you can forget some character. A more suitable fix for double converted characters is:
没有文本替换是一种通用解决方案,因为您可以忘记一些字符。对双转换字符更合适的修复方法是:
- convert back to latin1
- 转换回latin1
- convert to binary
- 转换为二进制
- convert to utf8
- 转换为utf8
Like this:
喜欢这个:
alter table descriptions modify name VARCHAR(2000) character set latin1;
alter table descriptions modify name blob;
alter table descriptions modify name VARCHAR(2000) character set utf8;
#6
10
Thanks for your answers!!
谢谢你的回答!!
I fixed my tables with this, and wanted to share the full list of changes. Note that it also includes fixing html decoded characters, besides latin ones, it was really a mess:
我用这个修复了我的表,并希望分享完整的更改列表。请注意,它还包括修复html解码的字符,除了拉丁字符,它真的是一团糟:
update `table` set `field` = replace(`field` ,'É','É');
update `table` set `field` = replace(`field` ,'“','"');
update `table` set `field` = replace(`field` ,'â€','"');
update `table` set `field` = replace(`field` ,'Ç','Ç');
update `table` set `field` = replace(`field` ,'Ã','Ã');
//Edit by slash4
update `table` set `field` = replace(`field` ,'Ã ','À');
update `table` set `field` = replace(`field` ,'ú','ú');
update `table` set `field` = replace(`field` ,'•','-');
update `table` set `field` = replace(`field` ,'Ø','Ø');
update `table` set `field` = replace(`field` ,'õ','õ');
-- The next one appears to be missing a character. But which one?
update `table` set `field` = replace(`field` ,'Ã','í');
update `table` set `field` = replace(`field` ,'â','â');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'ê','ê');
update `table` set `field` = replace(`field` ,'á','á');
update `table` set `field` = replace(`field` ,'é','é');
update `table` set `field` = replace(`field` ,'ó','ó');
update `table` set `field` = replace(`field` ,'–','–');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'ª','ª');
update `table` set `field` = replace(`field` ,'º','º');
update `table` set `field` = replace(`field` ,'Ã ','à');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'á','á');
update `table` set `field` = replace(`field` ,'â','â');
update `table` set `field` = replace(`field` ,'é','é');
update `table` set `field` = replace(`field` ,'í','í');
update `table` set `field` = replace(`field` ,'õ','õ');
update `table` set `field` = replace(`field` ,'ú','ú');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'Á','Á');
update `table` set `field` = replace(`field` ,'Â','Â');
update `table` set `field` = replace(`field` ,'É','É');
update `table` set `field` = replace(`field` ,'Í','Í');
update `table` set `field` = replace(`field` ,'Õ','Õ');
update `table` set `field` = replace(`field` ,'Ú','Ú');
update `table` set `field` = replace(`field` ,'Ç','Ç');
update `table` set `field` = replace(`field` ,'Ã','Ã');
update `table` set `field` = replace(`field` ,'À','À');
update `table` set `field` = replace(`field` ,'Ê','Ê');
update `table` set `field` = replace(`field` ,'Ó','Ó');
update `table` set `field` = replace(`field` ,'Ô','Ô');
update `table` set `field` = replace(`field` ,'Ü','Ü');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'à','à');
update `table` set `field` = replace(`field` ,'ê','ê');
update `table` set `field` = replace(`field` ,'ó','ó');
update `table` set `field` = replace(`field` ,'ô','ô');
update `table` set `field` = replace(`field` ,'ü','ü');
update `table` set `field` = replace(`field` ,'&','&');
update `table` set `field` = replace(`field` ,'>','>');
update `table` set `field` = replace(`field` ,'<','<');
update `table` set `field` = replace(`field` ,'ˆ','ˆ');
update `table` set `field` = replace(`field` ,'˜','˜');
update `table` set `field` = replace(`field` ,'¨','¨');
update `table` set `field` = replace(`field` ,'&cute;','´');
update `table` set `field` = replace(`field` ,'¸','¸');
update `table` set `field` = replace(`field` ,'"','"');
update `table` set `field` = replace(`field` ,'“','“');
update `table` set `field` = replace(`field` ,'”','”');
update `table` set `field` = replace(`field` ,'‘','‘');
update `table` set `field` = replace(`field` ,'’','’');
update `table` set `field` = replace(`field` ,'‹','‹');
update `table` set `field` = replace(`field` ,'›','›');
update `table` set `field` = replace(`field` ,'«','«');
update `table` set `field` = replace(`field` ,'»','»');
update `table` set `field` = replace(`field` ,'º','º');
update `table` set `field` = replace(`field` ,'ª','ª');
update `table` set `field` = replace(`field` ,'–','–');
update `table` set `field` = replace(`field` ,'—','—');
update `table` set `field` = replace(`field` ,'¯','¯');
update `table` set `field` = replace(`field` ,'…','…');
update `table` set `field` = replace(`field` ,'¦','¦');
update `table` set `field` = replace(`field` ,'•','•');
update `table` set `field` = replace(`field` ,'¶','¶');
update `table` set `field` = replace(`field` ,'§','§');
update `table` set `field` = replace(`field` ,'¹','¹');
update `table` set `field` = replace(`field` ,'²','²');
update `table` set `field` = replace(`field` ,'³','³');
update `table` set `field` = replace(`field` ,'½','½');
update `table` set `field` = replace(`field` ,'¼','¼');
update `table` set `field` = replace(`field` ,'¾','¾');
update `table` set `field` = replace(`field` ,'⅛','⅛');
update `table` set `field` = replace(`field` ,'⅜','⅜');
update `table` set `field` = replace(`field` ,'⅝','⅝');
update `table` set `field` = replace(`field` ,'⅞','⅞');
update `table` set `field` = replace(`field` ,'>','>');
update `table` set `field` = replace(`field` ,'<','<');
update `table` set `field` = replace(`field` ,'±','±');
update `table` set `field` = replace(`field` ,'−','−');
update `table` set `field` = replace(`field` ,'×','×');
update `table` set `field` = replace(`field` ,'÷','÷');
update `table` set `field` = replace(`field` ,'∗','∗');
update `table` set `field` = replace(`field` ,'⁄','⁄');
update `table` set `field` = replace(`field` ,'‰','‰');
update `table` set `field` = replace(`field` ,'∫','∫');
update `table` set `field` = replace(`field` ,'∑','∑');
update `table` set `field` = replace(`field` ,'∏','∏');
update `table` set `field` = replace(`field` ,'√','√');
update `table` set `field` = replace(`field` ,'∞','∞');
update `table` set `field` = replace(`field` ,'≈','≈');
update `table` set `field` = replace(`field` ,'≅','≅');
update `table` set `field` = replace(`field` ,'∝','∝');
update `table` set `field` = replace(`field` ,'≡','≡');
update `table` set `field` = replace(`field` ,'≠','≠');
update `table` set `field` = replace(`field` ,'≤','≤');
update `table` set `field` = replace(`field` ,'≥','≥');
update `table` set `field` = replace(`field` ,'∴','∴');
update `table` set `field` = replace(`field` ,'⋅','⋅');
update `table` set `field` = replace(`field` ,'·','·');
update `table` set `field` = replace(`field` ,'∂','∂');
update `table` set `field` = replace(`field` ,'ℑ','ℑ');
update `table` set `field` = replace(`field` ,'ℜ','ℜ');
update `table` set `field` = replace(`field` ,'′','′');
update `table` set `field` = replace(`field` ,'″','″');
update `table` set `field` = replace(`field` ,'°','°');
update `table` set `field` = replace(`field` ,'∠','∠');
update `table` set `field` = replace(`field` ,'⊥','⊥');
update `table` set `field` = replace(`field` ,'∇','∇');
update `table` set `field` = replace(`field` ,'⊕','⊕');
update `table` set `field` = replace(`field` ,'⊗','⊗');
update `table` set `field` = replace(`field` ,'ℵ','ℵ');
update `table` set `field` = replace(`field` ,'ø','ø');
update `table` set `field` = replace(`field` ,'Ø','Ø');
update `table` set `field` = replace(`field` ,'∈','∈');
update `table` set `field` = replace(`field` ,'∉','∉');
update `table` set `field` = replace(`field` ,'∩','∩');
update `table` set `field` = replace(`field` ,'∪','∪');
update `table` set `field` = replace(`field` ,'⊂','⊂');
update `table` set `field` = replace(`field` ,'⊃','⊃');
update `table` set `field` = replace(`field` ,'⊆','⊆');
update `table` set `field` = replace(`field` ,'⊇','⊇');
update `table` set `field` = replace(`field` ,'∃','∃');
update `table` set `field` = replace(`field` ,'∀','∀');
update `table` set `field` = replace(`field` ,'∅','∅');
update `table` set `field` = replace(`field` ,'¬','¬');
update `table` set `field` = replace(`field` ,'∧','∧');
update `table` set `field` = replace(`field` ,'∨','∨');
update `table` set `field` = replace(`field` ,'↵','↵');
#7
10
This saved my life
这救了我的命
UPDATE ohp_posts SET post_content = CONVERT(CAST(CONVERT(post_content USING latin1) AS BINARY) USING utf8)
I've found it here http://stanis.net/2014/04/replacing-latin-1-with-utf-8-characters-in-mysql/
我在这里找到了http://stanis.net/2014/04/replacing-latin-1-with-utf-8-characters-in-mysql/
#8
2
I had this same problem but didn't like the replace() solution because there's always the possibility of missing some characters. I was working against a column with mixed data (some had been utf8_encode()d and some not) with 4 million or so rows, about 250k records with mis-encoded data (with É/etc characters), covering about 15 international languages, including mainly European languages but also Russian, Japanese and Chinese.
我有同样的问题,但不喜欢replace()解决方案,因为总有可能错过一些字符。我正在反对一个混合数据的列(一些是utf8_encode()d而另一些没有),有400万行左右,大约250k记录,错误编码数据(带有‰/ etc字符),涵盖约15种国际语言,主要包括欧洲语言,还包括俄语,日语和中文。
I started by duplicating the column, since I didn't want to lose any data:
我开始复制该列,因为我不想丢失任何数据:
ALTER TABLE images ADD COLUMN reptitle TEXT;
Copied all the data with multibyte characters (thanks Adam for the tip)
使用多字节字符复制所有数据(感谢Adam提示)
UPDATE images SET reptitle = title WHERE LENGTH(title) != CHAR_LENGTH(title)
Since reptitle was created with the table's default character set it was already utf8, but contained the corrupted data since images table used to be a latin source. Column reptitle now contains some data which is correctly encoded, and some corrupted (all values with multibyte characters, some had been correctly utf8_encode()d. So then with David's tip...
由于reptitle是使用表的默认字符集创建的,因此它已经是utf8,但包含损坏的数据,因为图像表曾经是拉丁语源。列reptitle现在包含一些正确编码的数据,有些已损坏(所有值都有多字节字符,有些已经正确utf8_encode()d。那么请大卫提示......
ALTER TABLE images MODIFY reptitle TEXT character set latin1;
ALTER TABLE images MODIFY reptitle BLOB;
ALTER TABLE images MODIFY reptitle TEXT character set utf8;
The middle step may not have been necessary since TEXT and BLOB (I think) are the same. This had the effect of correcting all incorrectly encoded data ('étudiantes' became 'étudiantes', etc) but data which was previously correct was truncated at the first multibyte character ('Lapin de Pâques' became 'Lapin de P'). I don't know why the truncation, but it's in a disposable column so I didn't care. The truncated data gives CHAR_LENGTH and LENGTH of the same values because there are no multi-byte characters remaining so easy query...
由于TEXT和BLOB(我认为)是相同的,因此可能没有必要使用中间步骤。这具有纠正所有错误编码数据的效果('tudiantes'成为'étudiantes'等),但先前正确的数据在第一个多字节字符处被截断('LapindePâques'成为'Lapin de P')。我不知道截断的原因,但它是在一次性色谱柱中所以我并不在意。截断的数据给出了相同值的CHAR_LENGTH和LENGTH,因为没有多字节字符,所以很容易查询...
UPDATE images SET title = reptitle WHERE LENGTH(reptitle)!=CHAR_LENGTH(reptitle)
Then of course just drop the spare column
然后当然只需放下备用柱
ALTER TABLE images DROP COLUMN reptitle
Also make sure (since I use PHP and this had tripped me up a couple of times so I thought I'd mention it here) all your script files are UTF8 (without BOM) and you are using:
还要确保(因为我使用PHP,这让我绊倒了几次,所以我想我在这里提到它)所有的脚本文件都是UTF8(没有BOM),你正在使用:
mysql_set_charset('utf8', $connection);
Et voilà... perfectly repaired data, all languages :)
Etvoilà...完美修复的数据,所有语言:)
#9
2
In addition to Raúl Avila Solano and acseven's answer if you want to update all the broken characters in one query you can do:
除了RaúlAvilaSolano和acseven的回答,如果你想在一个查询中更新所有破碎的字符,你可以这样做:
update `table` set field = replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(field,'ü','ü'),'ô','ô'),'ó','ó'),'ê','ê'),'à','à'),'ã','ã'),'Ü','Ü'),'Ô','Ô'),'Ó','Ó'),'Ê','Ê'),'À','À'),'Ã','Ã'),'Ç','Ç'),'Ú','Ú'),'Õ','Õ'),'Í','Í'),'Í','Í'),'É','É'),'Â','Â'),'Á','Á'),'ç','ç'),'ú','ú'),'õ','õ'),'í','í'),'é','é'),'â','â'),'á','á'),'ã','ã'),'ç','ç'),'à ','à'),'à ','à'),'º','º'),'ª','ª'),'ç','ç'),'–','–'),'ó','ó'),'é','é'),'á','á'),'ê','ê'),'ã','ã'),'â','â'),'Ã','í'),'õ','õ'),'Ø','Ø'),'•','-'),'ú','ú'),'à ','À'),'Ã','Ã'),'Ç','Ç'),'â€','"'),'“','"'),'É','É');
#10
2
this also solved my problem on some italian chars
这也解决了我在一些意大利人的问题上的问题
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'á','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ä','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'é','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ó','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ú','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ñ','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'–','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'’','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'…','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'–','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'“','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‘','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'•','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í ','à');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¨','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íˆ','È');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'€','€');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'eÌ€','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í²','ò');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¹','ù');
#11
1
You might have rows with properly encoded UTF8 and with wrongly encoded characters. In this case "CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8)" will trim some fields.
您可能拥有正确编码的UTF8和错误编码字符的行。在这种情况下,“CONVERT(BINARY CONVERT(post_title USING latin1)USING utf8)”将修剪一些字段。
I ended up doing it this way
我最终这样做了
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ä" USING latin1),'ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ö" USING latin1),'ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ü" USING latin1),'ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ä" USING latin1),'Ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ö" USING latin1),'Ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ü" USING latin1),'Ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ß" USING latin1),'ß');
#12
0
The middle step may not have been necessary since TEXT
and BLOB
are the same.
由于TEXT和BLOB是相同的,因此可能没有必要使用中间步骤。
This had the effect of correcting all incorrectly encoded data but data which was previously correct was truncated at the first multibyte character.
这具有纠正所有错误编码数据的效果,但先前正确的数据在第一个多字节字符处被截断。
#13
0
There's a nice script to automate the conversion process in a whole database. It's also useful to know that MySQL's UTF-8 implementation is incomplete since it only supports UTF-8 chars up to 3 bytes. The solution is to use the utf8mb4 charset introduced in MySQL 5.5.3.
有一个很好的脚本可以在整个数据库中自动执行转换过程。知道MySQL的UTF-8实现是不完整的也很有用,因为它只支持最多3个字节的UTF-8字符。解决方案是使用MySQL 5.5.3中引入的utf8mb4字符集。
#14
0
This is an extension of @Thales Ceolin's answer in order to modify every table in the db:
这是@Thales Ceolin的答案的扩展,以便修改db中的每个表:
select concat(
"update ",
a.TABLE_NAME,
" set ", b.COLUMN_NAME,
" = CONVERT(BINARY CONVERT(",
b.COLUMN_NAME,
" USING latin1) USING utf8) where ",
b.COLUMN_NAME,
" is not null;") query
from INFORMATION_SCHEMA.TABLES a
left join INFORMATION_SCHEMA.COLUMNS b on a.TABLE_NAME = b.TABLE_NAME
where a.table_schema = 'db_name'
and a.TABLE_TYPE = 'BASE TABLE'
and b.data_type in ('text', 'varchar')
and a.TABLE_NAME = 'table_name';
This will result in:
这将导致:
update table_name set idn = CONVERT(BINARY CONVERT(idn USING latin1) USING utf8) where idn is not null;
update table_nameset name = CONVERT(BINARY CONVERT(name USING latin1) USING utf8) where name is not null;
update table_name set primary_last_name = CONVERT(BINARY CONVERT(primary_last_name USING latin1) USING utf8) where primary_last_name is not null;
#1
6
How about a different approach, namely converting the column back and forth to get the correct character set? You can convert it to binary, then to utf-8 and then to iso-8859-1 or whatever else you're using. See the manual for the details.
如何采用不同的方法,即来回转换列以获得正确的字符集?您可以将其转换为二进制,然后转换为utf-8,然后转换为iso-8859-1或您正在使用的任何其他内容。有关详细信息,请参阅手册。
#2
46
I fixed with
我修好了
UPDATE wp_zcs9ck_posts_copy SET post_title =
CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);
Complete solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
完整的解决方案:http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
#3
30
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'á','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ä','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'é','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ó','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ú','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ñ','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'–','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'’','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'…','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'–','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'“','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‘','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'•','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
#4
11
The SELECT
statement you need is the following:
您需要的SELECT语句如下:
SELECT * FROM TABLE WHERE LENGTH(name) != CHAR_LENGTH(name);
This returns all rows which contain multi-byte characters.
这将返回包含多字节字符的所有行。
name
is assumed to be a field / the field where weird characters would be found. *
name被假定为一个字段/可以找到奇怪字符的字段。 *
#5
11
No text replacement is a universal solutions because you can forget some character. A more suitable fix for double converted characters is:
没有文本替换是一种通用解决方案,因为您可以忘记一些字符。对双转换字符更合适的修复方法是:
- convert back to latin1
- 转换回latin1
- convert to binary
- 转换为二进制
- convert to utf8
- 转换为utf8
Like this:
喜欢这个:
alter table descriptions modify name VARCHAR(2000) character set latin1;
alter table descriptions modify name blob;
alter table descriptions modify name VARCHAR(2000) character set utf8;
#6
10
Thanks for your answers!!
谢谢你的回答!!
I fixed my tables with this, and wanted to share the full list of changes. Note that it also includes fixing html decoded characters, besides latin ones, it was really a mess:
我用这个修复了我的表,并希望分享完整的更改列表。请注意,它还包括修复html解码的字符,除了拉丁字符,它真的是一团糟:
update `table` set `field` = replace(`field` ,'É','É');
update `table` set `field` = replace(`field` ,'“','"');
update `table` set `field` = replace(`field` ,'â€','"');
update `table` set `field` = replace(`field` ,'Ç','Ç');
update `table` set `field` = replace(`field` ,'Ã','Ã');
//Edit by slash4
update `table` set `field` = replace(`field` ,'Ã ','À');
update `table` set `field` = replace(`field` ,'ú','ú');
update `table` set `field` = replace(`field` ,'•','-');
update `table` set `field` = replace(`field` ,'Ø','Ø');
update `table` set `field` = replace(`field` ,'õ','õ');
-- The next one appears to be missing a character. But which one?
update `table` set `field` = replace(`field` ,'Ã','í');
update `table` set `field` = replace(`field` ,'â','â');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'ê','ê');
update `table` set `field` = replace(`field` ,'á','á');
update `table` set `field` = replace(`field` ,'é','é');
update `table` set `field` = replace(`field` ,'ó','ó');
update `table` set `field` = replace(`field` ,'–','–');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'ª','ª');
update `table` set `field` = replace(`field` ,'º','º');
update `table` set `field` = replace(`field` ,'Ã ','à');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'á','á');
update `table` set `field` = replace(`field` ,'â','â');
update `table` set `field` = replace(`field` ,'é','é');
update `table` set `field` = replace(`field` ,'í','í');
update `table` set `field` = replace(`field` ,'õ','õ');
update `table` set `field` = replace(`field` ,'ú','ú');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'Á','Á');
update `table` set `field` = replace(`field` ,'Â','Â');
update `table` set `field` = replace(`field` ,'É','É');
update `table` set `field` = replace(`field` ,'Í','Í');
update `table` set `field` = replace(`field` ,'Õ','Õ');
update `table` set `field` = replace(`field` ,'Ú','Ú');
update `table` set `field` = replace(`field` ,'Ç','Ç');
update `table` set `field` = replace(`field` ,'Ã','Ã');
update `table` set `field` = replace(`field` ,'À','À');
update `table` set `field` = replace(`field` ,'Ê','Ê');
update `table` set `field` = replace(`field` ,'Ó','Ó');
update `table` set `field` = replace(`field` ,'Ô','Ô');
update `table` set `field` = replace(`field` ,'Ü','Ü');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'à','à');
update `table` set `field` = replace(`field` ,'ê','ê');
update `table` set `field` = replace(`field` ,'ó','ó');
update `table` set `field` = replace(`field` ,'ô','ô');
update `table` set `field` = replace(`field` ,'ü','ü');
update `table` set `field` = replace(`field` ,'&','&');
update `table` set `field` = replace(`field` ,'>','>');
update `table` set `field` = replace(`field` ,'<','<');
update `table` set `field` = replace(`field` ,'ˆ','ˆ');
update `table` set `field` = replace(`field` ,'˜','˜');
update `table` set `field` = replace(`field` ,'¨','¨');
update `table` set `field` = replace(`field` ,'&cute;','´');
update `table` set `field` = replace(`field` ,'¸','¸');
update `table` set `field` = replace(`field` ,'"','"');
update `table` set `field` = replace(`field` ,'“','“');
update `table` set `field` = replace(`field` ,'”','”');
update `table` set `field` = replace(`field` ,'‘','‘');
update `table` set `field` = replace(`field` ,'’','’');
update `table` set `field` = replace(`field` ,'‹','‹');
update `table` set `field` = replace(`field` ,'›','›');
update `table` set `field` = replace(`field` ,'«','«');
update `table` set `field` = replace(`field` ,'»','»');
update `table` set `field` = replace(`field` ,'º','º');
update `table` set `field` = replace(`field` ,'ª','ª');
update `table` set `field` = replace(`field` ,'–','–');
update `table` set `field` = replace(`field` ,'—','—');
update `table` set `field` = replace(`field` ,'¯','¯');
update `table` set `field` = replace(`field` ,'…','…');
update `table` set `field` = replace(`field` ,'¦','¦');
update `table` set `field` = replace(`field` ,'•','•');
update `table` set `field` = replace(`field` ,'¶','¶');
update `table` set `field` = replace(`field` ,'§','§');
update `table` set `field` = replace(`field` ,'¹','¹');
update `table` set `field` = replace(`field` ,'²','²');
update `table` set `field` = replace(`field` ,'³','³');
update `table` set `field` = replace(`field` ,'½','½');
update `table` set `field` = replace(`field` ,'¼','¼');
update `table` set `field` = replace(`field` ,'¾','¾');
update `table` set `field` = replace(`field` ,'⅛','⅛');
update `table` set `field` = replace(`field` ,'⅜','⅜');
update `table` set `field` = replace(`field` ,'⅝','⅝');
update `table` set `field` = replace(`field` ,'⅞','⅞');
update `table` set `field` = replace(`field` ,'>','>');
update `table` set `field` = replace(`field` ,'<','<');
update `table` set `field` = replace(`field` ,'±','±');
update `table` set `field` = replace(`field` ,'−','−');
update `table` set `field` = replace(`field` ,'×','×');
update `table` set `field` = replace(`field` ,'÷','÷');
update `table` set `field` = replace(`field` ,'∗','∗');
update `table` set `field` = replace(`field` ,'⁄','⁄');
update `table` set `field` = replace(`field` ,'‰','‰');
update `table` set `field` = replace(`field` ,'∫','∫');
update `table` set `field` = replace(`field` ,'∑','∑');
update `table` set `field` = replace(`field` ,'∏','∏');
update `table` set `field` = replace(`field` ,'√','√');
update `table` set `field` = replace(`field` ,'∞','∞');
update `table` set `field` = replace(`field` ,'≈','≈');
update `table` set `field` = replace(`field` ,'≅','≅');
update `table` set `field` = replace(`field` ,'∝','∝');
update `table` set `field` = replace(`field` ,'≡','≡');
update `table` set `field` = replace(`field` ,'≠','≠');
update `table` set `field` = replace(`field` ,'≤','≤');
update `table` set `field` = replace(`field` ,'≥','≥');
update `table` set `field` = replace(`field` ,'∴','∴');
update `table` set `field` = replace(`field` ,'⋅','⋅');
update `table` set `field` = replace(`field` ,'·','·');
update `table` set `field` = replace(`field` ,'∂','∂');
update `table` set `field` = replace(`field` ,'ℑ','ℑ');
update `table` set `field` = replace(`field` ,'ℜ','ℜ');
update `table` set `field` = replace(`field` ,'′','′');
update `table` set `field` = replace(`field` ,'″','″');
update `table` set `field` = replace(`field` ,'°','°');
update `table` set `field` = replace(`field` ,'∠','∠');
update `table` set `field` = replace(`field` ,'⊥','⊥');
update `table` set `field` = replace(`field` ,'∇','∇');
update `table` set `field` = replace(`field` ,'⊕','⊕');
update `table` set `field` = replace(`field` ,'⊗','⊗');
update `table` set `field` = replace(`field` ,'ℵ','ℵ');
update `table` set `field` = replace(`field` ,'ø','ø');
update `table` set `field` = replace(`field` ,'Ø','Ø');
update `table` set `field` = replace(`field` ,'∈','∈');
update `table` set `field` = replace(`field` ,'∉','∉');
update `table` set `field` = replace(`field` ,'∩','∩');
update `table` set `field` = replace(`field` ,'∪','∪');
update `table` set `field` = replace(`field` ,'⊂','⊂');
update `table` set `field` = replace(`field` ,'⊃','⊃');
update `table` set `field` = replace(`field` ,'⊆','⊆');
update `table` set `field` = replace(`field` ,'⊇','⊇');
update `table` set `field` = replace(`field` ,'∃','∃');
update `table` set `field` = replace(`field` ,'∀','∀');
update `table` set `field` = replace(`field` ,'∅','∅');
update `table` set `field` = replace(`field` ,'¬','¬');
update `table` set `field` = replace(`field` ,'∧','∧');
update `table` set `field` = replace(`field` ,'∨','∨');
update `table` set `field` = replace(`field` ,'↵','↵');
#7
10
This saved my life
这救了我的命
UPDATE ohp_posts SET post_content = CONVERT(CAST(CONVERT(post_content USING latin1) AS BINARY) USING utf8)
I've found it here http://stanis.net/2014/04/replacing-latin-1-with-utf-8-characters-in-mysql/
我在这里找到了http://stanis.net/2014/04/replacing-latin-1-with-utf-8-characters-in-mysql/
#8
2
I had this same problem but didn't like the replace() solution because there's always the possibility of missing some characters. I was working against a column with mixed data (some had been utf8_encode()d and some not) with 4 million or so rows, about 250k records with mis-encoded data (with É/etc characters), covering about 15 international languages, including mainly European languages but also Russian, Japanese and Chinese.
我有同样的问题,但不喜欢replace()解决方案,因为总有可能错过一些字符。我正在反对一个混合数据的列(一些是utf8_encode()d而另一些没有),有400万行左右,大约250k记录,错误编码数据(带有‰/ etc字符),涵盖约15种国际语言,主要包括欧洲语言,还包括俄语,日语和中文。
I started by duplicating the column, since I didn't want to lose any data:
我开始复制该列,因为我不想丢失任何数据:
ALTER TABLE images ADD COLUMN reptitle TEXT;
Copied all the data with multibyte characters (thanks Adam for the tip)
使用多字节字符复制所有数据(感谢Adam提示)
UPDATE images SET reptitle = title WHERE LENGTH(title) != CHAR_LENGTH(title)
Since reptitle was created with the table's default character set it was already utf8, but contained the corrupted data since images table used to be a latin source. Column reptitle now contains some data which is correctly encoded, and some corrupted (all values with multibyte characters, some had been correctly utf8_encode()d. So then with David's tip...
由于reptitle是使用表的默认字符集创建的,因此它已经是utf8,但包含损坏的数据,因为图像表曾经是拉丁语源。列reptitle现在包含一些正确编码的数据,有些已损坏(所有值都有多字节字符,有些已经正确utf8_encode()d。那么请大卫提示......
ALTER TABLE images MODIFY reptitle TEXT character set latin1;
ALTER TABLE images MODIFY reptitle BLOB;
ALTER TABLE images MODIFY reptitle TEXT character set utf8;
The middle step may not have been necessary since TEXT and BLOB (I think) are the same. This had the effect of correcting all incorrectly encoded data ('étudiantes' became 'étudiantes', etc) but data which was previously correct was truncated at the first multibyte character ('Lapin de Pâques' became 'Lapin de P'). I don't know why the truncation, but it's in a disposable column so I didn't care. The truncated data gives CHAR_LENGTH and LENGTH of the same values because there are no multi-byte characters remaining so easy query...
由于TEXT和BLOB(我认为)是相同的,因此可能没有必要使用中间步骤。这具有纠正所有错误编码数据的效果('tudiantes'成为'étudiantes'等),但先前正确的数据在第一个多字节字符处被截断('LapindePâques'成为'Lapin de P')。我不知道截断的原因,但它是在一次性色谱柱中所以我并不在意。截断的数据给出了相同值的CHAR_LENGTH和LENGTH,因为没有多字节字符,所以很容易查询...
UPDATE images SET title = reptitle WHERE LENGTH(reptitle)!=CHAR_LENGTH(reptitle)
Then of course just drop the spare column
然后当然只需放下备用柱
ALTER TABLE images DROP COLUMN reptitle
Also make sure (since I use PHP and this had tripped me up a couple of times so I thought I'd mention it here) all your script files are UTF8 (without BOM) and you are using:
还要确保(因为我使用PHP,这让我绊倒了几次,所以我想我在这里提到它)所有的脚本文件都是UTF8(没有BOM),你正在使用:
mysql_set_charset('utf8', $connection);
Et voilà... perfectly repaired data, all languages :)
Etvoilà...完美修复的数据,所有语言:)
#9
2
In addition to Raúl Avila Solano and acseven's answer if you want to update all the broken characters in one query you can do:
除了RaúlAvilaSolano和acseven的回答,如果你想在一个查询中更新所有破碎的字符,你可以这样做:
update `table` set field = replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(field,'ü','ü'),'ô','ô'),'ó','ó'),'ê','ê'),'à','à'),'ã','ã'),'Ü','Ü'),'Ô','Ô'),'Ó','Ó'),'Ê','Ê'),'À','À'),'Ã','Ã'),'Ç','Ç'),'Ú','Ú'),'Õ','Õ'),'Í','Í'),'Í','Í'),'É','É'),'Â','Â'),'Á','Á'),'ç','ç'),'ú','ú'),'õ','õ'),'í','í'),'é','é'),'â','â'),'á','á'),'ã','ã'),'ç','ç'),'à ','à'),'à ','à'),'º','º'),'ª','ª'),'ç','ç'),'–','–'),'ó','ó'),'é','é'),'á','á'),'ê','ê'),'ã','ã'),'â','â'),'Ã','í'),'õ','õ'),'Ø','Ø'),'•','-'),'ú','ú'),'à ','À'),'Ã','Ã'),'Ç','Ç'),'â€','"'),'“','"'),'É','É');
#10
2
this also solved my problem on some italian chars
这也解决了我在一些意大利人的问题上的问题
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'á','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ä','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'é','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ó','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ú','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'ñ','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'–','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'’','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'…','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'–','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'“','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‘','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'•','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'‡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í ','à');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¨','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íˆ','È');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'€','€');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'eÌ€','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í²','ò');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¹','ù');
#11
1
You might have rows with properly encoded UTF8 and with wrongly encoded characters. In this case "CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8)" will trim some fields.
您可能拥有正确编码的UTF8和错误编码字符的行。在这种情况下,“CONVERT(BINARY CONVERT(post_title USING latin1)USING utf8)”将修剪一些字段。
I ended up doing it this way
我最终这样做了
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ä" USING latin1),'ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ö" USING latin1),'ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ü" USING latin1),'ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ä" USING latin1),'Ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ö" USING latin1),'Ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ü" USING latin1),'Ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ß" USING latin1),'ß');
#12
0
The middle step may not have been necessary since TEXT
and BLOB
are the same.
由于TEXT和BLOB是相同的,因此可能没有必要使用中间步骤。
This had the effect of correcting all incorrectly encoded data but data which was previously correct was truncated at the first multibyte character.
这具有纠正所有错误编码数据的效果,但先前正确的数据在第一个多字节字符处被截断。
#13
0
There's a nice script to automate the conversion process in a whole database. It's also useful to know that MySQL's UTF-8 implementation is incomplete since it only supports UTF-8 chars up to 3 bytes. The solution is to use the utf8mb4 charset introduced in MySQL 5.5.3.
有一个很好的脚本可以在整个数据库中自动执行转换过程。知道MySQL的UTF-8实现是不完整的也很有用,因为它只支持最多3个字节的UTF-8字符。解决方案是使用MySQL 5.5.3中引入的utf8mb4字符集。
#14
0
This is an extension of @Thales Ceolin's answer in order to modify every table in the db:
这是@Thales Ceolin的答案的扩展,以便修改db中的每个表:
select concat(
"update ",
a.TABLE_NAME,
" set ", b.COLUMN_NAME,
" = CONVERT(BINARY CONVERT(",
b.COLUMN_NAME,
" USING latin1) USING utf8) where ",
b.COLUMN_NAME,
" is not null;") query
from INFORMATION_SCHEMA.TABLES a
left join INFORMATION_SCHEMA.COLUMNS b on a.TABLE_NAME = b.TABLE_NAME
where a.table_schema = 'db_name'
and a.TABLE_TYPE = 'BASE TABLE'
and b.data_type in ('text', 'varchar')
and a.TABLE_NAME = 'table_name';
This will result in:
这将导致:
update table_name set idn = CONVERT(BINARY CONVERT(idn USING latin1) USING utf8) where idn is not null;
update table_nameset name = CONVERT(BINARY CONVERT(name USING latin1) USING utf8) where name is not null;
update table_name set primary_last_name = CONVERT(BINARY CONVERT(primary_last_name USING latin1) USING utf8) where primary_last_name is not null;