How can I tell whether the data in a table has an incorrect encoding?

Posted: 2021-03-10 07:13:59

I have a couple of tables that are set to the latin1 character set, but I suspect some values have erroneously been inserted that are actually encoded as utf8.

MySQL makes this a little more complicated because it silently converts strings between the connection character set and the column's character set.
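To see what the suspected corruption looks like, here is a minimal Python sketch. It assumes one common way this happens: a client sends UTF-8 bytes over a connection declared as latin1, so MySQL stores the bytes unchanged in the latin1 column, and every non-ASCII character then shows up as two latin1 characters. The helper name `misread_as_latin1` is hypothetical, and Python's `latin-1` codec stands in for MySQL's latin1 (which is really cp1252 and differs only in the 0x80 to 0x9F range).

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then read those bytes back one byte per latin1 char.

    Hypothetical helper: mimics what a latin1 column displays when it was
    handed UTF-8 bytes without conversion.
    """
    return text.encode("utf-8").decode("latin-1")


print("é".encode("utf-8"))         # b'\xc3\xa9'  (two bytes for one character)
print(misread_as_latin1("café"))   # cafÃ©
```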

How can I test my hypothesis that there are some utf8-encoded bytes in a latin1 column in MySQL?

1 solution

#1


If you find strings of 2 bytes which match the following bit pattern:

110xxxxx 10xxxxxx

chances are these are UTF-8 encoded characters. They could instead be two consecutive non-ASCII latin1 characters (such as 'Ä' or something unprintable), but that is unlikely.
