MySQL CHAR and VARCHAR: character sets and storage size

Date: 2022-03-06 16:59:10

Wondering how much actual storage space will be taken up by these two datatypes, as the MySQL documentation is slightly unclear on the matter.

CHAR(M): M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set

VARCHAR(M), VARBINARY(M): L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes

This seems to imply to me that, given a utf8-encoded database, a CHAR will always take up 32 bits per character, whilst a VARCHAR will take between 8 and 32 depending on the actual byte length of the characters stored. Is that correct? Or does a VARCHAR imply an 8-bit character width, and storing multi-octet UTF8 characters actually consumes multiple 'characters' from the VARCHAR? Or does the VARCHAR also always store 32 bits per character? So many possibilities.

Not something I've ever had to worry this much about before, but I'm starting to hit in-memory temp table size limits and I don't necessarily want to have to increase MySQL's available pool (for the second time).

1 answer

#1


CHAR and VARCHAR both count characters, not bytes. Both work out the maximum storage they might require from the declared length and the character encoding. For ASCII, that's 1 byte per character. For UTF-8, that's 3 bytes per character (not 4 as you'd expect, because MySQL's utf8 character set, also known as utf8mb3, is crippled: it doesn't support any Unicode characters which would require 4 bytes in UTF-8; the separate utf8mb4 character set does, at up to 4 bytes per character). So far, CHAR and VARCHAR are the same.
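To make the 3-versus-4-byte point concrete, here is a small Python sketch that measures how many bytes individual characters need in UTF-8. The sample characters and the constant name MYSQL_UTF8_MAX_BYTES are chosen for illustration only; the limit of 3 is the one described above for MySQL's utf8 character set.

```python
# Sample characters, written as escapes so the file stays ASCII-only:
# "a", U+00E9 (e with acute), U+4E2D (a CJK character), U+1F600 (an emoji).
samples = ["a", "\u00e9", "\u4e2d", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# Assumption for illustration: the per-character limit of MySQL's utf8 (utf8mb3).
MYSQL_UTF8_MAX_BYTES = 3
fits_in_utf8mb3 = {ch: n <= MYSQL_UTF8_MAX_BYTES for ch, n in utf8_lengths.items()}

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_lengths[ch]} byte(s), fits: {fits_in_utf8mb3[ch]}")
```

Only the last sample needs 4 bytes, which is exactly the class of character that the 3-byte character set cannot store.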

Now, CHAR just goes ahead and reserves this amount of storage.

VARCHAR instead allocates 1 or 2 bytes for a length prefix, depending on whether this maximum storage is < 256 or ≥ 256 bytes. The actual amount of space occupied by an entry is this one- or two-byte prefix, plus the amount of space actually occupied by the string.
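The difference between the two types can be sketched in Python as follows. This is only a model of the rules described in this answer, not MySQL's actual implementation; the function names char_bytes and varchar_bytes are made up for the example, and it assumes the stored value contains no character wider than bytes_per_char.

```python
def char_bytes(max_chars: int, bytes_per_char: int) -> int:
    """Bytes reserved by CHAR(max_chars): M x w, whatever value is stored."""
    return max_chars * bytes_per_char


def varchar_bytes(value: str, max_chars: int, bytes_per_char: int) -> int:
    """Bytes used by one VARCHAR(max_chars) value: length prefix + actual bytes."""
    if len(value) > max_chars:
        raise ValueError("value is longer than the declared column length")
    max_bytes = max_chars * bytes_per_char
    # 1-byte prefix if the column can never exceed 255 bytes, otherwise 2.
    prefix = 1 if max_bytes <= 255 else 2
    return prefix + len(value.encode("utf-8"))


# "hello" in a 3-bytes-per-character column declared with length 100:
print(char_bytes(100, 3))              # 300
print(varchar_bytes("hello", 100, 3))  # 7  (2-byte prefix + 5 bytes)
print(varchar_bytes("hello", 50, 3))   # 6  (1-byte prefix + 5 bytes)
```

Under this model the CHAR cost depends only on the declaration, while the VARCHAR cost tracks the stored string plus a small fixed overhead.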

Interestingly, this makes 85 a magic number for UTF-8 VARCHAR:

  • VARCHAR(85) uses 1 byte for the length because the maximum possible length of 85 (crippled) UTF-8 characters is 3 × 85 = 255 bytes.
  • VARCHAR(86) uses 2 bytes for the length because the maximum possible length of 86 (crippled) UTF-8 characters is 3 × 86 = 258 bytes.
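The threshold arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the rows for 1 and 4 bytes per character simply apply the same 255-byte rule to other character widths (for example a single-byte character set, or utf8mb4 at up to 4 bytes per character); treat those as an extrapolation of the rule above rather than something stated in the answer.

```python
def length_prefix_bytes(max_chars: int, bytes_per_char: int) -> int:
    """Length-prefix size for VARCHAR(max_chars): 1 byte up to 255 bytes, else 2."""
    return 1 if max_chars * bytes_per_char <= 255 else 2


def largest_one_byte_prefix_length(bytes_per_char: int) -> int:
    """Largest declared length that still fits a 1-byte length prefix."""
    return 255 // bytes_per_char


print(length_prefix_bytes(85, 3))  # 1, since 3 * 85 = 255
print(length_prefix_bytes(86, 3))  # 2, since 3 * 86 = 258

for width in (1, 3, 4):
    print(width, largest_one_byte_prefix_length(width))
```

For a width of 3 the result is the 85 mentioned above; the other widths give 255 and 63 by the same integer division.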
