
Time: 2022-09-15 19:15:18

We have a MySQL InnoDB table with ~10 columns holding small Base64-encoded JavaScript files and Base64-encoded PNG images (<2 KB each).

There are comparatively few inserts and a lot of reads. However, the output is cached on a Memcached instance for a few minutes to avoid repeated reads.


Right now we are using BLOB for those columns, but I am wondering whether there is any advantage in switching to the TEXT datatype in terms of performance or snapshot backups.


My research indicates that BLOB and TEXT are close to identical for my case, and since I do not know beforehand what type of data is actually going to be stored, I went for BLOB.


Do you have any pointers on the TEXT vs BLOB debate for this specific case?

1 solution



One shouldn't store Base64-encoded data in one's database...

Base64 is a means of representing arbitrary binary data using only printable text characters: it was designed for situations where one needs to transfer such binary data across a protocol or medium that can handle only printable text (e.g. SMTP/email). It increases the data size (by about 33%, since every 3 bytes become 4 characters) and adds the computational cost of encoding/decoding, so it should be avoided unless absolutely necessary.
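
To make the size overhead concrete, here is a minimal sketch in Python using only the standard library. The 1,500-byte random payload is an assumption standing in for one of the small PNG or JavaScript files from the question.

```python
import base64
import os

# Hypothetical 1,500-byte payload standing in for a small PNG or script.
raw = os.urandom(1500)
encoded = base64.b64encode(raw)

# Base64 maps every 3 input bytes to 4 output characters.
overhead = len(encoded) / len(raw) - 1
print(len(raw), len(encoded), f"{overhead:.0%}")  # 1500 2000 33%

# Decoding recovers the original bytes exactly.
assert base64.b64decode(encoded) == raw
```

The exact overhead varies slightly with padding when the input length is not a multiple of 3.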

By contrast, the whole point of BLOB columns is that they store raw binary strings. So just go ahead and store your stuff directly into your BLOB columns without first Base64-encoding them. Usually you'll want to store related metadata in other columns, such as file version/last modified date, media type, and (in the case of text files, such as JavaScript sources) character encoding. You might decide to use TEXT type columns for the text files, not only so that MySQL will natively track character encoding for you, but also so that it can transcode to alternative character sets and/or inspect/manipulate the text as may be required (now or in the future).


The (erroneous) idea that SQL databases require printable-text encodings like Base64 for handling arbitrary binary data has been perpetuated by a large number of ill-informed tutorials. This idea appears to be seated in the mistaken belief that, because SQL comprises only printable-text in other contexts, it must surely require it for binary data too (at least for data transfer, if not for data storage). This is simply not true: SQL can convey binary data in a number of ways, including plain string literals (provided that they are properly quoted and escaped like any other string); of course, the preferred way to pass data (of any type) to your database is through parameterised queries, and parameters can just as easily contain binary data as they can anything else.
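
As a rough illustration of passing raw bytes through a parameterised query, the sketch below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a server. The table name `assets`, its columns, and the file name are made up for the example. The same idea should carry over to a MySQL driver, although placeholder syntax and type names differ between drivers.

```python
import sqlite3

# Stand-in database: sqlite3 is used only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets ("
    " name TEXT PRIMARY KEY,"
    " media_type TEXT NOT NULL,"  # metadata kept alongside the raw bytes
    " body BLOB NOT NULL)"
)

# Arbitrary bytes, including NUL, quote and backslash values.
payload = bytes(range(256))

# The bytes travel as a bound parameter: no Base64, no manual escaping.
conn.execute(
    "INSERT INTO assets (name, media_type, body) VALUES (?, ?, ?)",
    ("logo.png", "image/png", payload),
)

(stored,) = conn.execute(
    "SELECT body FROM assets WHERE name = ?", ("logo.png",)
).fetchone()

# The stored value is byte-for-byte identical to what was inserted.
assert stored == payload
```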


For what it's worth, I usually altogether avoid storing items like this in the RDBMS and prefer instead to use those highly optimised file storage databases known as filesystems—but that's another matter altogether.

...unless it's cached for performance reasons...

The only situation in which there might be some benefit from storing Base64-encoded data is where data is frequently retrieved from the database and transmitted across a protocol that requires that encoding. In that case, storing the Base64-encoded representation would save having to encode the otherwise raw data on every fetch.

However, note in this sense that the Base64-encoded storage is merely acting as a cache, much like one might store denormalised data for performance reasons.


...in which case it should be TEXT not BLOB

As alluded to above, the difference between TEXT and BLOB really comes down to the fact that TEXT columns are stored together with text-specific metadata (such as character encoding and collation), whereas BLOB columns are not. This additional metadata enables MySQL to transcode characters between storage and connection character sets (where appropriate) and perform fancy character equivalence/ordering.


Generally speaking: if two clients working in different character sets should see the same bytes, then you want a BLOB column; if they should see the same characters then you want a TEXT column.


With Base64, those two clients must ultimately find that the data decodes to the same bytes; but they should see that the encoded data has the same characters. For example, suppose one wishes to insert the Base64-encoding of 'Hello world!' (which is 'SGVsbG8gd29ybGQh'). If the inserting application is working in the UTF-8 character set, then it will send the byte sequence 0x53475673624738676432397962475168 to the database.

  • if that byte sequence is stored in a BLOB column and later retrieved by an application that is working in UTF-16, the same bytes will be returned—which represent '升噳扇㡧搲㥹扇全' and not the desired Base64-encoded value; whereas

  • if that byte sequence is stored in a TEXT column and later retrieved by an application that is working in UTF-16, MySQL will transcode on-the-fly to return the byte sequence 0x0053004700560073006200470038006700640032003900790062004700510068—which represents the original Base64-encoded value 'SGVsbG8gd29ybGQh' as desired.
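
The two cases above can be reproduced without a database. The following Python sketch only mimics the two behaviours at the byte level (it does not talk to MySQL), and it assumes big-endian UTF-16, which is what the byte sequences quoted above correspond to.

```python
import base64

# Base64 text as the inserting application sees it.
encoded = base64.b64encode(b"Hello world!").decode("ascii")

# Bytes sent by a client working in UTF-8.
sent = encoded.encode("utf-8")
print(encoded, sent.hex())

# BLOB-like behaviour: the same bytes come back, and a UTF-16 reader
# misinterprets each pair of bytes as a single character.
misread = sent.decode("utf-16-be")
print(misread)

# TEXT-like behaviour: the characters are preserved and the bytes are
# transcoded for the UTF-16 reader.
transcoded = encoded.encode("utf-16-be")
print(transcoded.hex())
assert transcoded.decode("utf-16-be") == encoded
```

The first hex dump matches the UTF-8 byte sequence quoted above, and the second matches the transcoded UTF-16 sequence.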


Of course, you could nevertheless use BLOB columns and track the character encoding in some other way—but that would just needlessly reinvent the wheel, with added maintenance complexity and risk of introducing unintentional errors.



