What column type should be used to store serialized data in a MySQL database? I know you can use VARBINARY, BLOB, or TEXT. Which is considered best, and why?
Edit: I understand it is not "good" to store serialized data, but I need to do it in this one case. Please trust me on this and focus on the question if you have an answer. Thanks!
8 solutions
#1
58
To answer: TEXT seems to be deprecated in a number of DBMSs (though not in MySQL itself), so it is better to use either a BLOB or a VARCHAR with a high limit. With a BLOB you also won't get any encoding issues, which are a major hassle with VARCHAR and TEXT.
Also, as pointed out in a thread on the MySQL forums, hard drives are cheaper than software, so first design your software and make it work; only if space then becomes an issue should you optimize that aspect. So don't over-optimize the size of your column too early; set the size larger at first (this will also avoid security issues).
About the various comments: Too much SQL fanaticism here. Despite the fact that I am greatly fond of SQL and relational models, they also have their pitfalls.
Storing serialized data into the database as-is (such as storing JSON or XML formatted data) has a few advantages:
- You can have a more flexible format for your data: adding and removing fields on the fly, changing the specification of the fields on the fly, etc.
- Less impedance mismatch with the object model: you store and fetch the data just as it is in your program, instead of fetching the data and then having to convert it between your program's object structures and your relational database's structures.
There are many other advantages as well, so please, no fanboyism: relational databases are a great tool, but let's not dismiss the other tools available. The more tools, the better.
As a concrete example, I tend to add a JSON field to a table to store extra parameters of a record, where the properties of the JSON data will never be SELECTed individually but only used once the right record has already been selected. I can still discriminate between records with the relational columns, and once the right record is selected, I can use the extra parameters for whatever purpose I want.
So my advice for retaining the best of both worlds (speed, serializability and structural flexibility): use a few standard relational columns as unique keys to discriminate between your rows, and add a BLOB/VARCHAR column that holds your serialized data. Usually only two or three columns are required for a unique key, so this won't be a major overhead.
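The layout described above (a few relational key columns plus one opaque payload column) could look roughly like the sketch below. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a server, and the table name `record`, the columns `owner_id`, `name`, `extra`, and the helpers `save` and `load` are all made up for the example. On MySQL the `extra` column would be declared as BLOB or MEDIUMBLOB instead.

```python
import json
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE record (
        owner_id INTEGER NOT NULL,  -- relational key column (hypothetical)
        name     TEXT    NOT NULL,  -- relational key column (hypothetical)
        extra    BLOB    NOT NULL,  -- opaque serialized payload
        PRIMARY KEY (owner_id, name)
    )
    """
)


def save(owner_id, name, extra):
    """Serialize `extra` to JSON bytes and store it under the composite key."""
    payload = json.dumps(extra).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO record (owner_id, name, extra) VALUES (?, ?, ?)",
        (owner_id, name, payload),
    )


def load(owner_id, name):
    """Select the row by its relational key, then decode the payload."""
    row = conn.execute(
        "SELECT extra FROM record WHERE owner_id = ? AND name = ?",
        (owner_id, name),
    ).fetchone()
    return None if row is None else json.loads(row[0].decode("utf-8"))


save(1, "widget", {"color": "red", "tags": ["a", "b"]})
loaded = load(1, "widget")
```

The rows are always found through the key columns; the database never looks inside `extra`, which is what keeps the payload format free to change.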
You may also be interested in PostgreSQL, which now has a JSON datatype, and in the PostSQL project, which processes JSON fields directly, just like relational columns.
#2
10
How much do you plan to store? Check the MySQL docs for the specs of the string types and their sizes. The key here is that you don't care about indexing this column, but you never want it to overflow and get truncated, since your JSON would then be unreadable.
- TINYTEXT: L < 2^8
- TEXT: L < 2^16
- MEDIUMTEXT: L < 2^24
- LONGTEXT: L < 2^32
where L is the length in bytes (so fewer characters fit when the data contains multibyte characters).
Plain TEXT should be enough, but go bigger if you are storing more. In that case, though, you might not want to store it in the database at all.
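To make the limits above concrete, here is a small sketch that picks the smallest column type whose limit covers a given payload. The helper name `smallest_blob_type` and the table `BLOB_LIMITS` are invented for the illustration; the byte limits are the ones listed above, and they are the same for the TEXT and BLOB families. Note that the size is measured on the encoded bytes, not on the character count.

```python
# Maximum payload size in bytes for each type (same limits as the TEXT family).
BLOB_LIMITS = [
    ("TINYBLOB", 2**8 - 1),
    ("BLOB", 2**16 - 1),
    ("MEDIUMBLOB", 2**24 - 1),
    ("LONGBLOB", 2**32 - 1),
]


def smallest_blob_type(payload: bytes) -> str:
    """Return the smallest type (hypothetical helper) that can hold `payload`."""
    size = len(payload)
    for name, limit in BLOB_LIMITS:
        if size <= limit:
            return name
    raise ValueError(f"payload of {size} bytes exceeds LONGBLOB")


# 200 characters, but 400 bytes once encoded as UTF-8.
accented = ("é" * 200).encode("utf-8")
choice = smallest_blob_type(accented)
```

In practice you would size the column for the largest payload you expect, with some headroom, rather than for a single sample.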
#3
10
The length limits that @Twisted Pear mentions are good reasons.
Also consider that TEXT and its ilk have a charset associated with them, whereas BLOB data types do not. If you're just storing raw bytes of data, you might as well use BLOB instead of TEXT.
Note that you can still store textual data in a BLOB; you just can't do any SQL operations on it that take charset into account, because it's just bytes to SQL. But that's probably not an issue in your case, since it's serialized data with a structure unknown to SQL anyway. All you need to do is store bytes and fetch bytes. The interpretation of the bytes is up to your app.
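As a rough illustration of why the charset matters, the sketch below serializes a value with Python's pickle module (chosen only as an example of a binary serializer; the variable names and the helper `is_valid_utf8` are made up) and checks whether the result is valid UTF-8 text. It is not, so a column that interprets its content through a charset is the wrong fit, while a bytes-in, bytes-out column is fine.

```python
import pickle

record = {"id": 7, "tags": ["a", "b"]}
payload = pickle.dumps(record, protocol=4)  # binary serialization


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Pickle output begins with the byte 0x80, which is not valid at the start
# of a UTF-8 sequence, so the payload cannot be treated as text.
text_safe = is_valid_utf8(payload)

# Treated purely as bytes, the payload round-trips unchanged.
restored = pickle.loads(payload)
```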
I have also had trouble with LONGBLOB or LONGTEXT in certain client libraries (e.g. PHP), because the client tries to allocate a buffer as large as the largest possible value for the data type, not knowing how large the content of any given row will be until it's fetched. This caused PHP to burst into flames as it tried to allocate a 4GB buffer. I don't know what client you're using, or whether it suffers from the same behavior.
The workaround: use MEDIUMBLOB or just BLOB, as long as those types are sufficient to store your serialized data.
On the issue of people telling you not to do this, I'm not going to tell you that (in spite of the fact that I'm an SQL advocate). It's true you can't use SQL expressions to perform operations on individual elements within the serialized data, but that's not your purpose. What you do gain by putting that data into the database includes:
- You can associate serialized data with other, more relational data.
- You can store and fetch serialized data within transaction scope, with COMMIT and ROLLBACK.
- You keep all your relational and non-relational data in one place, which makes it easier to replicate to slaves, back up, restore, etc.
#4
7
LONGTEXT
WordPress stores serialized data in its postmeta table as LONGTEXT. I find the WordPress database to be a good place to research datatypes for columns.
#5
2
As of MySQL 5.7.8, MySQL supports a native JSON data type; see the MySQL Manual.
#6
1
I might be late to the party, but the php.net documentation for serialize() states the following:
Note that this is a binary string which may include null bytes, and needs to be stored and handled as such. For example, serialize() output should generally be stored in a BLOB field in a database, rather than a CHAR or TEXT field.
Source: http://php.net/manual/en/function.serialize.php
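A small sketch of what the quoted note warns about, written in Python for illustration. The `payload` below is typed by hand to mimic what PHP's serialize() produces for an object of a hypothetical class Foo with a private property bar (private property names are wrapped in null bytes); it was not generated by PHP here. Anything that treats the value as a null-terminated string sees only the part before the first null byte.

```python
# Hand-written imitation of PHP serialize() output for a hypothetical class
# Foo with a private property "bar": the key is stored as "\0Foo\0bar".
payload = b'O:3:"Foo":1:{s:8:"\x00Foo\x00bar";i:1;}'

# The declared key length (8) counts the two null bytes.
key = b"\x00Foo\x00bar"
key_length = len(key)

# How many null bytes the payload contains.
nul_count = payload.count(b"\x00")

# What a null-terminated (C-style) string view of the payload would keep.
c_string_view = payload.split(b"\x00", 1)[0]
lost_bytes = len(payload) - len(c_string_view)
```

Storing the value in a BLOB column and binding it as binary data avoids this kind of truncation.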
Hope that helps!
#7
-1
Unless the serialized data has no other use than to be saved and restored from the database, you probably don't want to do it that way.
Typically, serialized data has several fields which should be stored in the database as separate columns. It is common for every item of serialized data to be a separate column. Some of those columns would naturally be key fields. Additional columns might plausibly be added besides the data, to indicate the date and time of the insertion, the responsible user, etc.
#8
-3
I found varchar(5000) to be the best balance of size and speed for us. It also works with Rails 3 serialized data, whereas varbinary was throwing serialization errors intermittently.