I need to design a key/value table in my database and I'm looking for guidance on the best way to do this. Basically, I need to be able to associate values with a dynamic set of named properties and apply them to an external key.
The operations I need to be able to support are:
- Apply a key/value pair to a group of items
- Enumerate all of the currently-active keys
- Determine all of the items that have a value for a given key
- Determine all of the items where the value associated with a given key matches some criteria.
It seems that the simplest way to do this is to define a table:
CREATE TABLE KeyValue (
  id int,
  Key varchar...,
  Value varchar...
);
It seems that I am likely to be duplicating a lot of data in the Key column, because any given key is likely to be defined for a large number of documents. Replacing the Key varchar with an integer lookup into another table seems to alleviate this problem (and makes it significantly more efficient to enumerate all of the active keys), but it leaves me with the problem of maintaining that lookup table (upserting into it whenever I want to define a property and potentially removing the entry any time a key/value is cleared).
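The lookup-table maintenance described above could be sketched roughly as follows. This is only an illustration using Python's sqlite3 module, not a definitive design: the Keys table, the columns key_id, name and attr_value, and the helpers set_property, clear_property and active_keys are all names assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup table: one row per distinct key name.
    CREATE TABLE Keys (
        key_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL UNIQUE
    );
    -- The key/value table stores the integer key_id instead of the name.
    CREATE TABLE KeyValue (
        id         INTEGER NOT NULL,
        key_id     INTEGER NOT NULL REFERENCES Keys(key_id),
        attr_value TEXT,
        PRIMARY KEY (id, key_id)
    );
""")


def set_property(item_id, name, value):
    """Define (or overwrite) one property on one item."""
    with conn:
        # Upsert into the lookup table: a no-op if the key already exists.
        conn.execute("INSERT OR IGNORE INTO Keys (name) VALUES (?)", (name,))
        key_id = conn.execute(
            "SELECT key_id FROM Keys WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR REPLACE INTO KeyValue (id, key_id, attr_value) "
            "VALUES (?, ?, ?)",
            (item_id, key_id, value),
        )


def clear_property(item_id, name):
    """Remove one property and drop the key once no item uses it."""
    with conn:
        conn.execute(
            "DELETE FROM KeyValue WHERE id = ? AND key_id = "
            "(SELECT key_id FROM Keys WHERE name = ?)",
            (item_id, name),
        )
        conn.execute(
            "DELETE FROM Keys WHERE name = ? "
            "AND key_id NOT IN (SELECT key_id FROM KeyValue)",
            (name,),
        )


def active_keys():
    """Enumerating the active keys only has to read the small Keys table."""
    return [row[0] for row in conn.execute("SELECT name FROM Keys ORDER BY name")]


set_property(101, "color", "green")
set_property(102, "color", "green")
set_property(101, "size", "L")
print(active_keys())   # ['color', 'size']
clear_property(101, "size")
print(active_keys())   # ['color']
```

The cleanup in clear_property is the cost mentioned above: every delete has to check whether the key is still in use.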
What's the best way to do this?
6 solutions
#1
30
You are employing a database model called Entity-Attribute-Value. This is a common way to store key/value pairs in a relational database, but it has a number of weaknesses with respect to database normalization and efficiency.
Yes, the table design you showed is the most common way to do it. In this design, every attribute of every entity gets a distinct row in your KeyValue table.
Apply a key/value pair to a group of items: You need to add one row for each item in the group.
INSERT INTO KeyValue (id, key, value) VALUES (101, 'color', 'green');
INSERT INTO KeyValue (id, key, value) VALUES (102, 'color', 'green');
INSERT INTO KeyValue (id, key, value) VALUES (103, 'color', 'green');
You can also prepare the INSERT statement with parameters and execute it once per item id in a loop.
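As a rough sketch of the parameterized approach, here is one way it might look with Python's sqlite3 module. The column names attr_key and attr_value are assumptions of this example (chosen because KEY is a reserved word in some engines, such as MySQL), and apply_to_group is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE KeyValue (id INTEGER, attr_key TEXT, attr_value TEXT)"
)


def apply_to_group(item_ids, key, value):
    """Apply one key/value pair to every item in the group."""
    with conn:  # one transaction for the whole group
        # The same parameterized statement is executed once per item id.
        conn.executemany(
            "INSERT INTO KeyValue (id, attr_key, attr_value) VALUES (?, ?, ?)",
            [(item_id, key, value) for item_id in item_ids],
        )


apply_to_group([101, 102, 103], "color", "green")
rows = conn.execute(
    "SELECT id, attr_key, attr_value FROM KeyValue ORDER BY id"
).fetchall()
print(rows)
```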
Enumerate all of the currently-active keys:
SELECT DISTINCT Key FROM KeyValue;
Determine all of the items that have a value for a given key:
SELECT id FROM KeyValue WHERE Key = 'color';
Determine all of the items where the value associated with a given key matches some criteria:
SELECT id FROM KeyValue WHERE Key = 'color' AND Value = 'green';
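The three read operations can be tried out together in a small sketch. It uses Python's sqlite3 module; the column names attr_key and attr_value, the sample rows, and the helper names are assumptions of this example. The last helper shows one way a numeric criterion might be handled when every value is stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE KeyValue (id INTEGER, attr_key TEXT, attr_value TEXT)"
)
conn.executemany(
    "INSERT INTO KeyValue VALUES (?, ?, ?)",
    [
        (101, "color", "green"),
        (102, "color", "red"),
        (101, "weight", "9"),
        (103, "weight", "12"),
    ],
)


def active_keys():
    rows = conn.execute("SELECT DISTINCT attr_key FROM KeyValue ORDER BY attr_key")
    return [row[0] for row in rows]


def items_with_key(key):
    rows = conn.execute(
        "SELECT id FROM KeyValue WHERE attr_key = ? ORDER BY id", (key,)
    )
    return [row[0] for row in rows]


def items_where_equal(key, value):
    # Filter on the key as well as the value, so that 'green' stored
    # under some other key does not match.
    rows = conn.execute(
        "SELECT id FROM KeyValue WHERE attr_key = ? AND attr_value = ? ORDER BY id",
        (key, value),
    )
    return [row[0] for row in rows]


def items_where_number_above(key, threshold):
    # Values are text, so cast before comparing; as text, '9' sorts after '12'.
    rows = conn.execute(
        "SELECT id FROM KeyValue "
        "WHERE attr_key = ? AND CAST(attr_value AS REAL) > ? ORDER BY id",
        (key, threshold),
    )
    return [row[0] for row in rows]


print(active_keys())
print(items_with_key("color"))
print(items_where_equal("color", "green"))
print(items_where_number_above("weight", 10))
```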
Some of the problems with Entity-Attribute-Value are:
- No way to make sure keys are spelled the same for all items
- No way to make some keys mandatory for all items (i.e. NOT NULL in a conventional table design).
- All keys must use VARCHAR for the value; can't store different data types per key.
- No way to use referential integrity; can't make a FOREIGN KEY that applies to values of some keys and not others.
Basically, Entity-Attribute-Value is not a normalized database design.
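The first two weaknesses in the list are easy to reproduce. The sketch below uses Python's sqlite3 module with made-up sample rows and assumed column names attr_key and attr_value. A misspelled key is accepted silently, and a "mandatory" key can only be checked after the fact with a query, not enforced by the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE KeyValue (id INTEGER, attr_key TEXT, attr_value TEXT)"
)
conn.executemany(
    "INSERT INTO KeyValue VALUES (?, ?, ?)",
    [
        (101, "color", "green"),
        (102, "colour", "green"),  # misspelled key: nothing rejects it
        (103, "size", "L"),        # no color at all: nothing rejects that either
    ],
)

# Item 102 is silently missed because its key is spelled differently.
with_color = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM KeyValue WHERE attr_key = 'color' ORDER BY id"
    )
]

# The closest thing to NOT NULL is an after-the-fact audit query.
missing_color = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT id FROM KeyValue
        WHERE id NOT IN (SELECT id FROM KeyValue WHERE attr_key = 'color')
        ORDER BY id
        """
    )
]

print(with_color)     # [101]
print(missing_color)  # [102, 103]
```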
#2
5
Don't optimize this unless you have to. What is the average length of a key? Will this table be so big that it won't all fit into your server's memory if you implement it the naive way? I'd suggest implementing it the simplest way, measuring the performance, and re-implementing only if performance is a problem.
If performance is a problem, then using an integer key and a separate table is probably the way to go (JOINs on integer columns are typically faster than JOINs on variable-length string columns). But the first rule of optimizing is MEASURE FIRST: make sure your supposedly optimized code actually does make things run faster.
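A measurement along these lines could start from something like the sketch below, which times the same lookup against both layouts in an in-memory SQLite database. The table names KV_str, KV_int and Keys, the row counts, and the helpers build and timed are all assumptions of this example. Timings from a toy in-memory database will not transfer to your real server, so treat this as a template for the experiment, not as a result.

```python
import sqlite3
import time


def build(conn, n_items=2000, keys=("color", "size", "weight")):
    """Create both layouts and fill them with identical synthetic data."""
    conn.executescript("""
        CREATE TABLE KV_str (id INTEGER, attr_key TEXT, attr_value TEXT);
        CREATE TABLE Keys   (key_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE KV_int (id INTEGER, key_id INTEGER, attr_value TEXT);
    """)
    conn.executemany(
        "INSERT INTO Keys (key_id, name) VALUES (?, ?)",
        list(enumerate(keys, start=1)),
    )
    for key_id, name in enumerate(keys, start=1):
        values = [(i, str(i % 7)) for i in range(n_items)]
        conn.executemany(
            "INSERT INTO KV_str VALUES (?, ?, ?)",
            [(i, name, v) for i, v in values],
        )
        conn.executemany(
            "INSERT INTO KV_int VALUES (?, ?, ?)",
            [(i, key_id, v) for i, v in values],
        )


def timed(conn, sql, params, repeats=20):
    """Run the query several times; return (row count, elapsed seconds)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = conn.execute(sql, params).fetchall()
    return len(result), time.perf_counter() - start


conn = sqlite3.connect(":memory:")
build(conn)

count_str, secs_str = timed(
    conn, "SELECT id FROM KV_str WHERE attr_key = ?", ("color",)
)
count_int, secs_int = timed(
    conn,
    "SELECT v.id FROM KV_int AS v JOIN Keys AS k ON k.key_id = v.key_id "
    "WHERE k.name = ?",
    ("color",),
)
print(f"string key:  {count_str} rows, {secs_str:.4f}s")
print(f"integer key: {count_int} rows, {secs_int:.4f}s")
```

Both queries must return the same rows before the timings are worth comparing.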
#3
1
An option that may be worth exploring is hashing the key with SHA-1 or MD5 before inserting it into the table.
That will allow you to get rid of the lookup table, but you will not be able to iterate through the keys, because a hash only goes one way: the original key cannot be recovered from its digest.
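A minimal sketch of the digest idea, using Python's hashlib and sqlite3 modules. The column names key_hash and attr_value and the helpers key_digest and items_with_key are assumptions of this example. A lookup by key name still works because the caller hashes the name first, but the table itself holds only digests.

```python
import hashlib
import sqlite3


def key_digest(key):
    """Fixed-width (40 hex characters) SHA-1 digest of the key name."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE KeyValue (id INTEGER, key_hash TEXT, attr_value TEXT)"
)
conn.executemany(
    "INSERT INTO KeyValue VALUES (?, ?, ?)",
    [
        (101, key_digest("color"), "green"),
        (102, key_digest("size"), "L"),
    ],
)


def items_with_key(key):
    # Hash the requested key and compare digests; no lookup table is needed.
    rows = conn.execute(
        "SELECT id FROM KeyValue WHERE key_hash = ? ORDER BY id",
        (key_digest(key),),
    )
    return [row[0] for row in rows]


print(items_with_key("color"))

# Enumerating keys only yields digests, not the original names.
stored = [row[0] for row in conn.execute("SELECT DISTINCT key_hash FROM KeyValue")]
print(stored)
```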
#5
1
It seems to me that you have a couple of design choices.
Choice 1: The two-table design you hinted at in your question
keys (
  id int not null auto_increment,
  key string/int
)
values (
  id int not null auto_increment,
  key_id int,
  value string/varchar/int
)
Choice 2: Perhaps, as sambo99 pointed out, you could modify this:
keys (
  id int not null auto_increment,
  key string/int,
  hash_code int -- computed by the inserting code, so that lookups effectively already have the id and can look rows up directly
)
values (
  id int not null auto_increment, -- this column might be nice since your hash_codes might collide, and it will make deletes/updates easier
  key_id int, -- this column becomes optional
  hash_code int,
  value string/varchar/int...
)
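Choice 2 might look roughly like this in runnable form, using Python's sqlite3 and zlib modules. Several things here are assumptions of the sketch and not part of the outline above: CRC-32 as the hash (Python's built-in hash() for strings varies between processes, so it would not be stable), the table name PropertyValues (VALUES is an SQL keyword), the item_id column linking a value to its external item, and keeping key_id mandatory so that colliding hash codes can still be told apart.

```python
import sqlite3
import zlib


def hash_code(key):
    """Stable 32-bit hash of the key name, computed by the inserting code."""
    return zlib.crc32(key.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Keys (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        name      TEXT NOT NULL UNIQUE,
        hash_code INTEGER NOT NULL
    );
    CREATE TABLE PropertyValues (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        item_id    INTEGER NOT NULL,  -- assumed link to the external item
        key_id     INTEGER NOT NULL REFERENCES Keys(id),
        hash_code  INTEGER NOT NULL,
        attr_value TEXT
    );
    CREATE INDEX idx_values_hash ON PropertyValues (hash_code);
""")


def set_value(item_id, key, value):
    code = hash_code(key)
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO Keys (name, hash_code) VALUES (?, ?)",
            (key, code),
        )
        key_id = conn.execute(
            "SELECT id FROM Keys WHERE name = ?", (key,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO PropertyValues (item_id, key_id, hash_code, attr_value) "
            "VALUES (?, ?, ?, ?)",
            (item_id, key_id, code, value),
        )


def items_with_key(key):
    # Narrow by hash_code first, then confirm the name to rule out collisions.
    rows = conn.execute(
        """
        SELECT v.item_id
        FROM PropertyValues AS v
        JOIN Keys AS k ON k.id = v.key_id
        WHERE v.hash_code = ? AND k.name = ?
        ORDER BY v.item_id
        """,
        (hash_code(key), key),
    )
    return [row[0] for row in rows]


set_value(101, "color", "green")
set_value(102, "color", "red")
set_value(101, "size", "L")
print(items_with_key("color"))
```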
#6
0
Key-value pairs are generally not a good use of a relational database. The benefits of relational databases are the constraints, validation, and structure that come with them. By using a generic key-value structure in your table, you lose the validation and constraints that make relational databases good. If you want the flexible design of key-value pairs, you would be best served by a NoSQL database like MongoDB or its ilk.
Key-value storage (e.g. in NoSQL databases) works best when the underlying data is unstructured, unpredictable, or changing often. If you don't have structured data, a relational database is going to be more trouble than it's worth, because you will need to make lots of schema changes and/or jump through hoops to conform your data to the ever-changing structure.
KVP / JSON / NoSQL is great because changes to the data structure do not require completely refactoring the data model. Adding a field to your data object is simply a matter of adding it to the data. The other side of the coin is that there are fewer constraints and validation checks in a KVP / NoSQL database than in a relational database, so your data might get messy.
There are performance and space-saving benefits to relational data models. Normalized relational data can make understanding and validating the data easier because there are table key relationships and constraints to help you out. This will make your application easier to maintain and support in the long term. Another approach is to use a data abstraction layer in your code, like Django or SQLAlchemy for Python, or Entity Framework for .NET. That way, as your code changes, your database schema can change along with it.
One of the worst patterns I've seen is trying to have it both ways. Trying to put key-value pairs into a relational database is often a recipe for disaster. I would recommend using the technology that best suits your data.