数据库表中的动态列与EAV。

I'm trying to decide which way to go if I have an app that needs to be able to change the db schema based on the user input.

如果我有一个需要根据用户输入修改db模式的应用程序，我正在尝试决定该走哪条路。

For example, if I have a "car" object that contains car properties, like year, model, # of doors etc, how do I store it in the DB in such a way, that the user should be able to add new properties?

例如，如果我有一个包含汽车属性的“car”对象，比如year、model、# of doors等等，我如何将它存储在DB中，以便用户能够添加新的属性?

I read about EAV tables and they seem right for this thing, but the problem is that queries will get pretty complicated when I try to get a list of cars filtered by a set of properties.

我读过EAV表，它们看起来很适合这个东西，但问题是当我试图让一组属性过滤汽车列表时，查询会变得非常复杂。

Could I generate the tables dynamically instead? I see that Sqlite has support for ADD COLUMN, but how fast is it when the table reaches many records? And it looks like there's no way to remove a column. I have to create a new table without the column I want to remove, and copy the data from the old table. That's certainly slow on large tables :(

我可以动态生成表吗?我看到Sqlite支持添加列，但是当表达到多个记录时，它的速度有多快?看起来好像没有办法删除一列。我必须创建一个没有要删除的列的新表，并从旧表中复制数据。这在大桌子上确实很慢。

5 个解决方案

#1

I will assume that SQLite (or another relational DBMS) is a requirement.

我将假设SQLite(或另一个关系DBMS)是一个需求。

EAVs

由

I have worked with EAVs and generic data models, and I can say that the data model is very messy and hard to work with in the long run.

我已经与EAVs和通用数据模型合作过，我可以说，从长远来看，数据模型是非常混乱和难以处理的。

Lets say that you design a datamodel with three tables: entities, attributes, and _entities_attributes_:

假设您设计了一个数据模型，包含三个表:实体、属性和_entities_attributes_:

CREATE TABLE entities
(entity_id INTEGER PRIMARY KEY, name TEXT);

CREATE TABLE attributes 
(attribute_id INTEGER PRIMARY KEY, name TEXT, type TEXT);

CREATE TABLE entity_attributes 
(entity_id INTEGER, attribute_id INTEGER, value TEXT, 
PRIMARY KEY(entity_id, attribute_id));

In this model, the entities table will hold your cars, the attributes table will hold the attributes that you can associate to your cars (brand, model, color, ...) and its type (text, number, date, ...), and the _entity_attributes_ will hold the values of the attributes for a given entity (for example "red").

在这个模型中,实体表将你的汽车,将持有的属性表的属性,你可以把你的车(品牌、模型、颜色、…)和它的类型(文本、数字、日期、…),和_entity_attributes_将属性的值对于一个给定的实体(例如“红色”)。

Take into account that with this model you can store as many entities as you want and they can be cars, houses, computers, dogs or whatever (ok, maybe you need a new field on entities, but it's enough for the example).

考虑到这个模型，您可以存储任意数量的实体，它们可以是汽车、房屋、计算机、狗或其他(好吧，也许您需要一个实体的新字段，但是对于这个示例来说已经足够了)。

INSERTs are pretty straightforward. You only need to insert a new object, a bunch of attributes and its relations. For example, to insert a new entity with 3 attributes you will need to execute 7 inserts (one for the entity, three more for the attributes, and three more for the relations.

插入很简单。您只需插入一个新对象、一组属性及其关系。例如，要插入一个具有3个属性的新实体，需要执行7个插入(一个用于实体，三个用于属性，三个用于关系。

When you want to perform an UPDATE, you will need to know what is the entity that you want to update, and update the desired attribute joining with the relation between the entity and its attributes.

当您想执行更新时，您将需要知道要更新的实体是什么，并更新与实体及其属性之间的关系相关联的所需属性。

When you want to perform a DELETE, you will also need to need to know what is the entity you want to delete, delete its attributes, delete the relation between your entity and its attributes and then delete the entity.

当您想要执行删除操作时，您还需要知道要删除的实体是什么，删除它的属性，删除实体与其属性之间的关系，然后删除实体。

But when you want to perform a SELECT the thing becomes nasty (you need to write really difficult queries) and the performance drops horribly.

但是，当您想执行SELECT时，事情就变得非常糟糕(您需要编写非常困难的查询)，并且性能会严重下降。

Imagine a data model to store car entities and its properties as in your example (say that we want to store brand and model). A SELECT to query all your records will be

想象一个数据模型来存储汽车实体及其属性，如您的示例(假设我们想存储品牌和模型)。查询所有记录的SELECT将是

SELECT brand, model FROM cars;

If you design a generic data model as in the example, the SELECT to query all your stored cars will be really difficult to write and will involve a 3 table join. The query will perform really bad.

如果您设计一个通用的数据模型，如示例中所示，那么选择查询所有存储的汽车将非常困难，并且将涉及一个3表连接。查询将执行得非常糟糕。

Also, think about the definition of your attributes. All your attributes are stored as TEXT, and this can be a problem. What if somebody makes a mistake and stores "red" as a price?

还要考虑属性的定义。所有属性都存储为文本，这可能是个问题。如果有人犯了一个错误，把“红色”作为价格储存起来会怎样?

Indexes are another thing that you could not benefit of (or at least not as much as it would be desirable), and they are very neccesary as the data stored grows.

索引是另一种您不能受益的东西(或者至少不如它所期望的那样多)，而且随着存储的数据的增长，索引是非常必要的。

As you say, the main concern as a developer is that the queries are really hard to write, hard to test and hard to maintain (how much would a client have to pay to buy all red, 1980, Pontiac Firebirds that you have?), and will perform very poorly when the data volume increases.

就像你说的,作为一名开发人员主要关心的问题是,查询是真的很难写,难以测试和难以维持(客户需要支付多少钱购买红色,1980年庞蒂克火鸟你?),数据量增加时,将执行非常差。

The only advantage of using EAVs is that you can store virtually everything with the same model, but is like having a box full of stuff where you want to find one concrete, small item.

使用EAVs的唯一好处是，你可以用相同的模型存储几乎所有的东西，但是就像拥有一个装满东西的盒子，你想找到一个具体的，小的项目。

Also, to use an argument from authority, I will say that Tom Kyte argues strongly against generic data models: http://tkyte.blogspot.com.es/2009/01/this-should-be-fun-to-watch.html https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

另外，为了使用权威的论点，我要说Tom Kyte强烈反对通用的数据模型:http://tkyte.blogspot.com.es/2009/01/this-should-be funto -watch.html https://asktom.oracle.com/pls/asktom/fp/asktom/fp =100::: p11_11780?

Dynamic columns in database tables

数据库表中的动态列

On the other hand, you can, as you say, generate the tables dynamically, adding (and removing) columns when needed. In this case, you can, for example create a car table with the basic attributes that you know that you will use and then add columns dynamically when you need them (for example the number of exhausts).

另一方面，您可以像您说的那样，在需要的时候动态地生成表、添加(和删除)列。在这种情况下，您可以，例如，创建一个带有基本属性的car表，您知道您将使用这些属性，然后在需要的时候动态添加列(例如排气的数量)。

The disadvantage is that you will need to add columns to an existing table and (maybe) build new indexes.

缺点是需要向现有表添加列，(可能)构建新的索引。

This model, as you say, also has another problem when working with SQLite as there's no direct way to delete columns and you will need to do this as stated on http://www.sqlite.org/faq.html#q11

正如您所说，这个模型在使用SQLite时也有另一个问题，因为没有直接的方法来删除列，您需要按照http://www.sqlite.org/faq.html#q11中所述的那样做

BEGIN TRANSACTION;
CREATE TEMPORARY TABLE t1_backup(a,b);
INSERT INTO t1_backup SELECT a,b FROM t1;
DROP TABLE t1;
CREATE TABLE t1(a,b);
INSERT INTO t1 SELECT a,b FROM t1_backup;
DROP TABLE t1_backup;
COMMIT;

Anyway, I don't really think that you will need to delete columns (or at least it will be a very rare scenario). Maybe someone adds the number of doors as a column, and stores a car with this property. You will need to ensure that any of your cars have this property to prevent from losing data before deleting the column. But this, of course depends on your concrete scenario.

无论如何，我并不认为您需要删除列(或者至少这是一个非常罕见的场景)。也许有人会把门的数量加为一列，然后用这个属性来存放一辆车。您需要确保您的任何汽车都有此属性，以防止在删除列之前丢失数据。但这当然取决于你的具体情况。

Another drawback of this solution is that you will need a table for each entity you want to store (one table to store cars, another to store houses, and so on...).

这种解决方案的另一个缺点是，您将需要为您想要存储的每个实体(一个表用于存储汽车，另一个表用于存储房屋，等等)提供一个表。

Another option (pseudo-generic model)

另一个选择(pseudo-generic模型)

A third option could be to have a pseudo-generic model, with a table having columns to store id, name, and type of the entity, and a given (enough) number of generic columns to store the attributes of your entities.

第三种选择是拥有一个伪泛型模型，其中的表包含用于存储实体id、名称和类型的列，以及用于存储实体属性的给定(足够)数量的泛型列。

Lets say that you create a table like this:

假设您创建了一个这样的表:

CREATE TABLE entities
(entity_id INTEGER PRIMARY KEY,
 name TEXT,
 type TEXT,
 attribute1 TEXT,
 attribute1 TEXT,
 ...
 attributeN TEXT
 );

In this table you can store any entity (cars, houses, dogs) because you have a type field and you can store as many attributes for each entity as you want (N in this case).

在该表中，您可以存储任何实体(汽车、房屋、狗)，因为您有一个类型字段，您可以为每个实体存储任意多的属性(在本例中为N)。

If you need to know what the attribute37 stands for when type is "red", you would need to add another table that relates the types and attributes with the description of the attributes.

如果您需要知道当类型为“red”时attribute37表示什么，则需要添加另一个表，该表将类型和属性与属性的描述相关联。

And what if you find that one of your entities needs more attributes? Then simply add new columns to the entities table (attributeN+1, ...).

如果你发现你的一个实体需要更多的属性呢?然后只需向entity表添加新列(attributeN+1，…)。

In this case, the attributes are always stored as TEXT (as in EAVs) with it's disadvantages.

在这种情况下，属性总是以文本形式存储(如在EAVs中)，这是有缺点的。

But you can use indexes, the queries are really simple, the model is generic enough for your case, and in general, I think that the benefits of this model are greater than the drawbacks.

但是您可以使用索引，查询非常简单，这个模型对于您的案例来说是足够通用的，总的来说，我认为这个模型的优点大于缺点。

Hope it helps.

希望它可以帮助。

Follow up from the comments:

跟进评论:

With the pseudo-generic model your entities table will have a lot of columns. From the documentation (https://www.sqlite.org/limits.html), the default setting for SQLITE_MAX_COLUMN is 2000. I have worked with SQLite tables with over 100 columns with great performance, so 40 columns shouldn't be a big deal for SQLite.

使用伪泛型模型，实体表将包含大量列。从文档(https://www.sqlite.org/limits.html)中，SQLITE_MAX_COLUMN的默认设置为2000。我使用过SQLite表，它有100多列，性能很好，所以40列对于SQLite来说不是什么大问题。

As you say, most of your columns will be empty for most of your records, and you will need to index all of your colums for performance, but you can use partial indexes (https://www.sqlite.org/partialindex.html). This way, your indexes will be small, even with a high number of rows, and the selectivity of each index will be great.

如您所言，对于大多数记录，您的大多数列都是空的，为了性能，您需要索引所有的列，但是您可以使用部分索引(https://www.sqlite.org/partialindex.html)。这样，即使有大量的行，索引也会很小，而且每个索引的选择性也会很大。

If you implement a EAV with only two tables, the number of joins between tables will be less than in my example, but the queries will still be hard to write and maintain, and you will need to do several (outer) joins to extract data, which will reduce performance, even with a great index, when you store a lot of data. For example, imagine that you want to get the brand, model and color of your cars. Your SELECT would look like this:

如果你实现一个由只有两个表,表之间的连接的数量将会比在我的例子中,但查询仍将难以编写和维护,你需要做一些(外部)连接来提取数据,这将减少性能,即使有一个伟大的索引,当你存储大量的数据。例如，假设你想要得到你的汽车的品牌、型号和颜色。您的选择如下:

SELECT e.name, a1.value brand, a2.value model, a3.value color
FROM entities e
LEFT JOIN entity_attributes a1 ON (e.entity_id = a1.entity_id and a1.attribute_id = 'brand')
LEFT JOIN entity_attributes a2 ON (e.entity_id = a2.entity_id and a2.attribute_id = 'model')
LEFT JOIN entity_attributes a3 ON (e.entity_id = a3.entity_id and a3.attribute_id = 'color');

As you see, you would need one (left) outer join for each attribute you want to query (or filter). With the pseudo-generic model the query will be like this:

如您所见，您将需要为每个要查询(或筛选)的属性设置一个(左)外部连接。对于伪泛型模型，查询如下:

SELECT name, attribute1 brand, attribute7 model, attribute35 color
FROM entities;

Also, take into account the potential size of your _entity_attributes_ table. If you can potentially have 40 attributes for each entity, lets say that you have 20 not null for each of them. If you have 10,000 entities, your _entity_attributes_ table will have 200,000 rows, and you will be querying it using one huge index. With the pseudo-generic model you will have 10,000 rows and one small index for each column.

另外，还要考虑_entity_attributes_表的潜在大小。如果每个实体可能有40个属性，那么假设每个实体有20个not null。如果您有10,000个实体，您的_entity_attributes_表将有200,000行，您将使用一个大型索引查询它。使用伪泛型模型，您将为每列拥有10,000行和一个小索引。

#2

It all depends on the way in which your application needs to reason about the data.

这完全取决于应用程序需要对数据进行推理的方式。

If you need to run queries which need to do complicated comparisons or joins on data whose schema you don't know in advance, SQL and the relational model are rarely a good fit.

如果您需要运行需要进行复杂比较的查询，或者需要对模式您事先不知道的数据进行连接，那么SQL和关系模型很少适合。

For instance, if your users can set up arbitrary data entities (like "car" in your example), and then want to find cars whose engine capacity is greater than 2000cc, with at least 3 doors, made after 2010, whose current owner is part of the "little old ladies" table, I'm not aware of an elegant way of doing this in SQL.

例如,如果您的用户可以设置任意数据实体(如“车”在你的例子),然后想找汽车的引擎容量大于2000 cc,至少3门,2010年之后,其当前的所有者是“小老太太”表的一部分,我不知道这样做的一种优雅的方式SQL。

However, you could achieve something like this using XML, XPath etc.

但是，您可以使用XML、XPath等实现类似的功能。

If your application has a set on data entities with known attributes, but users can extend those attributes (a common requirement for products like bug trackers), "add column" is a good solution. However, you may need to invent a custom query language to allow users to query those columns. For instance, Atlassian Jira's bug tracking solution has JQL, a SQL-like language for querying bugs.

如果您的应用程序有一个关于具有已知属性的数据实体的集合，但是用户可以扩展这些属性(bug跟踪器等产品的常见需求)，“添加列”是一个很好的解决方案。但是，您可能需要发明一种自定义查询语言来允许用户查询这些列。例如，Atlassian Jira的bug跟踪解决方案有JQL，一种类似sql的语言，用于查询bug。

EAV is great if your task is to store and then show data. However, even moderately complex queries become very hard in an EAV schema - imagine how you'd execute my made up example above.

如果您的任务是存储并显示数据，EAV是很好的选择。然而，在EAV模式中，即使是比较复杂的查询也会变得非常困难——想象一下如何执行上面我创建的示例。

#3

For your use case, a document oriented database like MongoDB would do great.

对于您的用例，像MongoDB这样的面向文档的数据库将会非常出色。

#4

I would try EAV.

我将尝试由。

Adding columns based on user input doesn't sounds nice to me and you can quickly run out of capacity. Queries on very flat table can also be a problem. Do you want to create hundreds of indexes?

根据用户输入添加列对我来说不太好，而且很快就会耗尽容量。非常平的表上的查询也可能是一个问题。您想要创建数百个索引吗?

Instead of writing every thing to one table, I would store as many as possible common properties (price, name , color, ...) in the main table and those less common properties in an "extra" attributes table. You can always balance them later with a little effort.

我将尽可能多地将公共属性(price、name、color、…)存储在主表中，而不是将所有东西都写到一个表中，并将那些不太常用的属性存储在“额外”属性表中。你可以用一点努力来平衡它们。

EAV can performance well for small to middle sized data set. Since you want to use SQLlite, I guess it's not be a problem.

对于中小型数据集，EAV性能很好。既然你想使用SQLlite，我想这不是问题。

You may also want to avoid "over" normalizing your data. With the cheap storage we currently have, you can use one table to store all "Extra" attributes, instead of two:

您可能还希望避免“过度”规范化您的数据。有了我们现有的廉价存储，您可以使用一个表来存储所有“额外”属性，而不是两个:

ent_id, ent_name, ... ent_id, attr_name, attr_type, attr_value ...

ent_id,ent_name,…ent_id, attr_name, attr_type, attr_value…

People against EAV will say its performance is poor on large database. It's sure that it won't performance as well as normalized structure but you don't want to change structure on a 3TB table either.

反对EAV的人会说它在大型数据库上的性能很差。它的性能肯定不如规范化结构好，但您也不希望更改3TB表的结构。

#5

I have a low quality answer, but possible, that came from HTML tags that are like : <tag width="10px" height="10px" ... />

我有一个低质量的答案，但是可能来自HTML标签，比如:

In this dirty way you will have just one column as a varchar(max) for all properties say it Props column and you will store data in it like this:

以这种肮脏的方式，你将只有一个列作为varchar(max)，对于所有属性来说，它是支柱列，你将在其中存储数据如下:

Props
------------------------------------------------------------
Model:Model of car1|Year:2010|# of doors:4
Model:Model of car2|NewProp1:NewValue1|NewProp2:NewValue2

In this way all works will go to the programming code in business layer with using some functions like concatCustom that get an array and return a string and unconcatCustom that get a string and return an array.

通过这种方式，所有的工作都将通过使用一些函数(如concatCustom)进入业务层的编程代码，这些函数获取一个数组并返回一个字符串，并返回一个获得一个字符串并返回一个数组的unconconcatcustom。

For more validity of special characters like ':' and '|', I suggest '@:@' and '@|@' or something more rare for splitter part.

对于像' ':'和'|'这样的特殊字符，我建议'@:'和'@|@'，或者对于splitter部分来说更罕见的一些字符。

In a similar way you can use a text or binary field and store an XML data in the column.

以类似的方式，您可以使用文本或二进制字段并在列中存储XML数据。

#1