How do I design a database schema to support tagging with categories?

Time: 2021-06-12 12:49:53

I am trying to do something like Database Design for Tagging, except that my tags are grouped into categories.

For example, let's say I have a database about vehicles. Let's say we actually don't know very much about vehicles, so we can't specify the columns all vehicles will have. Therefore we shall "tag" vehicles with information.

1. manufacturer: Mercedes
   model: SLK32 AMG
   convertible: hardtop

2. manufacturer: Ford
   model: GT90
   production phase: prototype

3. manufacturer: Mazda
   model: MX-5
   convertible: softtop

As you can see, all cars are tagged with their manufacturer and model, but the other categories don't all match. Note that a car can have only one tag per category, i.e. a car can have only one manufacturer.

I want to design a database that supports searching for all Mercedes, or listing all manufacturers.

My current design is something like this:

vehicles
  int vid
  String vin

vehicleTags
  int vid
  int tid

tags
  int tid
  String tag
  int cid

categories
  int cid
  String category

I have all the right primary and foreign keys in place, except I can't handle the case where each car can only have one manufacturer. Or can I?

Can I add a foreign key constraint to the composite primary key in vehicleTags? That is, could I add a constraint such that a row (vid, tid) can be inserted into vehicleTags only if no existing row for the same vid already has a tid with the same cid?

My guess is no. I think the solution to this problem is to add a cid column to vehicleTags and make (vid, cid) the new composite primary key. It would look like:

vehicleTags
  int vid
  int cid
  int tid

This would prevent a car from having two manufacturers, but now I have duplicated the information that tid is in cid.

What should my schema be?

Tom noticed this problem in my database schema in my previous question, How do you do many to many table outer joins?

EDIT
I know that in the example, manufacturer should really be a column in the vehicle table, but let's say you can't do that. The example is just an example.

5 answers

#1


13  

This is yet another variation on the Entity-Attribute-Value design.

A more recognizable EAV table looks like the following:

CREATE TABLE vehicleEAV (
  vid        INTEGER,
  attr_name  VARCHAR(20),
  attr_value VARCHAR(100),
  PRIMARY KEY (vid, attr_name),
  FOREIGN KEY (vid) REFERENCES vehicles (vid)
);
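
As a rough illustration of how the composite primary key behaves, the sketch below loads the first vehicle from the question into vehicleEAV. The minimal vehicles definition and the VIN are assumptions made only so the snippet runs on its own; the attribute names and values come from the question.

```sql
-- Assumed minimal parent table; the question only lists vid and vin.
CREATE TABLE vehicles (
  vid INTEGER PRIMARY KEY,
  vin VARCHAR(17)
);

-- Same EAV table as above, repeated so the snippet is self-contained.
CREATE TABLE vehicleEAV (
  vid        INTEGER,
  attr_name  VARCHAR(20),
  attr_value VARCHAR(100),
  PRIMARY KEY (vid, attr_name),
  FOREIGN KEY (vid) REFERENCES vehicles (vid)
);

INSERT INTO vehicles (vid, vin) VALUES (1, 'EXAMPLEVIN0000001');  -- made-up VIN

INSERT INTO vehicleEAV (vid, attr_name, attr_value) VALUES (1, 'manufacturer', 'Mercedes');
INSERT INTO vehicleEAV (vid, attr_name, attr_value) VALUES (1, 'model', 'SLK32 AMG');
INSERT INTO vehicleEAV (vid, attr_name, attr_value) VALUES (1, 'convertible', 'hardtop');

-- A second 'manufacturer' row for vehicle 1 would violate
-- PRIMARY KEY (vid, attr_name), so it is left commented out:
-- INSERT INTO vehicleEAV (vid, attr_name, attr_value) VALUES (1, 'manufacturer', 'Ford');
```

The primary key is what gives the "one value per attribute per vehicle" rule; nothing here stops someone from inserting 'manufacturer' on one row and 'maker' on another.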

Some people force attr_name to reference a lookup table of predefined attribute names, to limit the chaos.

What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:

CREATE TABLE vehicleTag (
  vid         INTEGER,
  cid         INTEGER,
  tid         INTEGER,
  PRIMARY KEY (vid, cid),
  FOREIGN KEY (vid) REFERENCES vehicles(vid),
  FOREIGN KEY (cid) REFERENCES categories(cid),
  FOREIGN KEY (tid) REFERENCES tags(tid)
);

CREATE TABLE categories (
  cid        INTEGER PRIMARY KEY,
  category   VARCHAR(20) -- "attr_name"
);

CREATE TABLE tags (
  tid        INTEGER PRIMARY KEY,
  tag        VARCHAR(100) -- "attr_value"
);

If you're going to use the EAV design, you only need the vehicleTags and categories tables.

CREATE TABLE vehicleTag (
  vid         INTEGER,
  cid         INTEGER,     -- reference to "attr_name" lookup table
  tag         VARCHAR(100), -- "attr_value"
  PRIMARY KEY (vid, cid),
  FOREIGN KEY (vid) REFERENCES vehicles(vid),
  FOREIGN KEY (cid) REFERENCES categories(cid)
);
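
With this two-table layout, the two searches from the question (all Mercedes, and a list of all manufacturers) might look like the sketch below. The vehicles definition, the VINs, and the category ids are assumptions for illustration; the sample tags are the ones from the question.

```sql
-- Assumed minimal tables so the queries can be run on their own.
CREATE TABLE vehicles (
  vid INTEGER PRIMARY KEY,
  vin VARCHAR(17)
);

CREATE TABLE categories (
  cid      INTEGER PRIMARY KEY,
  category VARCHAR(20)
);

CREATE TABLE vehicleTag (
  vid INTEGER,
  cid INTEGER,
  tag VARCHAR(100),
  PRIMARY KEY (vid, cid),
  FOREIGN KEY (vid) REFERENCES vehicles(vid),
  FOREIGN KEY (cid) REFERENCES categories(cid)
);

-- Sample data; VINs and ids are made up.
INSERT INTO vehicles (vid, vin) VALUES
  (1, 'EXAMPLEVIN0000001'), (2, 'EXAMPLEVIN0000002'), (3, 'EXAMPLEVIN0000003');
INSERT INTO categories (cid, category) VALUES
  (1, 'manufacturer'), (2, 'model'), (3, 'convertible');
INSERT INTO vehicleTag (vid, cid, tag) VALUES
  (1, 1, 'Mercedes'), (1, 2, 'SLK32 AMG'), (1, 3, 'hardtop'),
  (2, 1, 'Ford'),     (2, 2, 'GT90'),
  (3, 1, 'Mazda'),    (3, 2, 'MX-5'),      (3, 3, 'softtop');

-- All vehicles whose manufacturer tag is Mercedes.
SELECT v.vid, v.vin
FROM vehicles v
JOIN vehicleTag vt ON vt.vid = v.vid
JOIN categories c  ON c.cid = vt.cid
WHERE c.category = 'manufacturer'
  AND vt.tag = 'Mercedes';

-- All distinct manufacturers.
SELECT DISTINCT vt.tag AS manufacturer
FROM vehicleTag vt
JOIN categories c ON c.cid = vt.cid
WHERE c.category = 'manufacturer'
ORDER BY manufacturer;
```

Every additional attribute used in a search condition adds another join against vehicleTag, which is where the query complexity comes from.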

But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.

  • How can you make one of the categories mandatory (a conventional column uses a NOT NULL constraint)?
  • How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
  • How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the tag column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.

SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you need "subtypes", define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If the attributes vary without limit from entity to entity, use Serialized LOB.
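
A minimal sketch of the conventional approach with Class-Table Inheritance, using the question's sample data as a guide. The convertibles table, the roof column, and the column sizes are hypothetical names and values chosen for illustration, not part of the original schema.

```sql
-- Attributes that every vehicle has become ordinary, typed columns.
CREATE TABLE vehicles (
  vid          INTEGER PRIMARY KEY,
  vin          VARCHAR(17) NOT NULL UNIQUE,
  manufacturer VARCHAR(50) NOT NULL,   -- mandatory, exactly one per vehicle
  model        VARCHAR(50) NOT NULL
);

-- Class-Table Inheritance: one subordinate table per subtype.
-- Its primary key is also a foreign key to the parent row, so a vehicle
-- can be a convertible at most once.
CREATE TABLE convertibles (
  vid  INTEGER PRIMARY KEY,
  roof VARCHAR(10) NOT NULL
       CHECK (roof IN ('hardtop', 'softtop')),  -- permitted values only
  FOREIGN KEY (vid) REFERENCES vehicles(vid)
);

INSERT INTO vehicles (vid, vin, manufacturer, model)
VALUES (1, 'EXAMPLEVIN0000001', 'Mercedes', 'SLK32 AMG');  -- made-up VIN

INSERT INTO convertibles (vid, roof) VALUES (1, 'hardtop');
```

Here NOT NULL, the column types, and the CHECK constraint cover the three things the EAV layout cannot express: mandatory attributes, type validation, and a restricted value set for a single attribute.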

Another technology designed for these kinds of fluid, non-relational data models is a semantic database, which stores data in RDF and is queried with SPARQL. One free solution is Sesame.

#2


3  

I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.

#3


0  

I think your solution is to simply add a manufacturer column to your vehicles table. It's an attribute that you know all the vehicles will have (i.e. cars don't spontaneously appear by themselves) and by making it a column in your vehicle table you solve the issue of having one and only one manufacturer for each vehicle. This approach would apply to any attributes that you know will be shared by all vehicles. You can then implement the tagging system for the other attributes that aren't universal.

So taking from your example the vehicle table would be something like:

vehicle
  vid
  vin
  make
  model
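
Written out as DDL, that table could look like the sketch below. The column types and sizes are assumptions, since the outline above gives only column names. With make as a real column, both searches from the question need no joins.

```sql
CREATE TABLE vehicle (
  vid   INTEGER PRIMARY KEY,
  vin   VARCHAR(17) NOT NULL,
  make  VARCHAR(50) NOT NULL,   -- one and only one manufacturer per vehicle
  model VARCHAR(50) NOT NULL
);

-- Search for all Mercedes.
SELECT vid, vin, model
FROM vehicle
WHERE make = 'Mercedes';

-- List all manufacturers.
SELECT DISTINCT make
FROM vehicle
ORDER BY make;
```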

#4


0  

One way would be to slightly rethink your schema, normalising tag keys away from values:

vehicles
  int vid
  string vin

tags
  int tid
  int cid
  string key

categories
  int cid
  string category

vehicleTags
  int vid
  int tid
  string value

Now all you need is a unique constraint on vehicleTags(vid, tid).
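
One possible rendering of this schema as DDL is sketched below. The column types are assumptions, and the key and value columns are renamed tag_key and tag_value here because KEY is a reserved word in some SQL dialects.

```sql
CREATE TABLE vehicles (
  vid INTEGER PRIMARY KEY,
  vin VARCHAR(17)
);

CREATE TABLE categories (
  cid      INTEGER PRIMARY KEY,
  category VARCHAR(20)
);

CREATE TABLE tags (
  tid     INTEGER PRIMARY KEY,
  cid     INTEGER NOT NULL,
  tag_key VARCHAR(20) NOT NULL,      -- "key" in the outline above
  FOREIGN KEY (cid) REFERENCES categories(cid)
);

CREATE TABLE vehicleTags (
  vid       INTEGER NOT NULL,
  tid       INTEGER NOT NULL,
  tag_value VARCHAR(100) NOT NULL,   -- "value" in the outline above
  UNIQUE (vid, tid),                 -- at most one value per tag key per vehicle
  FOREIGN KEY (vid) REFERENCES vehicles(vid),
  FOREIGN KEY (tid) REFERENCES tags(tid)
);
```

In this layout a row in tags plays the role of an attribute name such as manufacturer, so the unique constraint is what limits a vehicle to one manufacturer.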

Alternatively, there are ways to create constraints beyond simple foreign keys: depending on your database, can you write a custom constraint or an insert/update trigger to enforce vehicle-tag uniqueness?
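
As one example of the trigger route, the sketch below uses SQLite trigger syntax against the question's original tags and vehicleTags tables; other databases spell triggers differently. The trigger name and the error message are made up for illustration, and the column types are assumptions.

```sql
-- The question's original layout: each tag belongs to one category.
CREATE TABLE tags (
  tid INTEGER PRIMARY KEY,
  tag VARCHAR(100),
  cid INTEGER NOT NULL
);

CREATE TABLE vehicleTags (
  vid INTEGER NOT NULL,
  tid INTEGER NOT NULL,
  PRIMARY KEY (vid, tid),
  FOREIGN KEY (tid) REFERENCES tags(tid)
);

-- Reject an insert when the vehicle already has a tag from the same
-- category as the tag being added (SQLite syntax).
CREATE TRIGGER vehicleTags_one_tag_per_category
BEFORE INSERT ON vehicleTags
FOR EACH ROW
WHEN EXISTS (
  SELECT 1
  FROM vehicleTags AS vt
  JOIN tags AS old_tag ON old_tag.tid = vt.tid
  JOIN tags AS new_tag ON new_tag.tid = NEW.tid
  WHERE vt.vid = NEW.vid
    AND old_tag.cid = new_tag.cid
)
BEGIN
  SELECT RAISE(ABORT, 'vehicle already has a tag in this category');
END;
```

This covers inserts only. A similar BEFORE UPDATE trigger would be needed to stop an existing row from being changed into a conflicting one, and changes to tags.cid would have to be guarded as well.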

#5


0  

I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.

These days, I almost never end up building a database-backed web app that doesn't involve a full-text indexer. This problem and the general issue of search just come up way too often to omit indexers from your toolbox.
