How do you deal with polymorphism in a database?

Example

I have Person, SpecialPerson, and User. Person and SpecialPerson are just people - they don't have a user name or password on a site, but they are stored in a database for record keeping. User has all of the same data as Person and potentially SpecialPerson, along with a user name and password as they are registered with the site.

How would you address this problem? Would you have a Person table which stores all data common to a person and use a key to look up their data in SpecialPerson (if they are a special person) and User (if they are a user) and vice-versa?

13 Answers

#1

There are generally three ways of mapping object inheritance to database tables.

You can make one big table with all the fields from all the objects, plus a special field for the type. This is fast but wastes space, although modern databases save space by not storing empty fields. And if you're only looking for the users in a table that holds every type of person, things can get slow. Not all O/R mappers support this.

You can make different tables for all the different child classes with all of the tables containing the base-class fields. This is ok from a performance perspective. But not from a maintenance perspective. Every time your base-class changes all the tables change.

You can also make a table per class like you suggested. This way you need joins to get all the data. So it's less performant. I think it's the cleanest solution.

What you want to use depends of course on your situation. None of the solutions is perfect so you have to weigh the pros and cons.
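
A minimal sketch of the three layouts side by side, using Python's standard-library sqlite3 module. Only Person, SpecialPerson and User come from the question; the column names, the table name site_user (chosen because USER is a reserved word in some databases) and the helper list_users are assumptions made for illustration.

```python
import sqlite3

# 1. One big table with a type field.
SINGLE_TABLE = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    person_type TEXT NOT NULL,   -- 'Person', 'SpecialPerson' or 'User'
    name TEXT NOT NULL,
    special_note TEXT,           -- only filled in for SpecialPerson
    username TEXT,               -- only filled in for User
    password_hash TEXT           -- only filled in for User
);
"""

# 2. One table per child class, each repeating the base-class fields.
TABLE_PER_CONCRETE_CLASS = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE special_person (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                             special_note TEXT);
CREATE TABLE site_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        username TEXT NOT NULL, password_hash TEXT NOT NULL);
"""

# 3. One table per class; child tables hold only their own fields.
TABLE_PER_CLASS = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE special_person (person_id INTEGER PRIMARY KEY REFERENCES person(id),
                             special_note TEXT);
CREATE TABLE site_user (person_id INTEGER PRIMARY KEY REFERENCES person(id),
                        username TEXT NOT NULL, password_hash TEXT NOT NULL);
"""

# The same question ("name and username of every user") asked of each layout.
LAYOUTS = {
    "single_table": (
        SINGLE_TABLE,
        "SELECT name, username FROM person WHERE person_type = 'User'"),
    "concrete": (
        TABLE_PER_CONCRETE_CLASS,
        "SELECT name, username FROM site_user"),
    "per_class": (
        TABLE_PER_CLASS,
        "SELECT p.name, u.username FROM site_user u "
        "JOIN person p ON p.id = u.person_id"),
}


def list_users(layout, inserts):
    """Build the chosen layout in memory, run the inserts, return all users."""
    ddl, query = LAYOUTS[layout]
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    for sql, params in inserts:
        conn.execute(sql, params)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

Only the third layout needs a join to answer the question, which is the performance cost mentioned above; only the second needs every table altered when a base-class field changes.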

#2

Take a look at Martin Fowler's Patterns of Enterprise Application Architecture:

Single Table Inheritance:

When mapping to a relational database, we try to minimize the joins that can quickly mount up when processing an inheritance structure in multiple tables. Single Table Inheritance maps all fields of all classes of an inheritance structure into a single table.

Class Table Inheritance:

You want database structures that map clearly to the objects and allow links anywhere in the inheritance structure. Class Table Inheritance supports this by using one database table per class in the inheritance structure.

Concrete Table Inheritance:

Thinking of tables from an object instance point of view, a sensible route is to take each object in memory and map it to a single database row. This implies Concrete Table Inheritance, where there's a table for each concrete class in the inheritance hierarchy.

#3

If the User, Person and Special person all have the same foreign keys, then I would have a single table. Add a column called Type which is constrained to be User, Person or Special Person. Then based on the value of Type have constraints on the other optional columns.

For the object code it doesn't make much difference whether you have a single table or multiple tables to represent polymorphism. However, if you have to do SQL against the database, it's much easier if the polymorphism is captured in a single table, provided the foreign keys for the subtypes are the same.
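
A minimal sketch of the Type column with constraints on the optional columns, using Python's sqlite3 module. The column names and the helper try_insert are assumptions for illustration; the rule encoded here (only a User may have credentials, and a User must have them) is one plausible reading of "constraints on the other optional columns".

```python
import sqlite3

DDL = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL CHECK (type IN ('Person', 'SpecialPerson', 'User')),
    username TEXT,
    password_hash TEXT,
    -- Users must have credentials; every other type must not.
    CHECK ((type = 'User' AND username IS NOT NULL AND password_hash IS NOT NULL)
        OR (type <> 'User' AND username IS NULL AND password_hash IS NULL))
);
"""


def try_insert(conn, name, type_, username=None, password_hash=None):
    """Return True if the row satisfies the constraints, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO person (name, type, username, password_hash) "
                "VALUES (?, ?, ?, ?)",
                (name, type_, username, password_hash))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

With this in place the database itself rejects a User without a username, a plain Person with one, and any unknown type.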

#4

What I'm going to say here is going to send database architects into conniptions but here goes:

Consider a database view as the equivalent of an interface definition. And a table is the equivalent of a class.

So in your example, all 3 person classes will implement the IPerson interface. So you have 3 tables - one for each of 'User', 'Person' and 'SpecialPerson'.

Then have a view 'PersonView' or whatever that selects the common properties (as defined by your 'interface') from all 3 tables into the single view. Use a 'PersonType' column in this view to store the actual type of the person being stored.

So when you're running a query that can be operated on any type of person, just query the PersonView view.
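
A minimal sketch of the view-as-interface idea, using Python's sqlite3 module. PersonView and PersonType are the names used above; the columns, the sample rows, the table name site_user (standing in for 'User') and the helper find_by_name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person         (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE special_person (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                             special_note TEXT);
CREATE TABLE site_user      (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                             username TEXT NOT NULL, password_hash TEXT NOT NULL);

-- The "interface": the common columns of all three tables plus the type.
CREATE VIEW PersonView AS
    SELECT 'Person' AS PersonType, id, name FROM person
    UNION ALL
    SELECT 'SpecialPerson', id, name FROM special_person
    UNION ALL
    SELECT 'User', id, name FROM site_user;

INSERT INTO person VALUES (1, 'Ann');
INSERT INTO special_person VALUES (1, 'Bob', 'VIP');
INSERT INTO site_user VALUES (1, 'Cid', 'cid', 'hash');
""")


def find_by_name(conn, name):
    """Query any kind of person through the view, without naming the tables."""
    return conn.execute(
        "SELECT PersonType, id, name FROM PersonView WHERE name = ?",
        (name,)).fetchall()
```

Note that in this layout an id is only unique within its own table, so a row in the view is identified by the pair (PersonType, id).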

#5

This might not be what the OP meant to ask, but I thought I might throw this in here.

I recently had a unique case of db polymorphism in a project. We had between 60 and 120 possible classes, each with its own set of 30 to 40 unique attributes, and about 10 to 12 attributes common to all the classes. We decided to go the SQL-XML route and ended up with a single table. Something like:

PERSON (personid, persontype, name, address, phone, XMLOtherProperties)

containing all common properties as columns and then a big XML property bag. The ORM layer was then responsible for reading/writing the respective properties from the XMLOtherProperties. A bit like:

    // Each class-specific property reads and writes its value through the property bag.
    public string StrangeProperty
    {
        get { return XMLPropertyBag["StrangeProperty"]; }
        set { XMLPropertyBag["StrangeProperty"] = value; }
    }

(We ended up mapping the XML column as a Hashtable rather than an XML doc, but you can use whatever suits your DAL best.)

It's not going to win any design awards, but it will work if you have a large (or unknown) number of possible classes. And in SQL Server 2005 you can still use XPath in your SQL queries to select rows based on some property that is stored as XML; it's just a small performance penalty to accept.
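
A rough Python sketch of the property-bag idea, using only the standard library. The XML layout (<props><prop name="...">value</prop></props>) and the class names PropertyBag and Person are assumptions for illustration; the answer does not say how the XML column was actually structured.

```python
import xml.etree.ElementTree as ET


class PropertyBag:
    """Dict-like wrapper around an XML blob of name/value properties."""

    def __init__(self, xml_text="<props/>"):
        root = ET.fromstring(xml_text)
        self._values = {p.get("name"): (p.text or "")
                        for p in root.findall("prop")}

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = str(value)

    def to_xml(self):
        """Serialize back to the text that would be stored in the XML column."""
        root = ET.Element("props")
        for name, value in sorted(self._values.items()):
            ET.SubElement(root, "prop", {"name": name}).text = value
        return ET.tostring(root, encoding="unicode")


class Person:
    """Common columns as plain attributes, everything else in the bag."""

    def __init__(self, personid, persontype, name,
                 xml_other_properties="<props/>"):
        self.personid = personid
        self.persontype = persontype
        self.name = name
        self._bag = PropertyBag(xml_other_properties)

    @property
    def strange_property(self):
        return self._bag["StrangeProperty"]

    @strange_property.setter
    def strange_property(self, value):
        self._bag["StrangeProperty"] = value

    @property
    def xml_other_properties(self):
        return self._bag.to_xml()
```

Saving a Person then means writing the common attributes to their columns and xml_other_properties to the XML column; loading passes the stored XML back into the constructor.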

#6

There are three basic strategies for handling inheritance in a relational database, and a number of more complex/bespoke alternatives depending on your exact needs.

Table per class hierarchy. One table for the whole hierarchy.

Table per subclass. A separate table is created for every sub class with a 0-1 association between the subclassed tables.

Table per concrete class. A single table is created for every concrete class.

Each of these approaches raises its own issues about normalization, data access code, and data storage, although my personal preference is to use table per subclass unless there's a specific performance or structural reason to go with one of the alternatives.

#7

At the risk of being an 'architecture astronaut' here, I would be more inclined to go with separate tables for the subclasses. Have the primary key of the subclass tables also be a foreign key linking back to the supertype.

The main reason for doing it this way is that it becomes much more logically consistent, and you do not end up with a lot of fields that are NULL and nonsensical for a particular record. This method also makes it much easier to add extra fields to the subtypes as you iterate your design process.

This does add the downside of adding JOINs to your queries, which can impact performance, but I almost always go with an ideal design first, and then look to optimise later if it proves to be necessary. The few times I have gone the 'optimal' way first I have almost always regretted it later.

So my design would be something like

PERSON (personid, name, address, phone, ...)

SPECIALPERSON (personid REFERENCES PERSON(personid), extra fields...)

USER (personid REFERENCES PERSON(personid), username, encryptedpassword, extra fields...)

You could also create VIEWs later on that aggregate the supertype and the subtype, if that is necessary.

The one flaw in this approach shows up if you find yourself heavily searching for the subtypes associated with a particular supertype. There is no easy answer to this off the top of my head; you could track it programmatically if necessary, or else run some global queries and cache the results. It will really depend on the application.
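
One possible way to handle the subtype lookup is a view that LEFT JOINs every subclass table and exposes one flag per subtype. This is a sketch using Python's sqlite3 module, not a recommendation from the design above; the view person_subtypes, the helper subtypes_of, the special_note column and the table name site_user (standing in for USER) are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces REFERENCES clauses when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (personid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE specialperson (
    personid INTEGER PRIMARY KEY REFERENCES person(personid),
    special_note TEXT);
CREATE TABLE site_user (
    personid INTEGER PRIMARY KEY REFERENCES person(personid),
    username TEXT NOT NULL UNIQUE,
    encryptedpassword TEXT NOT NULL);

-- One row per person, with a 0/1 flag for each subtype table.
CREATE VIEW person_subtypes AS
    SELECT p.personid, p.name,
           s.personid IS NOT NULL AS is_special,
           u.personid IS NOT NULL AS is_user
    FROM person p
    LEFT JOIN specialperson s ON s.personid = p.personid
    LEFT JOIN site_user u ON u.personid = p.personid;

INSERT INTO person VALUES (1, 'Ann');
INSERT INTO person VALUES (2, 'Bob');
INSERT INTO site_user VALUES (2, 'bob', 'hash');
""")


def subtypes_of(conn, personid):
    """Return the subtype names for a person, or None if no such person."""
    row = conn.execute(
        "SELECT is_special, is_user FROM person_subtypes WHERE personid = ?",
        (personid,)).fetchone()
    if row is None:
        return None
    names = ("SpecialPerson", "User")
    return [name for name, flag in zip(names, row) if flag]
```

Because each subclass table's primary key is also its foreign key, a subtype row cannot exist without its supertype row, and a person can have at most one row per subtype.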

#8

I'd say that, depending on what differentiates Person and Special Person, you probably don't want polymorphism for this task.

I'd create a User table and a Person table that has a nullable foreign key field to User (i.e., the Person can be a User, but does not have to be).
Then I would make a SpecialPerson table which relates to the Person table and holds any extra fields. If a record is present in SpecialPerson for a given Person.ID, he/she/it is a special person.

#9

In our company we deal with polymorphism by combining all the fields in one table, and it's the worst: no referential integrity can be enforced, and the model is very difficult to understand. I would recommend against that approach for sure.

I would go with table per subclass, and also avoid the performance hit by using an ORM, where we can avoid joining with all subclass tables by building the query on the fly based on type. That strategy works for single-record pulls, but for bulk updates or selects you can't avoid the joins.
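
A tiny sketch of building the query on the fly from the type, in Python. The table names, the personid key and the function build_select are hypothetical; a real ORM would generate something equivalent internally.

```python
# Fixed whitelist mapping a type to its subclass table (hypothetical names).
# Table names come only from this dict, never from user input.
SUBCLASS_TABLES = {"SpecialPerson": "specialperson", "User": "site_user"}


def build_select(person_type):
    """Join only the subclass table that the given type actually needs."""
    sql = "SELECT * FROM person p"
    table = SUBCLASS_TABLES.get(person_type)
    if table is not None:
        sql += f" JOIN {table} s ON s.personid = p.personid"
    return sql + " WHERE p.personid = ?"
```

A plain Person is read with no join at all, and a User with exactly one, instead of joining every subclass table for every read.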

#10

Yes, I would also consider a TypeID along with a PersonType table if it is possible there will be more types. However, if there are only 3, that shouldn't be necessary.

#11

This is an older post, but I thought I'd weigh in from a conceptual, procedural and performance standpoint.

The first question I would ask is the relationship between person, specialperson, and user, and whether it's possible for someone to be both a specialperson and a user simultaneously. Or, any other of 4 possible combinations (class a + b, class b + c, class a + c, or a + b + c). If this class is stored as a value in a type field and would therefore collapse these combinations, and that collapse is unacceptable, then I would think a secondary table would be required allowing for a one-to-many relationship. I've learned you don't judge that until you evaluate the usage and the cost of losing your combination information.

The other factor that makes me lean toward a single table is your description of the scenario. User is the only entity with a username (say varchar(30)) and password (say varchar(32)). If the common fields' possible length is an average 20 characters per 20 fields, then your column size increase is 62 over 400, or about 15% - 10 years ago this would have been more costly than it is with modern RDBMS systems, especially with a field type like varchar (e.g. for MySQL) available.

And, if security is of concern to you, it might be advantageous to have a secondary one-to-one table called credentials (user_id, username, password). This table would be invoked in a JOIN contextually, say at time of login, but would be structurally separate from just "anyone" in the main table. And a LEFT JOIN is available for queries that might want to consider "registered users".
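
A minimal sketch of the separate credentials table, using Python's sqlite3 module. The credentials columns are the ones named above; the person table, the sample rows and the two helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE credentials (
    user_id INTEGER PRIMARY KEY REFERENCES person(id),  -- one-to-one with person
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL                              -- store a hash, not plain text
);
INSERT INTO person VALUES (1, 'Ann');
INSERT INTO person VALUES (2, 'Bob');
INSERT INTO credentials VALUES (2, 'bob', 'hash-of-secret');
""")


def person_for_login(conn, username):
    """JOIN used only at login time: resolve a username to its person row."""
    return conn.execute(
        "SELECT p.id, p.name, c.password FROM credentials c "
        "JOIN person p ON p.id = c.user_id WHERE c.username = ?",
        (username,)).fetchone()


def everyone_with_registration_flag(conn):
    """LEFT JOIN keeps unregistered people and flags the registered users."""
    return conn.execute(
        "SELECT p.name, c.user_id IS NOT NULL FROM person p "
        "LEFT JOIN credentials c ON c.user_id = p.id ORDER BY p.id").fetchall()
```

Everyday queries on person never touch the credentials table, so the password column is only read by the code path that handles login.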

My main consideration for years is still to consider the object's significance (and therefore possible evolution) outside the DB and in the real world. In this case, all types of persons have beating hearts (I hope), and may also have hierarchical relationships to one another; so, in the back of my mind, even if not now, we may need to store such relationships by another method. That's not explicitly related to your question here, but it is another example of the expression of an object's relationship. And by now (7 years later) you should have good insight into how your decision worked anyway :)

#12

In the past I've done it exactly as you suggest -- have a Person table for common stuff, then SpecialPerson linked for the derived class. However, I'm re-thinking that, as Linq2Sql wants to have a field in the same table indicate the difference. I haven't looked at the entity model too much, though -- pretty sure that allows the other method.

#13

Personally, I would store all of these different user classes in a single table. You can then either have a field which stores a 'Type' value, or you can infer what type of person you're dealing with from which fields are filled in. For example, if UserID is NULL, then this record isn't a User.

You could link out to other tables using a one to one-or-none type of join, but then in every query you'll be adding extra joins.

The first method is also supported by LINQ-to-SQL if you decide to go down that route (they call it 'Table Per Hierarchy' or 'TPH').
