Database design: when should you create a superclass for common attributes?

Time: 2022-10-03 15:39:20

To describe my dilemma, let me first start with an example problem (stolen from here). Let's say you have a GradStudent table in your database that looks like this:

GradStudent:
firstName
lastName
birthDate
courseAssignment
researchGrant

But only teaching assistants will have a course assignment and only research assistants will have a researchGrant, so one of those two will always be null. Obviously this is not optimal and it would be better to do this:

GradStudent:
firstName
lastName
birthDate

TeachAsst:
courseAssignment

ResearchAsst:
researchGrant

Where TeachAsst and ResearchAsst each have a foreign key (probably a "studentID" surrogate) referencing the GradStudent table.
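
To make the three-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the NOT NULL constraints, and the sample row are assumptions made up for illustration; only the table and column names come from the listing above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE GradStudent (
    studentID INTEGER PRIMARY KEY,          -- surrogate key
    firstName TEXT NOT NULL,
    lastName  TEXT NOT NULL,
    birthDate TEXT
);
CREATE TABLE TeachAsst (
    studentID        INTEGER PRIMARY KEY REFERENCES GradStudent(studentID),
    courseAssignment TEXT NOT NULL
);
CREATE TABLE ResearchAsst (
    studentID     INTEGER PRIMARY KEY REFERENCES GradStudent(studentID),
    researchGrant TEXT NOT NULL
);
""")

# Made-up sample data: one teaching assistant, stored as two rows.
conn.execute("INSERT INTO GradStudent VALUES (1, 'Ada', 'Lovelace', '1990-12-10')")
conn.execute("INSERT INTO TeachAsst VALUES (1, 'CS101')")

# Reading the full record back requires a join on the shared key.
row = conn.execute("""
    SELECT g.firstName, g.lastName, t.courseAssignment
    FROM GradStudent g JOIN TeachAsst t USING (studentID)
""").fetchone()
print(row)
```

Because the subtype tables reuse studentID as their own primary key, no column is ever left null just because it belongs to the other role.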

I also understand why it wouldn't be best to make two completely separate tables like:

TeachAsst:
firstName
lastName
birthDate
courseAssignment

ResearchAsst:
firstName
lastName
birthDate
researchGrant

Because you're repeating a lot of attributes that have the same meaning.

However, two distinct classes would make sense (I think) if they had hardly any fields in common, like:

TeachAsst:
name
courseAssignment
payRate
numStudents

ResearchAsst:
name
researchGrant
facultyAdvisor
researchTopic

Here, they only have "name" in common, so would it be silly to have a GradStudent superclass with only a single attribute of "name"? Where is the tipping point? How do you decide when to have a superclass of common information, or when to leave two classes completely separate? Having a superclass makes most of CRUD a bit harder because to create or update a TeachAsst you need to change two tables instead of just one.
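
The extra CRUD cost can be sketched roughly as follows, again with Python's sqlite3. The helper names make_db and create_teach_asst are hypothetical, and the schema is trimmed to two columns per table to keep the example short. The point is that creating one TeachAsst now means two inserts that should succeed or fail together.

```python
import sqlite3

def make_db():
    """Build an in-memory database with a supertype and one subtype table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE GradStudent (
        studentID INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE TeachAsst (
        studentID        INTEGER PRIMARY KEY REFERENCES GradStudent(studentID),
        courseAssignment TEXT NOT NULL
    );
    """)
    return conn

def create_teach_asst(conn, name, course_assignment):
    """Insert the supertype row and the subtype row as one unit of work."""
    with conn:  # commits if both inserts succeed, rolls back if either fails
        cur = conn.execute("INSERT INTO GradStudent (name) VALUES (?)", (name,))
        student_id = cur.lastrowid
        conn.execute(
            "INSERT INTO TeachAsst (studentID, courseAssignment) VALUES (?, ?)",
            (student_id, course_assignment),
        )
    return student_id

conn = make_db()
new_id = create_teach_asst(conn, "Ada", "CS101")
print(new_id)
```

Wrapping both statements in one transaction means a failed subtype insert does not leave an orphaned GradStudent row behind.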

As another example, let's say the DB you're working on involves measuring information on different electronic devices. And while a camera and a mobile phone have length/width/height in common, most of the other measurements will not coincide (e.g. the camera won't have any audio information, and the mobile phone won't have any lens or viewport measurements). So it seems almost simpler to have a cameraData table and a mobileData that are completely separate, rather than put their little amount of common information into a superclass table. What do you think? Is there a general rule that says you should always put common data together in a superclass, even if it's a small percentage of the subclass's descriptive data?
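
As a rough illustration of the fully separate option, the sketch below keeps cameraData and mobileData as independent tables. The columns lensDiameter and speakerVolume, the ID columns, and all sample values are invented for the example. One cost it shows: any query over the shared dimensions has to stitch the tables together by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cameraData (
    cameraID     INTEGER PRIMARY KEY,
    length REAL, width REAL, height REAL,   -- shared measurements, repeated
    lensDiameter REAL                        -- camera-only (hypothetical column)
);
CREATE TABLE mobileData (
    mobileID      INTEGER PRIMARY KEY,
    length REAL, width REAL, height REAL,   -- shared measurements, repeated
    speakerVolume REAL                       -- phone-only (hypothetical column)
);
INSERT INTO cameraData VALUES (1, 120.0, 70.0, 90.0, 58.0);
INSERT INTO mobileData VALUES (1, 150.0, 72.0, 8.0, 85.0);
""")

# Listing the dimensions of every device needs one SELECT per device table.
rows = conn.execute("""
    SELECT 'camera' AS deviceType, length, width, height FROM cameraData
    UNION ALL
    SELECT 'mobile', length, width, height FROM mobileData
    ORDER BY deviceType
""").fetchall()
print(rows)
```

If such cross-device queries are rare, the duplication of three columns may well be cheaper than an extra join on every read and an extra insert on every write.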

Edit: Let's assume that in the grad student example, a grad student is either a teaching assistant or a research assistant, will never switch roles, and also is never both or neither.
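
If the superclass layout is used, the "never both" part of this assumption can be pushed into the schema. The sketch below uses one common approach, a discriminator column plus a composite foreign key; the role column, its 'TA'/'RA' values, and the helper is_rejected are assumptions introduced here, not something from the original example. It does not enforce "never neither", which would need extra machinery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE GradStudent (
    studentID INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    role      TEXT NOT NULL CHECK (role IN ('TA', 'RA')),  -- discriminator
    UNIQUE (studentID, role)          -- target for the composite foreign keys
);
CREATE TABLE TeachAsst (
    studentID        INTEGER PRIMARY KEY,
    role             TEXT NOT NULL DEFAULT 'TA' CHECK (role = 'TA'),
    courseAssignment TEXT NOT NULL,
    FOREIGN KEY (studentID, role) REFERENCES GradStudent (studentID, role)
);
CREATE TABLE ResearchAsst (
    studentID     INTEGER PRIMARY KEY,
    role          TEXT NOT NULL DEFAULT 'RA' CHECK (role = 'RA'),
    researchGrant TEXT NOT NULL,
    FOREIGN KEY (studentID, role) REFERENCES GradStudent (studentID, role)
);
""")

def is_rejected(conn, sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute("INSERT INTO GradStudent VALUES (1, 'Ada', 'TA')")
ta_rejected = is_rejected(
    conn, "INSERT INTO TeachAsst (studentID, courseAssignment) VALUES (1, 'CS101')")
# Student 1 is registered as a TA, so a ResearchAsst row for them cannot match.
ra_rejected = is_rejected(
    conn, "INSERT INTO ResearchAsst (studentID, researchGrant) VALUES (1, 'GRANT-1')")
print(ta_rejected, ra_rejected)
```

Each subtype table can only hold rows whose (studentID, role) pair exists in GradStudent, and its own CHECK pins the role, so a student cannot appear in both subtype tables.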

2 solutions

#1


I consider myself relatively new to database design, so take this for what it's worth. In the first example, my first thought would be to indeed maintain a separate "GradStudent" table which would include name and other personal information. In my opinion, it leaves you flexible for potential changes in the future. For example, what if another GradStudent role is created which can be held by an individual in addition to either TeachAsst or ResearchAsst? You could create a "GradStudent_Relationship" table that could accommodate additional roles in the future such that:

GradStudent_Relationship:
GradStudent_ID (fk)
ResearchAsst_ID (fk)
TeachAsst_ID (fk)
NewGradStudentRole_ID (fk)

As for making your CRUD operations tougher, in my opinion the added flexibility outweighs that concern. Perhaps you could set up triggers within your database to help with that?
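
One way the trigger idea could look, sketched in SQLite through Python: a view joins the two tables, and an INSTEAD OF INSERT trigger on that view fans a single insert out to both. The view name TeachAsstFull, the trigger name, and the trimmed two-column schema are assumptions for illustration, and other database engines use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE GradStudent (
    studentID INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE TeachAsst (
    studentID        INTEGER PRIMARY KEY REFERENCES GradStudent(studentID),
    courseAssignment TEXT NOT NULL
);

-- A view that presents a teaching assistant as a single flat record.
CREATE VIEW TeachAsstFull AS
SELECT g.studentID, g.name, t.courseAssignment
FROM GradStudent g JOIN TeachAsst t ON t.studentID = g.studentID;

-- Inserting into the view writes one row to each underlying table.
CREATE TRIGGER TeachAsstFull_insert INSTEAD OF INSERT ON TeachAsstFull
BEGIN
    INSERT INTO GradStudent (name) VALUES (NEW.name);
    INSERT INTO TeachAsst (studentID, courseAssignment)
    VALUES (last_insert_rowid(), NEW.courseAssignment);
END;
""")

# Application code now issues a single statement, as if there were one table.
conn.execute(
    "INSERT INTO TeachAsstFull (name, courseAssignment) VALUES ('Grace', 'CS101')")
rows = conn.execute("SELECT name, courseAssignment FROM TeachAsstFull").fetchall()
print(rows)
```

Updates and deletes would need their own INSTEAD OF triggers, so this moves the two-table bookkeeping into the database rather than removing it.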

Regarding the second example, why can't a camera have audio? Don't some digital cameras record video that includes audio? Also, why can't a mobile phone have a lens or viewport measurement? Don't many mobile phones now include cameras?

For what it's worth, I sometimes find it helpful to abstract the "classes" as best I can in order to maintain the most flexibility down the line. There probably is some trade off there in terms of CRUD operations as you mention, but personally, I like knowing the database schema can handle potential changes in the future.

I hope this was at least somewhat helpful.

#2


In the GradStudent scenario you have the following property:

A GradStudent can be TeachAsst first and become ResearchAsst later. Or she can be both at the same time.

In this situation, denormalization might not be a good idea.

Yet in your case, you measure cameras and mobile phones. They will never become something else. I think you could risk the denormalization for the sake of less complexity.

Or, you could even think about using a document database like CouchDB, in which you do not have to follow any fixed schema.
