Shared-purpose tables in database design (how does * implement a question/answer facility)

I have long considered a database design in which tables share purposes to be something of a code smell, and a source of steadily multiplying code-smell problems.

By this I mean people over-normalizing, using 1 table where 2 tables could be more logical; people who've just discovered what normalizing is and overuse it to the point where they virtually layer a database on a database; or people naively trying to use hierarchical data to solve all their problems.

The question is: when you have 2 sets of data that appear to be the same but serve different purposes, do you use the same table to represent them? How do you know when this is a good idea and when it is a bad one?

I've always been of the mind that needless self-referencing tables, or structures that involve any table being used twice in the same query, are a very dangerous predicament, both in design and in long-term performance and ease of future improvement.

That was, of course, until I saw a thing or two in the RSS feeds for SO here. Now, I'm not going into a meta discussion about how SO works, but there appears to be an implicit design consideration here that challenged my thinking, and I want to glean a more cohesive answer on that style of logic.

You'll notice the generic question has the format:

 /question/1234/stuff-here-that-s-safe-to-leave-out

And I generally assumed that this implied that questions were numerically ordered somewhere.

But here is what stumped me. Take question 316210, for example. If you look at the RSS feed for this question, you will note there is an entry for the question and a series of answer entries, which are functionally identical to the question entry except for a few minor differences. Now note that the answer entries also have link references ... to questions; not to the same question, however, but to different questions, such as question 316218, which, when visited, redirects you back to the original question.

Now, I'm not interested in how they implement that in the code. The problem is that questions and answers appear to be sharing the same table (hence the sequential question IDs). When a user refers to an answer ID, you first have to query the database, go "hey, oops, that's not a question!", and then do a second query to find the parent of that post (in the same table) before redirecting to the actual question page. That is not to mention all the hullabaloo required with self-joining queries (which I've always considered filthy) and conditionals all over the place to tune behavior.
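
To make the scenario concrete, here is a rough sketch of the lookup being described. It assumes a single posts table with a nullable parent_id column; the table, the column names and the function question_id_for are all made up for illustration, and nothing here is known about SO's real schema. Under that assumption the parent can be resolved in one query rather than two.

```python
import sqlite3

# Hypothetical schema: one "posts" table holds both questions and answers.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES posts(id),  -- NULL for a question
        body      TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO posts (id, parent_id, body) VALUES (?, ?, ?)",
    [
        (316210, None, "the question"),
        (316218, 316210, "an answer to it"),
    ],
)


def question_id_for(conn, post_id):
    """Return the ID of the question page a post belongs on, or None."""
    # COALESCE yields parent_id for an answer and the post's own id for
    # a question, so no second query is needed to find the parent.
    row = conn.execute(
        "SELECT COALESCE(parent_id, id) FROM posts WHERE id = ?",
        (post_id,),
    ).fetchone()
    return row[0] if row else None


print(question_id_for(conn, 316210))  # the question itself
print(question_id_for(conn, 316218))  # an answer: resolves to its question
```

The conditional has not gone away; it has only moved into the query, which is part of the concern raised here.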

Less Digressing, the real problem

The problem is, here you have 2 sets of data sharing the same table, and sure, this data is superficially similar, for now at least, but it looks like there is a great deal of technical debt involved.

Consider the long-term cost of implementing new features that apply to questions but not answers, or vice versa, not to mention keeping one from being interpreted as the other in some obscure corner. You can't add a new column for one use without having to consider the resulting effects on the other.

Sure, there is a minor benefit to using a single table: when you build a feature that is shared between the two facets, you only have to code it once. But this could just as easily be achieved with an ancestor class of common methods and child classes that bind to the specific tables for the different cases. At least that way, adding a new feature to one has no follow-on implications for the other.
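
A minimal sketch of that alternative, assuming two separate tables named questions and answers. The class names, method names and columns are invented for illustration: the ancestor class holds the shared methods and each child class binds to its own table.

```python
import sqlite3


class Post:
    """Ancestor class: behaviour common to questions and answers."""

    table = None  # each child class binds to its own table

    def __init__(self, conn):
        self.conn = conn

    def body_of(self, row_id):
        # The table name comes from a class constant, never from user input.
        row = self.conn.execute(
            f"SELECT body FROM {self.table} WHERE id = ?", (row_id,)
        ).fetchone()
        return row[0] if row else None

    def edit(self, row_id, new_body):
        self.conn.execute(
            f"UPDATE {self.table} SET body = ? WHERE id = ?", (new_body, row_id)
        )


class Question(Post):
    table = "questions"

    def create(self, title, body):
        cur = self.conn.execute(
            "INSERT INTO questions (title, body) VALUES (?, ?)", (title, body)
        )
        return cur.lastrowid


class Answer(Post):
    table = "answers"

    def create(self, question_id, body):
        cur = self.conn.execute(
            "INSERT INTO answers (question_id, body) VALUES (?, ?)",
            (question_id, body),
        )
        return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE questions (
        id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL
    );
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL REFERENCES questions(id),
        body TEXT NOT NULL
    );
    """
)

questions, answers = Question(conn), Answer(conn)
qid = questions.create("Share or fork?", "When do you split a table?")
aid = answers.create(qid, "Fork.")
answers.edit(aid, "Fork, usually.")  # shared method, touches answers only
print(questions.body_of(qid), "|", answers.body_of(aid))
```

A column added to questions later (a title index, say) never touches the answers table, while edit and body_of are still written once.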

Now, I've encountered this sort of problem in many places before, sure, but SO is the easiest example to point to.

When you implement your databases like this, do you share the table, or do you fork?

When, and why?

3 solutions

#1

I think that from the perspective of SO, both questions and responses are the same thing -- user posts. They just happen to be related. If a post has no parent, then it's a question. If a post does have a parent, then it's an answer. I find this perfectly reasonable though I'm not sure I would make the same choice since there are some significant differences. Perhaps these are stored in separate tables though.

I've basically done the same thing in one of my applications. I track events. An event is a "Master" event if it has no parent. If it has a parent, then it has to be a subevent of a "Master" event. They share many of the same base properties and so they share a table. "Master" events have some additional properties that are stored in a separate table. Generally when I'm selecting subevents, I already know the "Master" event and so a separate query is not needed.
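
Roughly, the layout looks like the sketch below. The table and column names (events, master_event_details, venue) are stand-ins chosen for the example, not the real application's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES events(id),  -- NULL marks a "Master" event
        name      TEXT NOT NULL
    );
    -- Properties that only "Master" events have (the column is hypothetical).
    CREATE TABLE master_event_details (
        event_id INTEGER PRIMARY KEY REFERENCES events(id),
        venue    TEXT
    );
    INSERT INTO events VALUES (1, NULL, 'Conference');
    INSERT INTO events VALUES (2, 1, 'Opening talk');
    INSERT INTO events VALUES (3, 1, 'Workshop');
    INSERT INTO master_event_details VALUES (1, 'Main hall');
    """
)


def subevents(conn, master_id):
    """List subevent names when the "Master" event ID is already known."""
    rows = conn.execute(
        "SELECT name FROM events WHERE parent_id = ? ORDER BY id",
        (master_id,),
    )
    return [name for (name,) in rows]


print(subevents(conn, 1))
```

Because the caller already holds the "Master" ID, selecting subevents is a single filtered query with no self-join.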

#2

We have a similar thing in our student management system, which stores applications and enrollments in the same table. An application is in essence a special case of an enrollment on a course, one that is not active yet. In my case it's probably best to do it this way, as an application is easy to upgrade to an enrollment.

Really, it depends on how you are going to end up querying the data. I suspect a Q&A system is going to have a search facility that will want to search both questions and answers, so it makes sense to have them in the same table. Also, questions and answers likely require the same extra data, such as a date modified, who sent it... Just make sure you have a field in the index of the table defining whether a row is a question or an answer if you go this route.
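
Something along these lines, as a sketch only. The kind column, the index name and the search helper are assumptions for the example, and a real search facility would likely use full-text indexing rather than LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id          INTEGER PRIMARY KEY,
        kind        TEXT NOT NULL CHECK (kind IN ('question', 'answer')),
        body        TEXT NOT NULL,
        modified_at TEXT NOT NULL
    );
    -- The discriminator is indexed so filtering by kind stays cheap.
    CREATE INDEX posts_kind ON posts (kind);
    INSERT INTO posts VALUES (1, 'question', 'How do I share a table?', '2008-11-25');
    INSERT INTO posts VALUES (2, 'answer', 'Share a table when rows match.', '2008-11-25');
    INSERT INTO posts VALUES (3, 'answer', 'Fork it.', '2008-11-26');
    """
)


def search(conn, term, kind=None):
    """Search every post, or only one kind when a kind is given."""
    sql = "SELECT id FROM posts WHERE body LIKE ?"
    params = [f"%{term}%"]
    if kind is not None:
        sql += " AND kind = ?"
        params.append(kind)
    sql += " ORDER BY id"
    return [post_id for (post_id,) in conn.execute(sql, params)]


print(search(conn, "table"))            # questions and answers together
print(search(conn, "table", "answer"))  # answers only
```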

It is worth forking once you find yourself having to work around your own solution, but remember that even if you start with one table, it is relatively easy to run a script that separates it later.
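
Such a script could be as small as the following sketch. It assumes the shared table is called posts and marks answers with a non-NULL parent_id; all names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER,           -- NULL for a question
        body      TEXT NOT NULL
    );
    INSERT INTO posts VALUES (1, NULL, 'Share or fork?');
    INSERT INTO posts VALUES (2, 1, 'Share.');
    INSERT INTO posts VALUES (3, 1, 'Fork.');
    """
)


def split_posts(conn):
    """Copy the shared table into two purpose-specific tables, then drop it."""
    conn.execute(
        "CREATE TABLE questions (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        """
        CREATE TABLE answers (
            id          INTEGER PRIMARY KEY,
            question_id INTEGER NOT NULL REFERENCES questions(id),
            body        TEXT NOT NULL
        )
        """
    )
    # Existing IDs are preserved so old links keep pointing at the same rows.
    conn.execute(
        "INSERT INTO questions (id, body) "
        "SELECT id, body FROM posts WHERE parent_id IS NULL"
    )
    conn.execute(
        "INSERT INTO answers (id, question_id, body) "
        "SELECT id, parent_id, body FROM posts WHERE parent_id IS NOT NULL"
    )
    conn.execute("DROP TABLE posts")
    conn.commit()


split_posts(conn)
print(conn.execute("SELECT id, question_id FROM answers ORDER BY id").fetchall())
```

The data migration is the easy part; the application code that queried the old table still has to be updated separately.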

#3

First,

people over-normalizing, using 1 table where 2 tables could be more logical

I find that denormalized data tends to be in fewer tables, not more.

As to "re-using" tables for more than one purpose: JUST SAY NO! It's incurring technical debt from day 1.

The point of relational databases is that the schema implies the purpose of the data: what other data it relates to, its constraints, etc. It makes no more sense to wedge two similar data structures into one table than it does to make an OOP ancestor class or interface a superset of the classes for two similar value objects. It's the opposite of using the tool well.

Combining them ensures that you will create 1-1 tables for the unique columns that each entity needs and that you cannot force into the common table. You also cannot use referential integrity (R.I.), UNIQUE indexes, or other constraints, because one of the entities will always break them. Every consumer of the common table needs to know how to tell the difference between rows belonging to one entity or the other; no one can ever make assumptions about anything based on the schema.

It's easy enough to UNION two tables together if need be, and that query can even adjust for changes to their "common" schema.
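
For instance, a sketch with two separate tables and a view over them. The table names, the view name all_posts and the kind label are assumptions for the example, and UNION ALL is used because the two sets cannot overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE questions (
        id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL
    );
    CREATE TABLE answers (
        id          INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL REFERENCES questions(id),
        body        TEXT NOT NULL
    );
    -- The view exposes only the "common" columns; if either table changes,
    -- only this definition has to be adjusted.
    CREATE VIEW all_posts AS
        SELECT 'question' AS kind, id, body FROM questions
        UNION ALL
        SELECT 'answer' AS kind, id, body FROM answers;
    INSERT INTO questions VALUES (1, 'Share or fork?', 'Should two entities share a table?');
    INSERT INTO answers VALUES (1, 1, 'Use one table per entity.');
    INSERT INTO answers VALUES (2, 1, 'Just say no.');
    """
)

rows = conn.execute(
    "SELECT kind, id FROM all_posts WHERE body LIKE ? ORDER BY kind, id",
    ("%table%",),
).fetchall()
print(rows)
```

Each table keeps its own constraints (title is NOT NULL for questions, question_id is NOT NULL for answers), and a combined search still works through the view.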
