Greetings stackers,
I'm trying to come up with the best database schema for an application that lets users create surveys and present them to the public. There are a bunch of "standard" demographic fields that most surveys (but not all) will include, like First Name, Last Name, etc. And of course users can create an unlimited number of "custom" questions.
The first thing I thought of is something like this:
Survey
    ID
    SurveyName
SurveyQuestions
    SurveyID
    Question
Responses
    SurveyID
    SubmitTime
ResponseAnswers
    SurveyID
    Question
    Answer
But that's going to suck every time I want to query data out. And it seems dangerously close to the Inner Platform Effect.
An improvement would be to include as many fields as I can think of in advance in the responses table:
Responses
    SurveyID
    SubmitTime
    FirstName
    LastName
    Birthdate
    [...]
Then at least queries for data from these common columns are straightforward, and I can query, say, the average age of everyone who ever answered any survey where they gave their birthdate.
But it seems like this will complicate the code a bit. Now to see which questions are asked in a survey I have to check which common response fields are enabled (using, I guess, a bitfield in Survey) AND what's in the SurveyQuestions table. And I have to worry about special cases, like if someone tries to create a "custom" question that duplicates a "common" question in the Responses table.
Is this the best I can do? Am I missing something?
4 Answers
#1
5
Your first schema is the better choice of the two. At this point, you shouldn't worry about performance problems. Worry about making a good, flexible, extensible design. There are all sorts of tricks you can do later to cache data and make queries faster. Using a less flexible database schema in order to solve a performance problem that may not even materialize is a bad decision.
Besides, many (perhaps most) survey results are only viewed periodically and by a small number of people (event organizers, administrators, etc.), so you won't constantly be querying the database for all of the results. And even if you were, the performance would most likely be fine. You would probably paginate the results somehow anyway.
The first schema is much more flexible. You can, by default, include questions like name and address, but for anonymous surveys, you could simply not create them. If the survey creator wants to only view everyone's answers to three questions out of five hundred, that's a really simple SQL query. You could set up a cascading delete to automatically delete responses and questions when a survey is deleted. Generating statistics will be much easier with this schema too.
Here is a slightly modified version of the schema you provided. I assume you can figure out what data types go where :-)
surveys
    survey_id (index)
    title
questions
    question_id (index, auto increment)
    survey_id (link to surveys->survey_id)
    question
responses
    response_id (index, auto increment)
    survey_id (link to surveys->survey_id)
    submit_time
answers
    answer_id (index, auto increment)
    question_id (link to questions->question_id)
    answer
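To make that concrete, here is a rough sketch of the schema using SQLite through Python's built-in sqlite3 module. The column types, the sample rows, and the answers_for helper are my own assumptions for illustration. I have also assumed a response_id column on answers, which the listing does not have, so that each answer can be tied back to the submission it came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE surveys (
    survey_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE questions (
    question_id INTEGER PRIMARY KEY AUTOINCREMENT,
    survey_id   INTEGER NOT NULL REFERENCES surveys(survey_id) ON DELETE CASCADE,
    question    TEXT NOT NULL
);
CREATE TABLE responses (
    response_id INTEGER PRIMARY KEY AUTOINCREMENT,
    survey_id   INTEGER NOT NULL REFERENCES surveys(survey_id) ON DELETE CASCADE,
    submit_time TEXT NOT NULL
);
-- Assumption: response_id is not in the listing; it groups answers per submission.
CREATE TABLE answers (
    answer_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    question_id INTEGER NOT NULL REFERENCES questions(question_id) ON DELETE CASCADE,
    response_id INTEGER NOT NULL REFERENCES responses(response_id) ON DELETE CASCADE,
    answer      TEXT
);
""")

# Hypothetical sample data: one survey, three questions, one submission.
conn.execute("INSERT INTO surveys (survey_id, title) VALUES (1, 'Event feedback')")
conn.executemany(
    "INSERT INTO questions (survey_id, question) VALUES (1, ?)",
    [("First Name",), ("Last Name",), ("Favorite session",)],
)
conn.execute(
    "INSERT INTO responses (survey_id, submit_time) VALUES (1, '2024-01-01T12:00:00')"
)
conn.executemany(
    "INSERT INTO answers (question_id, response_id, answer) VALUES (?, 1, ?)",
    [(1, "Ada"), (2, "Lovelace"), (3, "Keynote")],
)

def answers_for(conn, question_ids):
    """Return (response_id, question, answer) rows for the chosen questions only."""
    placeholders = ",".join("?" * len(question_ids))
    sql = f"""
        SELECT a.response_id, q.question, a.answer
        FROM answers a
        JOIN questions q ON q.question_id = a.question_id
        WHERE a.question_id IN ({placeholders})
        ORDER BY a.response_id, a.question_id
    """
    return conn.execute(sql, list(question_ids)).fetchall()

selected = answers_for(conn, [1, 3])

# Deleting the survey cascades to its questions, responses, and answers.
conn.execute("DELETE FROM surveys WHERE survey_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM answers").fetchone()[0]
```

Picking a handful of questions out of hundreds is just an IN list on question_id, and the cascade means the application never has to clean up child rows by hand.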
#2
1
I would suggest you always take a normalized approach to your database schema and then later decide whether you need to create a solution for performance reasons. Premature optimization can be dangerous. Premature database de-normalization can be disastrous!
I would suggest that you stick with the original schema and later, if necessary, create a reporting table that is a de-normalized version of your normalized schema.
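As a rough illustration of what such a reporting table could look like, here is a sketch with SQLite and Python's sqlite3 module. The table name response_report, the rebuild_report helper, the column types, and the sample data are all assumptions of mine. The idea is only that the wide, query-friendly table is derived from the normalized one and can be rebuilt at any time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (question_id INTEGER PRIMARY KEY, survey_id INTEGER, question TEXT);
CREATE TABLE responses (response_id INTEGER PRIMARY KEY, survey_id INTEGER, submit_time TEXT);
CREATE TABLE answers   (answer_id INTEGER PRIMARY KEY, response_id INTEGER,
                        question_id INTEGER, answer TEXT);

-- Hypothetical sample data: the second respondent skipped the birthdate question.
INSERT INTO questions VALUES (1, 1, 'First Name'), (2, 1, 'Birthdate');
INSERT INTO responses VALUES (1, 1, '2024-01-01'), (2, 1, '2024-01-02');
INSERT INTO answers (response_id, question_id, answer) VALUES
    (1, 1, 'Ada'), (1, 2, '1990-05-01'),
    (2, 1, 'Grace');
""")

def rebuild_report(conn):
    """Recreate the de-normalized table: one row per response, one column per common question."""
    conn.executescript("""
    DROP TABLE IF EXISTS response_report;
    CREATE TABLE response_report AS
    SELECT r.response_id,
           r.survey_id,
           r.submit_time,
           MAX(CASE WHEN q.question = 'First Name' THEN a.answer END) AS first_name,
           MAX(CASE WHEN q.question = 'Birthdate'  THEN a.answer END) AS birthdate
    FROM responses r
    LEFT JOIN answers   a ON a.response_id = r.response_id
    LEFT JOIN questions q ON q.question_id = a.question_id
    GROUP BY r.response_id, r.survey_id, r.submit_time;
    """)

rebuild_report(conn)
report = conn.execute(
    "SELECT response_id, first_name, birthdate FROM response_report ORDER BY response_id"
).fetchall()
```

Because the report is disposable, the normalized tables stay the single source of truth, and you can run the rebuild on a schedule or after each submission, whichever suits the load.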
#3
1
One change that may or may not help simplify things would be to not link the ResponseAnswers back to the SurveyID. Rather, create an ID per response and per question and let your ResponseAnswers table contain the fields ResponseID, QuestionID, Answer. Although this would require keeping unique identifiers for each unit, it would help keep things a little bit more normalized. The response answers do not need to be associated with the survey directly, only with the specific question they answer and the response they belong to.
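A minimal sketch of that layout, using SQLite through Python's sqlite3 module. The column types, the sample values, and the composite primary key (one answer per question per response) are assumptions on my part. The survey is still recoverable through a join, so nothing is lost by dropping SurveyID from ResponseAnswers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SurveyQuestions (
    QuestionID INTEGER PRIMARY KEY,
    SurveyID   INTEGER NOT NULL,
    Question   TEXT NOT NULL
);
CREATE TABLE Responses (
    ResponseID INTEGER PRIMARY KEY,
    SurveyID   INTEGER NOT NULL,
    SubmitTime TEXT NOT NULL
);
CREATE TABLE ResponseAnswers (
    ResponseID INTEGER NOT NULL REFERENCES Responses(ResponseID),
    QuestionID INTEGER NOT NULL REFERENCES SurveyQuestions(QuestionID),
    Answer     TEXT,
    -- Assumption: a response holds at most one answer per question.
    PRIMARY KEY (ResponseID, QuestionID)
);

-- Hypothetical sample data.
INSERT INTO SurveyQuestions VALUES (10, 7, 'Birthdate');
INSERT INTO Responses       VALUES (100, 7, '2024-01-01T09:30:00');
INSERT INTO ResponseAnswers VALUES (100, 10, '1990-05-01');
""")

# The survey is found through the response (or the question), not stored on the answer.
rows = conn.execute("""
    SELECT r.SurveyID, q.Question, ra.Answer
    FROM ResponseAnswers ra
    JOIN Responses       r ON r.ResponseID = ra.ResponseID
    JOIN SurveyQuestions q ON q.QuestionID = ra.QuestionID
""").fetchall()

# The composite key rejects a second answer to the same question in the same response.
try:
    conn.execute("INSERT INTO ResponseAnswers VALUES (100, 10, 'again')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```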
#4
0
I created a customer surveys system at my previous job and came up with a schema very similar to what you have. It was used to send out surveys (on paper) and tabulate the responses.
A couple of minor differences:
- Surveys were NOT anonymous, and this was made very clear in the printed forms. It also meant that the demographic data in your example was known in advance.
- There was a pool of questions which were attached to the surveys, so one question could be used on multiple surveys and analyzed independently of the survey it appeared on.
- Handling different types of questions got interesting -- we had a 1-3 scale (e.g., Worse/Same/Better), 1-5 scale (Very Bad, Bad, OK, Good, Very Good), Yes/No, and Comments.
  There was special code to handle the comments, but the other question types were handled generically by having a table of question types and another table of valid answers for each type.
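For the question-type handling in the last point, something along these lines would work. This is a sketch with SQLite and Python's sqlite3 module, not the actual system. The table names, the type names, the 0/1 coding for Yes/No, and the is_valid helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question_types (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE valid_answers (
    type_id INTEGER NOT NULL REFERENCES question_types(type_id),
    value   INTEGER NOT NULL,
    label   TEXT NOT NULL,
    PRIMARY KEY (type_id, value)
);

INSERT INTO question_types VALUES
    (1, 'scale_1_3'), (2, 'scale_1_5'), (3, 'yes_no'), (4, 'comment');

-- Comments are free text, so type 4 has no rows here.
INSERT INTO valid_answers VALUES
    (1, 1, 'Worse'), (1, 2, 'Same'), (1, 3, 'Better'),
    (2, 1, 'Very Bad'), (2, 2, 'Bad'), (2, 3, 'OK'), (2, 4, 'Good'), (2, 5, 'Very Good'),
    (3, 0, 'No'), (3, 1, 'Yes');
""")

def is_valid(conn, type_id, value):
    """Check an answer against the choices for its question type.

    Types with no listed choices (comments) accept any value.
    """
    has_choices = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM valid_answers WHERE type_id = ?)", (type_id,)
    ).fetchone()[0]
    if not has_choices:
        return True
    found = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM valid_answers WHERE type_id = ? AND value = ?)",
        (type_id, value),
    ).fetchone()[0]
    return bool(found)

labels_1_3 = [row[0] for row in conn.execute(
    "SELECT label FROM valid_answers WHERE type_id = 1 ORDER BY value"
)]
```

Adding a new scale is then a matter of inserting rows, with no change to the validation code.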
To make querying easier you could probably create a function to return the response based on a survey ID and question ID.
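Such a function could look roughly like this in Python with sqlite3. The name answers_to_question, the table layout, and the sample rows are assumptions, and I am reading "the response" as the list of answers given to that question across all submissions of the survey.

```python
import sqlite3

def answers_to_question(conn, survey_id, question_id):
    """Return (response_id, answer) pairs for one question within one survey."""
    return conn.execute(
        """
        SELECT r.response_id, a.answer
        FROM answers a
        JOIN responses r ON r.response_id = a.response_id
        WHERE r.survey_id = ? AND a.question_id = ?
        ORDER BY r.response_id
        """,
        (survey_id, question_id),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE responses (response_id INTEGER PRIMARY KEY, survey_id INTEGER, submit_time TEXT);
CREATE TABLE answers   (answer_id INTEGER PRIMARY KEY, response_id INTEGER,
                        question_id INTEGER, answer TEXT);

-- Hypothetical data: question 5 comes from a shared pool and appears on surveys 1 and 2.
INSERT INTO responses VALUES (1, 1, '2024-01-01'), (2, 1, '2024-01-02'), (3, 2, '2024-01-03');
INSERT INTO answers (response_id, question_id, answer) VALUES
    (1, 5, 'Yes'), (2, 5, 'No'), (3, 5, 'Yes'), (1, 6, 'Great event');
""")

result = answers_to_question(conn, 1, 5)
```

Filtering on the survey through the responses table keeps this working even when the same question is reused on several surveys.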