Database design question for y'all. I have a form (like, the paper kind) that has several entry points for data. This form has changed, and is expected to keep changing over the years. It is being turned into a computer app, so that we can, among other things, quit wasting paper. (And minor things, like having all the data in one central store that can be queried, etc.) I'd like to store all of the form's data in a database, and have the design be pretty agnostic as to the changes.
Originally, I was just considering each field to be a string -- and I had a table something like this:
FormId int (FK)
FieldName nvarchar(64)
FieldValue nvarchar(128)
...something like that. It was actually a bit more 3NFy in that FieldName was in another table, associated with an artificial key, so that the field names weren't duplicated all over the place.
However, I'd like to extend this to numeric and drop-down data. I could just store numeric data as strings, but that seems like a pretty crappy idea. Same with drop downs.
I could stop using a table, and actually use columns on the main form table (the one that FormId above references), but that means adding a column for each new item as they come along, and older forms would just be null. (And, unless I stored it, I wouldn't know when that column was created. With the string table above, it's implicit.)
I could extend the table above to something like:
FormId int (FK)
FieldName nvarchar(64)
FieldValueType int -- enum as to which of the columns below are valid (or just let nulls imply that)
FieldValue nvarchar(128)
FieldValueInt int
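For concreteness, here is a minimal sketch of that extended table with the "let nulls imply the type" option. It is only an illustration under some assumptions: Python's built-in sqlite3 module stands in for MSSQL, the table name FormField and the composite primary key are made up, the FK to the main form table is left out, and the CHECK constraint is one possible way to keep the two value columns honest.

```python
import sqlite3

# SQLite (in memory) is a stand-in for MSSQL; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FormField (
    FormId        INTEGER NOT NULL,
    FieldName     TEXT    NOT NULL,
    FieldValue    TEXT,       -- set for string fields
    FieldValueInt INTEGER,    -- set for numeric fields
    PRIMARY KEY (FormId, FieldName),
    -- Exactly one of the two value columns must be non-null.
    CHECK ((FieldValue IS NOT NULL AND FieldValueInt IS NULL)
        OR (FieldValue IS NULL AND FieldValueInt IS NOT NULL))
);
""")

conn.execute(
    "INSERT INTO FormField (FormId, FieldName, FieldValue) VALUES (1, 'Surname', 'Smith')"
)
conn.execute(
    "INSERT INTO FormField (FormId, FieldName, FieldValueInt) VALUES (1, 'Age', 42)"
)

# A row that fills both value columns violates the CHECK constraint.
try:
    conn.execute("INSERT INTO FormField VALUES (1, 'Bad', 'x', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Numeric values can be aggregated without casting from strings.
total_age = conn.execute(
    "SELECT SUM(FieldValueInt) FROM FormField WHERE FieldName = 'Age'"
).fetchone()[0]
print(rejected, total_age)
```

With the constraint in place, a separate FieldValueType column becomes optional, since whichever value column is non-null already says what kind of field the row holds.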
Combos would have to be in an OTLT (one true lookup table), which I have reservations about, but perhaps it's needed here?
Any advice? I'm using MSSQL, but this is really a more general question.
4 answers
#1
2
Use Nulls. Proper database design is a complicated subject; you may do well to pick up a good reference and do some research on the whole thing (I gather this is a good book on the topic). In general, it sounds like you would be well served by starting with a single table that encapsulates all the fields in your form, and then putting it through the normalization process. And yes, use nulls and do NOT use an int to enumerate which columns are set to valid values; that is exactly what nulls are for.
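As a rough illustration of the null-based approach, a field added in a later form revision becomes a new nullable column, and older rows simply read back as NULL. This is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for MSSQL, and the Form table with its Surname, Age and Email columns is made up for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MSSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Form (FormId INTEGER PRIMARY KEY, Surname TEXT NOT NULL, Age INTEGER)"
)
conn.execute("INSERT INTO Form (Surname, Age) VALUES ('Smith', 42)")

# A later revision of the form gains a field: add a nullable column.
conn.execute("ALTER TABLE Form ADD COLUMN Email TEXT")
conn.execute(
    "INSERT INTO Form (Surname, Age, Email) VALUES ('Jones', 37, 'jones@example.com')"
)

# Rows saved before the change have NULL (None in Python) in the new column.
rows = conn.execute("SELECT Surname, Email FROM Form ORDER BY FormId").fetchall()
missing = conn.execute("SELECT COUNT(*) FROM Form WHERE Email IS NULL").fetchone()[0]
print(rows, missing)
```

No type-indicator column is needed here: each column has a real datatype, and NULL marks "this form did not have that field".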
#2
2
You could have a separate table for each datatype.
I.e. to fetch an entire form you'd do an N-way join using the form id where N is the number of distinct datatypes you support (+ perhaps extras depending on the info you want - e.g. dropdown values would probably be stored in another table / your fieldname lookup / etc.)
But the design should probably also depend on how you intend to use the data, which you've said nothing about. And it would also depend on just how fast the rate of change is for these forms . . .
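The table-per-datatype idea could look roughly like the following. This is a sketch under assumptions, not a recommended schema: Python's built-in sqlite3 module stands in for MSSQL, the table names (Form, FormStringValue, FormIntValue) and the two sample fields are invented, and only two datatypes are shown, so the "N-way join" is a 2-way join here.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MSSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Form (FormId INTEGER PRIMARY KEY);

-- One value table per supported datatype.
CREATE TABLE FormStringValue (
    FormId    INTEGER NOT NULL REFERENCES Form(FormId),
    FieldName TEXT    NOT NULL,
    Value     TEXT    NOT NULL,
    PRIMARY KEY (FormId, FieldName)
);
CREATE TABLE FormIntValue (
    FormId    INTEGER NOT NULL REFERENCES Form(FormId),
    FieldName TEXT    NOT NULL,
    Value     INTEGER NOT NULL,
    PRIMARY KEY (FormId, FieldName)
);

INSERT INTO Form VALUES (1);
INSERT INTO FormStringValue VALUES (1, 'Surname', 'Smith');
INSERT INTO FormIntValue VALUES (1, 'Age', 42);
""")

# Fetch one form by joining each value table on the form id.
surname, age = conn.execute("""
SELECT s.Value, i.Value
FROM Form f
LEFT JOIN FormStringValue s ON s.FormId = f.FormId AND s.FieldName = 'Surname'
LEFT JOIN FormIntValue    i ON i.FormId = f.FormId AND i.FieldName = 'Age'
WHERE f.FormId = 1
""").fetchone()
print(surname, age)
```

Each value keeps its native type (the age comes back as an integer, not a string), at the cost of one more join per field or datatype you want in the result.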
#3
1
By creating a table with a description of your forms, you are actually defining a metadata structure. That's daunting. You would need a lot of the infrastructure needed for proper table description. I think the vendors of your database system spent a lot of effort in doing all that.
At first I thought - what a nice idea! Build your own compatibility-aware table description system!
But then I thought - I'm too stupid to do that on my own. There must be a database system capable of doing that.
So my conclusion, not being a DB expert: define proper defaults for 'new fields' in new form versions, and handle the compatibility issue in your business logic.
#4
1
I would strongly advise against having a "generic table" like you describe.
You are essentially reinventing the relational database, which is not a good idea: Queries and updates will be very painful with your structure, and you will not be able to use the more advanced features like foreign keys and triggers, should you need them.
Just make a table (or tables) with columns for the data fields, and if a form does not have a field, let it be null.
Or, probably even better, have a "base table" (fields that are in every form), give names/version numbers to updated forms, and have a new table for the new columns that each version adds; then use a synthetic PK to join these new tables to your base table.
I.e.:
base table: id(numeric,PK), name, birthday, town
addresstable1: street, number, postal code, country, base_table_id (foreign key)
addresstable2: po box no, po box code, base_table_id (FK)
and so on.
That way you avoid loads of null fields; your tables are not so wide (always desirable), and your records are implicitly versioned, because the list of tables that have a record belonging to a record in your base table tells you which fields the original form had, hence what kind of form was used originally.
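The base-table layout above can be sketched like this. It is a sketch under assumptions: Python's built-in sqlite3 module stands in for MSSQL, the column types, the UNIQUE constraint on base_table_id, and the sample rows are all made up, and the column names with spaces are written with underscores.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MSSQL; types and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_table (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, birthday TEXT, town TEXT
);
-- Fields added by one version of the form.
CREATE TABLE addresstable1 (
    street TEXT, number TEXT, postal_code TEXT, country TEXT,
    base_table_id INTEGER NOT NULL UNIQUE REFERENCES base_table(id)
);
-- Fields added by another version of the form.
CREATE TABLE addresstable2 (
    po_box_no TEXT, po_box_code TEXT,
    base_table_id INTEGER NOT NULL UNIQUE REFERENCES base_table(id)
);

INSERT INTO base_table VALUES (1, 'Smith', '1970-01-01', 'Springfield');
INSERT INTO base_table VALUES (2, 'Jones', '1985-06-15', 'Shelbyville');
INSERT INTO addresstable1 VALUES ('Main St', '12', '12345', 'US', 1);
INSERT INTO addresstable2 VALUES ('77', 'PB-9', 2);
""")

# Which extension tables hold a row for each record tells us
# which version of the form it was filled in on.
rows = conn.execute("""
SELECT b.name,
       a1.base_table_id IS NOT NULL AS has_address1,
       a2.base_table_id IS NOT NULL AS has_address2
FROM base_table b
LEFT JOIN addresstable1 a1 ON a1.base_table_id = b.id
LEFT JOIN addresstable2 a2 ON a2.base_table_id = b.id
ORDER BY b.id
""").fetchall()
print(rows)
```

The LEFT JOINs keep every base record in the result, and the two flag columns make the implicit versioning visible without storing a version number on each row.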