Design patterns for custom fields in a relational database

Time: 2022-11-26 12:45:56

I have been assigned the task of creating a (relatively) simple reporting system. In this system, the user is shown the result of a report as a table. A table has some fields, and each field gives the user one piece of information in each record. My problem, however, is that the report fields are not declared by the developer; they must be declared by the user of the system. So my report tables are dynamic.

I saw an example in 'Data Driven Custom View Engine in ASP.NET MVC' for creating dynamic forms with the ASP.NET MVC framework, but I don't know whether it is appropriate for my system.

Update1:

I have currently ended up with the following entity relationship diagram:

[Figure: entity relationship diagram of the Report, ReportType, ReportField, and ReportFieldValue tables]

In the diagram above, I store every report record in the Report table and the type of each report in ReportType. For each field used in a report record I use a ReportFieldValue row. The field types are stored in ReportField.

So if I want to add a record to my database, I first add a row to the Report table. Then, for each field of the new record, I add a row to the ReportFieldValue table.

However, as you may notice, in this approach I must store every field value as char(255). That is a problem for field types such as datetime, which should not be stored as strings. Is there a design pattern or architecture for this type of system?

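The design described in the question can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the diagram itself is not preserved, so every column name, the link from ReportField to ReportType, and the helper `add_record` are assumptions rather than the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ReportType (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ReportField (
        id             INTEGER PRIMARY KEY,
        report_type_id INTEGER NOT NULL REFERENCES ReportType(id),
        name           TEXT NOT NULL
    );
    CREATE TABLE Report (
        id             INTEGER PRIMARY KEY,
        report_type_id INTEGER NOT NULL REFERENCES ReportType(id)
    );
    -- Every value is a string, which is the weakness the question asks about.
    CREATE TABLE ReportFieldValue (
        report_id       INTEGER NOT NULL REFERENCES Report(id),
        report_field_id INTEGER NOT NULL REFERENCES ReportField(id),
        value           CHAR(255),
        PRIMARY KEY (report_id, report_field_id)
    );
""")


def add_record(conn, report_type_id, values_by_field_id):
    """Insert one Report row, then one ReportFieldValue row per field."""
    cur = conn.execute(
        "INSERT INTO Report (report_type_id) VALUES (?)", (report_type_id,)
    )
    report_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO ReportFieldValue (report_id, report_field_id, value) "
        "VALUES (?, ?, ?)",
        [(report_id, fid, str(v)) for fid, v in values_by_field_id.items()],
    )
    return report_id


# Hypothetical report type with two user-declared fields.
conn.execute("INSERT INTO ReportType (id, name) VALUES (1, 'People')")
conn.executemany(
    "INSERT INTO ReportField (id, report_type_id, name) VALUES (?, 1, ?)",
    [(1, "Name"), (2, "Date of Birth")],
)
report_id = add_record(conn, 1, {1: "Alice", 2: "2001-05-17"})
stored = conn.execute(
    "SELECT report_field_id, value FROM ReportFieldValue "
    "WHERE report_id = ? ORDER BY report_field_id",
    (report_id,),
).fetchall()
```

Note that the date of birth comes back as the plain string '2001-05-17'; nothing in the schema stops a user from storing 'tomorrow' in the same field.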

4 solutions

#1


13  

Avoid stringly-typed data by replacing VALUE with NUMBER_VALUE, DATE_VALUE, STRING_VALUE. Those three types are good enough most of the time. You can add XMLTYPE and other fancy columns later if they're needed. And for Oracle, use VARCHAR2 instead of CHAR to conserve space.

Always try to store values as the correct type. Native data types are faster, smaller, easier to use, and safer.

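A minimal sketch of the three-column idea, assuming a hypothetical helper `store_value` and using SQLite through Python only so that it runs anywhere. SQLite has no native DATE type, so dates are kept as ISO-8601 text here; in Oracle the column would be a real DATE and the string column a VARCHAR2. The CHECK constraint is an addition of this sketch, not something the answer prescribes.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ReportFieldValue (
        report_id       INTEGER NOT NULL,
        report_field_id INTEGER NOT NULL,
        number_value    NUMERIC,
        date_value      TEXT,   -- ISO-8601 text in SQLite; DATE in Oracle
        string_value    TEXT,   -- VARCHAR2 in Oracle
        PRIMARY KEY (report_id, report_field_id),
        -- At most one of the three typed columns may be populated per row.
        CHECK ((number_value IS NOT NULL) + (date_value IS NOT NULL)
               + (string_value IS NOT NULL) <= 1)
    )
""")


def store_value(conn, report_id, field_id, value):
    """Route a Python value to the column that matches its type."""
    number_value = date_value = string_value = None
    if isinstance(value, bool):
        raise TypeError("bool is not one of the three supported types")
    if isinstance(value, (int, float)):
        number_value = value
    elif isinstance(value, date):
        date_value = value.isoformat()
    elif isinstance(value, str):
        string_value = value
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    conn.execute(
        "INSERT INTO ReportFieldValue VALUES (?, ?, ?, ?, ?)",
        (report_id, field_id, number_value, date_value, string_value),
    )


store_value(conn, 1, 1, "Alice")
store_value(conn, 1, 2, date(2001, 5, 17))
store_value(conn, 1, 3, 42)

# The date filter needs no string conversion at query time.
born_this_millennium = conn.execute(
    "SELECT report_id FROM ReportFieldValue "
    "WHERE report_field_id = 2 AND date_value > '2000-01-01'"
).fetchall()
```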

Oracle has a generic data type system (ANYTYPE, ANYDATA, and ANYDATASET), but those types are difficult to use and should be avoided in most cases.

Architects often think using a single field for all data makes things easier. It makes it easier to generate pretty pictures of the data model but it makes everything else more difficult. Consider these issues:

  1. You cannot do anything interesting with data without knowing the type. Even to display data it's useful to know the type to justify the text. In 99.9% of all use cases it will be obvious to the user which of the 3 columns is relevant.
  2. Developing type-safe queries against stringly-typed data is painful. For example, let's say you want to find "Date of Birth" for people born in this millennium:

    select *
    from ReportFieldValue
    join ReportField
        on ReportFieldValue.ReportFieldid = ReportField.id
    where ReportField.name = 'Date of Birth'
        and to_date(value, 'YYYY-MM-DD') > date '2000-01-01'
    

    Can you spot the bug? The query above is dangerous even if every date is stored in the correct format, and very few developers know how to fix it properly. Oracle may evaluate the to_date predicate before the name filter, so the conversion can run against values of other fields that are not dates and fail. Oracle has optimizations that make it difficult to force a specific order of operations. You'll need a query like this to be safe:

    select *
    from
    (
        select ReportFieldValue.*, ReportField.*
            --ROWNUM ensures type safety by preventing view merging and predicate pushing.
            ,rownum
        from ReportFieldValue
        join ReportField
            on ReportFieldValue.ReportFieldid = ReportField.id
        where ReportField.name = 'Date of Birth'
    )
    where to_date(value, 'YYYY-MM-DD') > date '2000-01-01';
    

    You don't want to have to tell every developer to write their queries that way.

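The ordering hazard in the second point can be imitated in plain Python. This is an analogy only, not a model of Oracle's optimizer: the rows, the `to_date` helper, and both function names are hypothetical, and the two functions simply stand for the two evaluation orders a database might pick.

```python
from datetime import date, datetime

# Hypothetical rows of (field name, stringly-typed value).
rows = [
    ("Name", "Alice"),
    ("Date of Birth", "2001-05-17"),
    ("Date of Birth", "1985-11-02"),
]

CUTOFF = date(2000, 1, 1)


def to_date(text):
    """Parse YYYY-MM-DD and raise ValueError on anything else."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def convert_then_filter(rows):
    """Convert every value first, then filter by field name."""
    converted = [(name, to_date(value)) for name, value in rows]
    return [d for name, d in converted if name == "Date of Birth" and d > CUTOFF]


def filter_then_convert(rows):
    """Filter by field name first, then convert only the surviving values."""
    dob_values = [value for name, value in rows if name == "Date of Birth"]
    return [d for d in map(to_date, dob_values) if d > CUTOFF]


safe_result = filter_then_convert(rows)

try:
    convert_then_filter(rows)
    conversion_failed = False
except ValueError:
    # "Alice" is not a date, so converting before filtering blows up.
    conversion_failed = True
```

Both functions express the same logical condition; only the order of evaluation differs, which is exactly what a SQL author does not control in the first query.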

#2


8  

Your design is a variation of the Entity Attribute Value (EAV) data model, which is often regarded as an anti-pattern in database design.

Maybe a better approach for you would be to create a reporting values table with, say, 300 columns (NUMBER_VALUE_1 through NUMBER_VALUE_100, VARCHAR2_VALUE_1..100, and DATE_VALUE_1..100).

Then, design the rest of your data model around tracking which reports use which columns and what they use each column for.

This has two benefits: first, you are not storing dates and numbers in strings (the benefits of which have already been pointed out), and second, you avoid many of the performance and data integrity issues associated with the EAV model.

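One way to picture the wide-table approach, again as a hedged sketch in Python with sqlite3. The table names ReportValues and ReportColumnMap, the helper `build_select`, and the sample data are assumptions, and only three columns per type are created to keep it short where the answer suggests 100 of each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ReportValues (
        report_record_id INTEGER PRIMARY KEY,
        report_type_id   INTEGER NOT NULL,
        number_value_1 NUMERIC, number_value_2 NUMERIC, number_value_3 NUMERIC,
        varchar2_value_1 TEXT, varchar2_value_2 TEXT, varchar2_value_3 TEXT,
        date_value_1 TEXT, date_value_2 TEXT, date_value_3 TEXT
    );
    -- Tracks which physical column each report type uses for each field.
    CREATE TABLE ReportColumnMap (
        report_type_id INTEGER NOT NULL,
        field_name     TEXT NOT NULL,
        column_name    TEXT NOT NULL,
        PRIMARY KEY (report_type_id, field_name),
        UNIQUE (report_type_id, column_name)
    );
""")

ALLOWED_COLUMNS = {
    f"{kind}_value_{i}"
    for kind in ("number", "varchar2", "date")
    for i in range(1, 4)
}


def build_select(conn, report_type_id):
    """Build a SELECT that exposes the generic columns under their field names."""
    mapping = conn.execute(
        "SELECT field_name, column_name FROM ReportColumnMap "
        "WHERE report_type_id = ? ORDER BY field_name",
        (report_type_id,),
    ).fetchall()
    if not mapping:
        raise LookupError(f"no fields mapped for report type {report_type_id}")
    parts = []
    for field_name, column_name in mapping:
        # Column names come from data, so only accept known physical columns.
        if column_name not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected column name: {column_name!r}")
        alias = field_name.replace('"', '""')
        parts.append(f'{column_name} AS "{alias}"')
    return f"SELECT {', '.join(parts)} FROM ReportValues WHERE report_type_id = ?"


conn.executemany(
    "INSERT INTO ReportColumnMap VALUES (?, ?, ?)",
    [
        (20, "HEADER_ID", "number_value_1"),
        (20, "ORDERED_ITEM", "varchar2_value_1"),
        (20, "SCHEDULE_SHIP_DATE", "date_value_1"),
    ],
)
conn.execute(
    "INSERT INTO ReportValues "
    "(report_type_id, number_value_1, varchar2_value_1, date_value_1) "
    "VALUES (20, 1001, 'WIDGET', '2012-03-01')"
)
sql = build_select(conn, 20)
result = conn.execute(sql, (20,)).fetchall()
```

Each report record is a single row, so reading a report back needs no pivot and no join against a values table.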

EDIT -- adding some empirical results of an EAV model

Using an Oracle 11g2 database, I moved 30,000 records from one table into an EAV data model. I then queried the model to get those 30,000 records back.

SELECT SUM (header_id * LENGTH (ordered_item) * (SYSDATE - schedule_ship_date))
FROM   (SELECT rf.report_type_id,
               rv.report_header_id,
               rv.report_record_id,
               MAX (DECODE (rf.report_field_name, 'HEADER_ID', rv.number_value, NULL)) header_id,
               MAX (DECODE (rf.report_field_name, 'LINE_ID', rv.number_value, NULL)) line_id,
               MAX (DECODE (rf.report_field_name, 'ORDERED_ITEM', rv.char_value, NULL)) ordered_item,
               MAX (DECODE (rf.report_field_name, 'SCHEDULE_SHIP_DATE', rv.date_value, NULL)) schedule_ship_date
        FROM   eav_report_record_values rv INNER JOIN eav_report_fields rf ON rf.report_field_id = rv.report_field_id
        WHERE  rv.report_header_id = 20 
        GROUP BY rf.report_type_id, rv.report_header_id, rv.report_record_id)

The results were:

1 row selected.

Elapsed: 00:00:22.62

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                        | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                             |     1 |  2026 |    53  (67)|
|   1 |  SORT AGGREGATE                 |                             |     1 |  2026 |            |
|   2 |   VIEW                          |                             |   130K|   251M|    53  (67)|
|   3 |    HASH GROUP BY                |                             |   130K|   261M|    53  (67)|
|   4 |     NESTED LOOPS                |                             |       |       |            |
|   5 |      NESTED LOOPS               |                             |   130K|   261M|    36  (50)|
|   6 |       TABLE ACCESS FULL         | EAV_REPORT_FIELDS           |   350 | 15050 |    18   (0)|
|*  7 |       INDEX RANGE SCAN          | EAV_REPORT_RECORD_VALUES_N1 |   130K|       |     0   (0)|
|*  8 |      TABLE ACCESS BY INDEX ROWID| EAV_REPORT_RECORD_VALUES    |   372 |   749K|     0   (0)|
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("RV"."REPORT_HEADER_ID"=20)
   8 - filter("RF"."REPORT_FIELD_ID"="RV"."REPORT_FIELD_ID")

Note
-----
   - 'PLAN_TABLE' is old version


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
     275480  consistent gets
        465  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        252  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

That's 22 seconds to get 30,000 rows of 4 columns each. That is way too long. From a flat table we'd be looking at under 2 seconds, easy.

#3


3  

Use MariaDB, with its Dynamic Columns. Effectively, that lets you put all the miscellaneous columns into a single column, yet still gives you efficient access to them.

I would keep a few of the common fields in their own columns.

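The shape of this layout (a few common fields in real columns, everything else packed into one column) can be sketched without MariaDB. The example below swaps in a different technique: a JSON text column in SQLite, decoded in Python. It does not use MariaDB's Dynamic Columns functions such as COLUMN_CREATE and COLUMN_GET, and it does not reproduce their server-side access; the table layout and both helper functions are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Report (
        id             INTEGER PRIMARY KEY,
        report_type_id INTEGER NOT NULL,       -- common field, own column
        created_on     TEXT NOT NULL,          -- common field, own column
        extras         TEXT NOT NULL DEFAULT '{}'  -- user-defined fields as JSON
    )
""")


def add_report(conn, report_type_id, created_on, **extras):
    """Store the common fields in columns and the rest in one JSON column."""
    cur = conn.execute(
        "INSERT INTO Report (report_type_id, created_on, extras) VALUES (?, ?, ?)",
        (report_type_id, created_on, json.dumps(extras)),
    )
    return cur.lastrowid


def get_extra(conn, report_id, name, default=None):
    """Read one user-defined field back out of the JSON column."""
    row = conn.execute(
        "SELECT extras FROM Report WHERE id = ?", (report_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no report with id {report_id}")
    return json.loads(row[0]).get(name, default)


rid = add_report(conn, 1, "2022-11-26", color="red", weight=12.5)
weight = get_extra(conn, rid, "weight")
missing = get_extra(conn, rid, "height")
```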

More discussion of EAV and suggestions (and how to do it without Dynamic Columns).

#4


1  

Well, you have a very good point about storing data in the correct data types, and I agree that this does pose a problem for user-defined data systems.

One way of solving this problem is to add a table for each data type group (ints, floating points, strings, binary, and dates) instead of keeping the value in the ReportFieldValue table. However, this will make your life harder, since you will have to select from and join multiple tables to get a single result.

Another way would be to add a data type column to ReportFieldValue and create a user-defined function that dynamically casts the data from strings to the appropriate data type (using the value in the data type column), so that you can use it for sorting, searching, etc.

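The casting idea might look like the following on the application side. The answer has a database user-defined function in mind; this Python version only illustrates the dispatch on a data type column, and the type names ("number", "date", "string") and the function `cast_value` are assumptions.

```python
from datetime import datetime
from decimal import Decimal

# Maps the value of the hypothetical data type column to a converter.
CASTERS = {
    "number": Decimal,
    "date": lambda text: datetime.strptime(text, "%Y-%m-%d").date(),
    "string": str,
}


def cast_value(raw, data_type):
    """Convert a stored string to the type named in its data type column."""
    if raw is None:
        return None
    try:
        caster = CASTERS[data_type]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None
    return caster(raw)


# Rows of (stored string, data type) as they would come from ReportFieldValue.
rows = [("10", "number"), ("9", "number"), ("100", "number")]

sorted_as_strings = sorted(raw for raw, _ in rows)
sorted_as_numbers = sorted(cast_value(raw, data_type) for raw, data_type in rows)
```

Sorting the raw strings puts "100" before "9", while sorting the cast values gives the numeric order, which is the reason for casting before sorting or searching.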

SQL Server also has a data type called sql_variant that should support multiple types, and though I've never worked with it, its documentation seems promising.
