I have been assigned a task to create a (relatively) simple reporting system. In this system, the user will be shown a table of report results. A table has some fields, and each field gives the user part of the information in each record. My problem, however, is that the report fields will not be declared by the developer; they must be declared by the user of the system. So my report tables are dynamic.


I saw an example in 'Data Driven Custom View Engine in ASP.NET MVC' for creating dynamic forms using the ASP.NET MVC Framework, but I don't know whether it is appropriate for my system.




Currently I have ended up with the following Entity Relationship Diagram:



In the above diagram, I store every report record in the Report table. I also store the type of report in ReportType. For each field used in a report record I use a ReportFieldValue. The types of fields are stored in ReportField.


So if I want to add a record to my database, I first add a row to the Report table. Then, for each field of the added record, I add a row to the ReportFieldValue table.


However, as you may notice, in this approach I must store every field value in a char(255). The problem is field types like datetime, which should not be stored as strings. Is there a design pattern or architecture for this type of system?


Avoid stringly-typed data by replacing VALUE with NUMBER_VALUE, DATE_VALUE, STRING_VALUE. Those three types are good enough most of the time. You can add XMLTYPE and other fancy columns later if they're needed. And for Oracle, use VARCHAR2 instead of CHAR to conserve space.
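As a rough illustration of the three-column idea, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; SQLite is standing in for Oracle, where the columns would be declared NUMBER, DATE and VARCHAR2. The table name, the column names and the `store` helper are assumptions made for this example, not part of the original schema.

```python
import datetime
import sqlite3

# Hypothetical schema; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE report_field_value (
        report_id    INTEGER NOT NULL,
        field_id     INTEGER NOT NULL,
        number_value REAL,
        date_value   TEXT,  -- ISO-8601 text in SQLite; a real DATE in Oracle
        string_value TEXT,
        -- exactly one typed column must be populated per row
        CHECK ((number_value IS NOT NULL)
             + (date_value IS NOT NULL)
             + (string_value IS NOT NULL) = 1)
    )
""")


def store(report_id, field_id, value):
    """Route a Python value to the column that matches its type."""
    number = date = string = None
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        number = value
    elif isinstance(value, datetime.date):
        date = value.isoformat()
    elif isinstance(value, str):
        string = value
    else:
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    conn.execute(
        "INSERT INTO report_field_value VALUES (?, ?, ?, ?, ?)",
        (report_id, field_id, number, date, string),
    )


store(1, 1, 42)
store(1, 2, datetime.date(2001, 5, 17))
store(2, 2, datetime.date(1999, 1, 1))

# The date filter needs no string-to-date conversion at query time.
born_after_2000 = conn.execute(
    "SELECT report_id FROM report_field_value "
    "WHERE field_id = 2 AND date_value > '2000-01-01'"
).fetchall()
```

The CHECK constraint is optional, but it keeps a row from carrying values in two typed columns at once.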


Always try to store values as the correct type. Native data types are faster, smaller, easier to use, and safer.


Oracle has a generic data type system (ANYTYPE, ANYDATA, and ANYDATASET), but those types are difficult to use and should be avoided in most cases.


Architects often think using a single field for all data makes things easier. It makes it easier to generate pretty pictures of the data model but it makes everything else more difficult. Consider these issues:


  1. You cannot do anything interesting with data without knowing the type. Even to display data it's useful to know the type to justify the text. In 99.9% of all use cases it will be obvious to the user which of the 3 columns is relevant.
  2. Developing type-safe queries against stringly-typed data is painful. For example, let's say you want to find "Date of Birth" for people born in this millennium:


    select *
    from ReportFieldValue
    join ReportField
        on ReportFieldValue.ReportFieldid = ReportField.id
    where ReportField.name = 'Date of Birth'
        and to_date(value, 'YYYY-MM-DD') > date '2000-01-01'

    Can you spot the bug? The above query is dangerous, even if you stored the date in the correct format, and very few developers know how to properly fix it. Oracle has optimizations that make it difficult to force a specific order of operations. You'll need a query like this to be safe:


    select *
    from
    (
        select ReportFieldValue.*, ReportField.*
            --ROWNUM ensures type safety by preventing view merging and predicate pushing.
            ,rownum as rn
        from ReportFieldValue
        join ReportField
            on ReportFieldValue.ReportFieldid = ReportField.id
        where ReportField.name = 'Date of Birth'
    )
    where to_date(value, 'YYYY-MM-DD') > date '2000-01-01';

    You don't want to have to tell every developer to write their queries that way.




Your design is a variation of the Entity Attribute Value (EAV) data model, which is often regarded as an anti-pattern in database design.


Maybe a better approach for you would be to create a reporting values table with, say, 300 columns (NUMBER_VALUE_1 through NUMBER_VALUE_100, VARCHAR2_VALUE_1..100, and DATE_VALUE_1..100).


Then, design the rest of your data model around tracking which reports use which columns and what they use each column for.


This has two benefits: first, you are not storing dates and numbers in strings (the benefits of which have already been pointed out), and second, you avoid many of the performance and data integrity issues associated with the EAV model.
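To make the "track which reports use which columns" part concrete, here is a minimal sketch. It assumes a hypothetical mapping table called `report_column_map`, cuts the wide table down to two slots per type for brevity, and uses SQLite through Python's sqlite3 module as a stand-in for Oracle. The field names are borrowed from the example query further down. None of this is a definitive design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two slots per type keep the sketch short; the suggestion above is 100 each.
conn.executescript("""
    CREATE TABLE report_values (
        report_type_id   INTEGER NOT NULL,
        number_value_1   REAL,
        number_value_2   REAL,
        varchar2_value_1 TEXT,
        varchar2_value_2 TEXT,
        date_value_1     TEXT,
        date_value_2     TEXT
    );
    -- Hypothetical mapping: which physical column holds which user-defined field.
    CREATE TABLE report_column_map (
        report_type_id INTEGER NOT NULL,
        field_name     TEXT NOT NULL,
        column_name    TEXT NOT NULL,
        PRIMARY KEY (report_type_id, field_name),
        UNIQUE (report_type_id, column_name)
    );
""")

VALUE_COLUMNS = {
    "number_value_1", "number_value_2",
    "varchar2_value_1", "varchar2_value_2",
    "date_value_1", "date_value_2",
}


def build_select(report_type_id):
    """Build a SELECT that exposes the generic columns under their field names."""
    mappings = conn.execute(
        "SELECT field_name, column_name FROM report_column_map "
        "WHERE report_type_id = ? ORDER BY field_name",
        (report_type_id,),
    ).fetchall()
    if not mappings:
        raise ValueError(f"no fields mapped for report type {report_type_id}")
    select_list = []
    for field_name, column_name in mappings:
        # Column names are interpolated into SQL, so only known ones are allowed.
        if column_name not in VALUE_COLUMNS:
            raise ValueError(f"unexpected column name: {column_name!r}")
        alias = field_name.replace('"', '""')
        select_list.append(f'{column_name} AS "{alias}"')
    return ("SELECT " + ", ".join(select_list)
            + " FROM report_values WHERE report_type_id = ?")


conn.executemany(
    "INSERT INTO report_column_map VALUES (?, ?, ?)",
    [(20, "HEADER_ID", "number_value_1"),
     (20, "ORDERED_ITEM", "varchar2_value_1"),
     (20, "SCHEDULE_SHIP_DATE", "date_value_1")],
)
conn.execute(
    "INSERT INTO report_values "
    "(report_type_id, number_value_1, varchar2_value_1, date_value_1) "
    "VALUES (20, 1001, 'WIDGET', '2015-03-01')"
)

cursor = conn.execute(build_select(20), (20,))
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

Each report record is one physical row, so reading it back is a plain single-table select rather than a pivot over many value rows.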


EDIT -- adding some empirical results of an EAV model

Using an Oracle 11g2 database, I moved 30,000 records from one table into an EAV data model. I then queried the model to get those 30,000 records back.


SELECT SUM (header_id * LENGTH (ordered_item) * (SYSDATE - schedule_ship_date))
FROM   (SELECT rf.report_type_id,
               MAX (DECODE (rf.report_field_name, 'HEADER_ID', rv.number_value, NULL)) header_id,
               MAX (DECODE (rf.report_field_name, 'LINE_ID', rv.number_value, NULL)) line_id,
               MAX (DECODE (rf.report_field_name, 'ORDERED_ITEM', rv.char_value, NULL)) ordered_item,
               MAX (DECODE (rf.report_field_name, 'SCHEDULE_SHIP_DATE', rv.date_value, NULL)) schedule_ship_date
        FROM   eav_report_record_values rv INNER JOIN eav_report_fields rf ON rf.report_field_id = rv.report_field_id
        WHERE  rv.report_header_id = 20 
        GROUP BY rf.report_type_id, rv.report_header_id, rv.report_record_id)

The results were:


1 row selected.

Elapsed: 00:00:22.62

Execution Plan

| Id  | Operation                       | Name                        | Rows  | Bytes | Cost (%CPU)|
|   0 | SELECT STATEMENT                |                             |     1 |  2026 |    53  (67)|
|   1 |  SORT AGGREGATE                 |                             |     1 |  2026 |            |
|   2 |   VIEW                          |                             |   130K|   251M|    53  (67)|
|   3 |    HASH GROUP BY                |                             |   130K|   261M|    53  (67)|
|   4 |     NESTED LOOPS                |                             |       |       |            |
|   5 |      NESTED LOOPS               |                             |   130K|   261M|    36  (50)|
|   6 |       TABLE ACCESS FULL         | EAV_REPORT_FIELDS           |   350 | 15050 |    18   (0)|
|*  7 |       INDEX RANGE SCAN          | EAV_REPORT_RECORD_VALUES_N1 |   130K|       |     0   (0)|
|*  8 |      TABLE ACCESS BY INDEX ROWID| EAV_REPORT_RECORD_VALUES    |   372 |   749K|     0   (0)|

Predicate Information (identified by operation id):

   7 - access("RV"."REPORT_HEADER_ID"=20)
   8 - filter("RF"."REPORT_FIELD_ID"="RV"."REPORT_FIELD_ID")

   - 'PLAN_TABLE' is old version

          4  recursive calls
          0  db block gets
     275480  consistent gets
        465  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        252  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

That's 22 seconds to get 30,000 rows of 4 columns each. That is way too long. From a flat table we'd be looking at under 2 seconds, easy.




Use MariaDB, with its Dynamic Columns. Effectively, that lets you put all the miscellaneous columns into a single column, yet still gives you efficient access to them.


I would keep a few of the common fields in their own columns.


More discussion of EAV and suggestions (and how to do it without Dynamic Columns).




Well, you have a very good point about storing data in the correct data types.
And I agree that this does pose a problem for user-defined data systems.


One way of solving this problem is by adding tables for each data type group (ints, floating points, strings, binary and dates) instead of keeping the value in the ReportFieldValue table. However, this will make your life harder, since you will have to select from and join multiple tables in order to get a single result.


Another way would be to add a data type column to ReportFieldValue and create a user-defined function to dynamically cast the data from strings to the appropriate data type (using the value in the data type column), so that you can use that for sorting, searching, etc.
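Here is a minimal sketch of that casting idea. It is written as an application-side Python function rather than a database user-defined function, which is a substitution on my part. The type tags (`"int"`, `"decimal"`, `"date"`, `"string"`) and the `cast_value` name are assumptions for illustration only.

```python
from datetime import date
from decimal import Decimal

# Hypothetical type tags, as they might be stored in the data type column.
CASTERS = {
    "int": int,
    "decimal": Decimal,
    "date": date.fromisoformat,  # expects 'YYYY-MM-DD'
    "string": str,
}


def cast_value(raw, data_type):
    """Convert a stored string to the Python type named by its type tag."""
    try:
        caster = CASTERS[data_type]
    except KeyError:
        raise ValueError(f"unknown data type tag: {data_type!r}") from None
    return caster(raw)


# Sorting the raw strings would order them as text ('10', '100', '9');
# casting first gives numeric order.
stored = ["10", "9", "100"]
numeric_order = sorted(stored, key=lambda raw: cast_value(raw, "int"))
```

A malformed stored value still raises at cast time, so this approach moves type errors to read time instead of preventing them on insert.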


SQL Server also has a data type called sql_variant that should support multiple types, and though I've never worked with it, its documentation seems promising.




