How should a normalized database design handle optional columns?

Time: 2021-01-30 11:10:45

I am working on a system that stores sensor data. Most sensors measure a single value, but some can measure many values for each sample period. I am trying to keep my database as normalized as possible without suffering performance problems when looking up lots of sample data. My question is how to design the sensor data table to account for optional measured data values. For example, sensor A only reads one value, but sensor B reads 5 values. How do I store both sets of data in the data table?

Option 1 is to create a flat structure: a table with a bunch of columns (value1, value2, value3, ..., valueN) and a field that records how many columns are used. Functional, but bad design in my opinion:

Sensor Data
  Sensor ID (PK)
  Timestamp (PK)
  Columns Used
  Value 1
  Value 2
  Value 3
  ...
  Value n
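
To make Option 1 concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption made only so the example runs anywhere). The table name, column names, and the cap of five value columns are hypothetical. Unused columns are left NULL.

```python
import sqlite3

# SQLite stands in for SQL Server; all names and the five-column cap are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensor_data_wide (
        sensor_id    INTEGER NOT NULL,
        ts           TEXT    NOT NULL,
        columns_used INTEGER NOT NULL,
        value1 REAL, value2 REAL, value3 REAL, value4 REAL, value5 REAL,
        PRIMARY KEY (sensor_id, ts)
    )
""")

# Sensor 1 (like sensor A) reports one value; sensor 2 (like sensor B) reports five.
conn.execute(
    "INSERT INTO sensor_data_wide (sensor_id, ts, columns_used, value1) "
    "VALUES (?, ?, ?, ?)",
    (1, "2021-01-30T11:00:00", 1, 20.5),
)
conn.execute(
    "INSERT INTO sensor_data_wide VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (2, "2021-01-30T11:00:00", 5, 1.0, 2.0, 3.0, 4.0, 5.0),
)

def read_sample(sensor_id, ts):
    """Return only the values this sample actually uses."""
    row = conn.execute(
        "SELECT columns_used, value1, value2, value3, value4, value5 "
        "FROM sensor_data_wide WHERE sensor_id = ? AND ts = ?",
        (sensor_id, ts),
    ).fetchone()
    used, values = row[0], row[1:]
    return list(values[:used])

sample_a = read_sample(1, "2021-01-30T11:00:00")
sample_b = read_sample(2, "2021-01-30T11:00:00")
```

In this layout a sensor that reports a sixth value would force a schema change, and every reader has to consult `columns_used` before trusting a column.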

The other option is to highly normalize the structure and have a data table that uses a composite key to store individual data values. It would track the sensor id, timestamp, and data type to maintain unique values. This is highly normalized and allows for an unlimited number of optional data values per sample, but duplicates a lot of information (specifically, sensor id and timestamp):

Sensor Data
  Sensor ID (PK)
  Timestamp (PK)
  Data Type (PK)
  Value
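
A minimal sketch of this second layout, again using Python's built-in `sqlite3` as a stand-in for SQL Server. The table name, column names, and the data type labels (`temperature`, `humidity`, `pressure`) are hypothetical. It shows one row per measured value and the composite key rejecting a duplicate.

```python
import sqlite3

# SQLite stands in for SQL Server; names and data type labels are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensor_data (
        sensor_id INTEGER NOT NULL,
        ts        TEXT    NOT NULL,
        data_type TEXT    NOT NULL,
        value     REAL    NOT NULL,
        PRIMARY KEY (sensor_id, ts, data_type)
    )
""")

rows = [
    (1, "2021-01-30T11:00:00", "temperature", 20.5),   # single-value sensor
    (2, "2021-01-30T11:00:00", "temperature", 21.0),   # multi-value sensor
    (2, "2021-01-30T11:00:00", "humidity", 40.0),
    (2, "2021-01-30T11:00:00", "pressure", 1013.0),
]
conn.executemany("INSERT INTO sensor_data VALUES (?, ?, ?, ?)", rows)

# The composite key rejects a second value of the same type for the same sample.
try:
    conn.execute(
        "INSERT INTO sensor_data VALUES (2, '2021-01-30T11:00:00', 'humidity', 41.0)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Number of stored values per sensor.
counts = dict(conn.execute(
    "SELECT sensor_id, COUNT(*) FROM sensor_data GROUP BY sensor_id"
).fetchall())
```

Note how the sensor id and timestamp repeat on every row for sensor 2, which is the duplication described above.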

This wouldn't be that bad for a few thousand samples, but this system is designed to store millions of sensor samples and joining those values could suffer performance problems (i.e. WHERE Sensor ID and Timestamp are equal but the Data Type is different).
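
The join in question can be sketched as follows, together with one possible alternative: conditional aggregation (`MAX(CASE ...)` with `GROUP BY`), which returns the same rows without a self-join. This is an illustration in Python's `sqlite3`, not SQL Server, and the table, column, and data type names are hypothetical. Whether the alternative is actually faster at millions of rows would need to be measured on the real system.

```python
import sqlite3

# SQLite stands in for SQL Server; names and data type labels are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensor_data (
        sensor_id INTEGER NOT NULL,
        ts        TEXT    NOT NULL,
        data_type TEXT    NOT NULL,
        value     REAL    NOT NULL,
        PRIMARY KEY (sensor_id, ts, data_type)
    )
""")
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?, ?, ?)",
    [
        (2, "2021-01-30T11:00:00", "temperature", 21.0),
        (2, "2021-01-30T11:00:00", "humidity", 40.0),
        (2, "2021-01-30T11:01:00", "temperature", 21.5),
        (2, "2021-01-30T11:01:00", "humidity", 39.0),
    ],
)

# Self-join: one extra join per data type wanted in the output row.
self_join = conn.execute("""
    SELECT t.ts, t.value, h.value
    FROM sensor_data AS t
    JOIN sensor_data AS h
      ON h.sensor_id = t.sensor_id AND h.ts = t.ts
    WHERE t.sensor_id = 2
      AND t.data_type = 'temperature'
      AND h.data_type = 'humidity'
    ORDER BY t.ts
""").fetchall()

# Conditional aggregation: group the sensor's rows by timestamp, no self-join.
pivoted = conn.execute("""
    SELECT ts,
           MAX(CASE WHEN data_type = 'temperature' THEN value END),
           MAX(CASE WHEN data_type = 'humidity'    THEN value END)
    FROM sensor_data
    WHERE sensor_id = 2
    GROUP BY ts
    ORDER BY ts
""").fetchall()
```

Both queries produce one row per timestamp with the temperature and humidity side by side.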

Anyone have a better idea for designing a database to store optional values? Side note: the design has to work with SQL Server and Entity Framework (EF).

1 solution

#1


I think going with option 2 is not bad, even if the database will have millions of rows. You will only need an index on SensorId and Timestamp.
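
As a rough illustration of the index point, the sketch below asks SQLite (standing in for SQL Server, whose planner and plan output differ) how it would run a lookup by sensor id and timestamp, before and after adding an index. The table deliberately has no primary key so the effect is visible, and the index name `ix_sensor_ts` is hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; names are illustrative.
conn = sqlite3.connect(":memory:")
# No primary key on purpose, so the effect of the explicit index is visible.
conn.execute("""
    CREATE TABLE sensor_data (
        sensor_id INTEGER NOT NULL,
        ts        TEXT    NOT NULL,
        data_type TEXT    NOT NULL,
        value     REAL    NOT NULL
    )
""")

QUERY = "SELECT data_type, value FROM sensor_data WHERE sensor_id = ? AND ts = ?"
PARAMS = (2, "2021-01-30T11:00:00")

def plan(sql, params):
    """Return SQLite's query-plan description lines for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

plan_before = plan(QUERY, PARAMS)

conn.execute("CREATE INDEX ix_sensor_ts ON sensor_data (sensor_id, ts)")
plan_after = plan(QUERY, PARAMS)

print(plan_before)  # a full table scan
print(plan_after)   # a search that names ix_sensor_ts
```

With the composite primary key from option 2, the key's own index already begins with sensor id and timestamp, so on SQL Server a separate index may turn out to be unnecessary; checking the actual execution plan is the way to confirm.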

I can think of one different design containing two tables:

**SensorRead**
Id (PK)
SensorId
Timestamp

**SensorData**
Id (PK)
ReadId (FK)
Value
DataType

If you query that schema for the values of a given SensorId and Timestamp, the join touches only about 10 rows (assuming the sensor reads 10 data points per sample), so the cost is almost nothing.
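
A minimal sketch of the two-table design and the lookup described above, using Python's `sqlite3` as a stand-in for SQL Server. The snake_case names, the `UNIQUE (sensor_id, ts)` constraint, the index on `read_id`, and the `channelN` data type labels are assumptions added for illustration and are not part of the schema as posted.

```python
import sqlite3

# SQLite stands in for SQL Server; constraints and names beyond the posted
# schema are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensor_read (
        id        INTEGER PRIMARY KEY,
        sensor_id INTEGER NOT NULL,
        ts        TEXT    NOT NULL,
        UNIQUE (sensor_id, ts)
    );
    CREATE TABLE sensor_data (
        id        INTEGER PRIMARY KEY,
        read_id   INTEGER NOT NULL REFERENCES sensor_read (id),
        data_type TEXT    NOT NULL,
        value     REAL    NOT NULL
    );
    CREATE INDEX ix_sensor_data_read ON sensor_data (read_id);
""")

# One read row per sample: sensor id and timestamp are stored once.
cur = conn.execute(
    "INSERT INTO sensor_read (sensor_id, ts) VALUES (?, ?)",
    (2, "2021-01-30T11:00:00"),
)
read_id = cur.lastrowid

# Ten data points for that single read.
conn.executemany(
    "INSERT INTO sensor_data (read_id, data_type, value) VALUES (?, ?, ?)",
    [(read_id, f"channel{i}", float(i)) for i in range(1, 11)],
)

# Find the read by sensor and timestamp, then join to its data rows.
values = conn.execute("""
    SELECT d.data_type, d.value
    FROM sensor_read AS r
    JOIN sensor_data AS d ON d.read_id = r.id
    WHERE r.sensor_id = ? AND r.ts = ?
    ORDER BY d.id
""", (2, "2021-01-30T11:00:00")).fetchall()
```

Each table here has a single-column surrogate key, so nothing in this layout depends on composite primary keys.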

Aside from the question itself: I'm not sure that having multiple columns as the PK will work well with Entity Framework. I've never tried it, but if you decide to go that way, do some research on it.
