Large DB table with many rows or many columns

Time: 2022-01-12 04:30:18

I have tried to use a normalized table design. The problem (maybe) is that we generate a lot of data, and therefore a lot of rows. The database is currently growing by about 0.25 GB per day.

The main tables are Samples and Boxes. There's a one-to-many relation from Samples to Boxes. Samples table:

ID | Timestamp | CamId 

Boxes table:

ID | SampleID | Volume | ...

We analyse 19 samples every 5 seconds, and each sample has 7 boxes on average. That's 19*7*12 = 1596 boxes per minute and 1596*60*24 = 2,298,240 rows in the Boxes table per day on average.

This setup might run for months. At this time the Boxes table has about 25 million rows.
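
The arithmetic above can be checked with a few lines of Python. This is only a sketch built from the figures in the question; the constant names and the `days_to_reach` helper are made up for illustration.

```python
# Figures taken from the question; all names here are illustrative.
SAMPLES_PER_CYCLE = 19   # samples analysed in each cycle
CYCLE_SECONDS = 5        # one cycle every 5 seconds
BOXES_PER_SAMPLE = 7     # average number of boxes per sample

cycles_per_minute = 60 // CYCLE_SECONDS
boxes_per_minute = SAMPLES_PER_CYCLE * BOXES_PER_SAMPLE * cycles_per_minute
boxes_per_day = boxes_per_minute * 60 * 24


def days_to_reach(row_count: int) -> float:
    """Days of continuous operation needed to accumulate row_count Boxes rows."""
    return row_count / boxes_per_day


print(boxes_per_minute)  # 1596
print(boxes_per_day)     # 2298240
print(round(days_to_reach(25_000_000), 1))
```

At that rate the current 25 million rows correspond to roughly 11 days of continuous operation, so a run lasting months would end up in the hundreds of millions of rows.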

The question is: should I be worried about database size, table size and table design with so much data?

Or should I have a table like

ID | SampleID | CamId | Volume1 | Volume2 | ... | Volume9 | ...

3 answers

#1


1  

Depending on how long your data stays valid, you could implement a purge. What I mean is: do you really need data from days, months, or years ago? If your data has a limited useful lifetime, purge whatever is older than that, and the table should stop growing (or nearly so) after a set amount of time.
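
A minimal sketch of such a purge, using Python's built-in `sqlite3` as a stand-in for the real server. The table and column names follow the question; the 30-day retention window, the index name, and the `purge_older_than` helper are assumptions for illustration only.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_older_than(conn: sqlite3.Connection, cutoff: datetime) -> int:
    """Delete samples older than cutoff, plus their boxes; return samples removed."""
    cutoff_text = cutoff.isoformat(sep=" ")
    with conn:  # one transaction: both deletes commit together or not at all
        conn.execute(
            "DELETE FROM Boxes WHERE SampleID IN "
            "(SELECT ID FROM Samples WHERE Timestamp < ?)",
            (cutoff_text,),
        )
        cur = conn.execute("DELETE FROM Samples WHERE Timestamp < ?", (cutoff_text,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Samples (ID INTEGER PRIMARY KEY, Timestamp TEXT NOT NULL, CamId INTEGER NOT NULL);
CREATE TABLE Boxes (ID INTEGER PRIMARY KEY, SampleID INTEGER NOT NULL REFERENCES Samples(ID),
                    Volume REAL NOT NULL);
CREATE INDEX IX_Samples_Timestamp ON Samples (Timestamp);
""")

now = datetime(2022, 1, 12, 4, 30, 18)
old = (now - timedelta(days=40)).isoformat(sep=" ")
recent = (now - timedelta(days=1)).isoformat(sep=" ")
conn.executemany("INSERT INTO Samples VALUES (?, ?, ?)", [(1, old, 3), (2, recent, 3)])
conn.executemany("INSERT INTO Boxes (SampleID, Volume) VALUES (?, ?)",
                 [(1, 1.0), (1, 2.0), (2, 3.0), (2, 4.0)])

# Hypothetical retention window of 30 days.
removed = purge_older_than(conn, now - timedelta(days=30))
remaining_boxes = conn.execute("SELECT COUNT(*) FROM Boxes").fetchone()[0]
print(removed, remaining_boxes)  # 1 2
```

On a real server this would run as a scheduled job, and at millions of rows per day you would probably delete in smaller batches rather than in one statement.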

This way you wouldn't need to care that much about either architecture for the sake of size.

Otherwise the answer is yes, you should care. Splitting separate concepts into several tables can give you a decent performance gain, but it may not be enough to keep access times acceptable in the long run. Consider looking at NoSQL solutions or similar for storing heavy rows.

#2


1  

There is one simple rule: whenever you think you have to put a number in a column's name, you probably need a related table.

The amount of data will be roughly the same, no wins here.
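
To illustrate the rule, here is a small sketch with Python's built-in `sqlite3` (a stand-in, not SQL Server; the sample values are invented). With the related Boxes table, one query covers any number of boxes per sample. The wide layout would need something like Volume1 + Volume2 + ... + Volume9 with NULL handling, and a fixed upper limit on boxes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Samples (ID INTEGER PRIMARY KEY, Timestamp TEXT, CamId INTEGER);
CREATE TABLE Boxes (ID INTEGER PRIMARY KEY, SampleID INTEGER REFERENCES Samples(ID),
                    Volume REAL);
""")
conn.executemany("INSERT INTO Samples VALUES (?, ?, ?)",
                 [(1, "2022-01-12 04:30:00", 3), (2, "2022-01-12 04:30:05", 3)])
# Sample 1 has three boxes, sample 2 has one; no schema change is needed for either.
conn.executemany("INSERT INTO Boxes (SampleID, Volume) VALUES (?, ?)",
                 [(1, 1.5), (1, 2.0), (1, 0.5), (2, 4.0)])

# Box count and total volume per sample, regardless of how many boxes each has.
per_sample = conn.execute(
    "SELECT SampleID, COUNT(*), SUM(Volume) FROM Boxes "
    "GROUP BY SampleID ORDER BY SampleID"
).fetchall()
print(per_sample)  # [(1, 3, 4.0), (2, 1, 4.0)]
```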

I'd try to partition the table. AFAIK this feature used to be limited to the Enterprise editions, but, according to this document, since SQL Server 2016 SP1 table and index partitioning is available even in Express!

The main question is: What are you going to do with this data?

If you have to run analytical scripts over everything, there is not much better advice than to buy better hardware.
If you only need the data from the last 3 weeks, you will be fine with partitioning.

If you cannot use this feature yet (because of your server's version), you can create an archive table and move older data into it with regularly scheduled jobs. A UNION ALL view would still let you query the whole lot. With SCHEMABINDING you might even get some of the advantages of indexed views, but check the restrictions first: as far as I know, SQL Server does not allow an index on a view that contains UNION.
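
The archive-table idea can be sketched like this, again with Python's built-in `sqlite3` as a stand-in for the real server. The `BoxesArchive` table, the `AllBoxes` view, the `CreatedAt` column and the `archive_older_than` helper are hypothetical names; the question's Boxes table has no timestamp of its own, so a real job would more likely join to Samples.Timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Boxes (ID INTEGER PRIMARY KEY, SampleID INTEGER NOT NULL,
                    Volume REAL NOT NULL, CreatedAt TEXT NOT NULL);
CREATE TABLE BoxesArchive (ID INTEGER PRIMARY KEY, SampleID INTEGER NOT NULL,
                           Volume REAL NOT NULL, CreatedAt TEXT NOT NULL);
-- One view over both tables, so existing queries can still see everything.
CREATE VIEW AllBoxes AS
    SELECT ID, SampleID, Volume, CreatedAt FROM Boxes
    UNION ALL
    SELECT ID, SampleID, Volume, CreatedAt FROM BoxesArchive;
""")


def archive_older_than(conn: sqlite3.Connection, cutoff_text: str) -> int:
    """Move rows created before cutoff_text into the archive; return rows moved."""
    with conn:  # copy and delete in one transaction so no row is lost or doubled
        conn.execute(
            "INSERT INTO BoxesArchive (ID, SampleID, Volume, CreatedAt) "
            "SELECT ID, SampleID, Volume, CreatedAt FROM Boxes WHERE CreatedAt < ?",
            (cutoff_text,),
        )
        cur = conn.execute("DELETE FROM Boxes WHERE CreatedAt < ?", (cutoff_text,))
    return cur.rowcount


conn.executemany(
    "INSERT INTO Boxes (SampleID, Volume, CreatedAt) VALUES (?, ?, ?)",
    [(1, 1.0, "2021-12-01 10:00:00"),
     (1, 2.0, "2021-12-01 10:00:00"),
     (2, 3.0, "2022-01-11 10:00:00")],
)

# Keep roughly the last 3 weeks in the working table (cutoff chosen by hand here).
moved = archive_older_than(conn, "2021-12-22 00:00:00")
working = conn.execute("SELECT COUNT(*) FROM Boxes").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM AllBoxes").fetchone()[0]
print(moved, working, total)  # 2 1 3
```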

In this case it is a good idea to keep your working data on your fastest drive and put the archive table in a separate file on large storage somewhere else.

#3


0  

Question is, should I be worried about database size, table size and table design with so much data?

My answer is YES:

1. A huge amount of data (daily) will affect your storage on the hardware side.
2. A normalized table design is a must, especially if you are storing bytes or images.
