I develop software that stores a lot of data in one of its database tables (SQL Server version 8, 9 or 10). Say about 100,000 records are inserted into that table per day - roughly 36 million records per year. Fearing a loss of performance, I decided to create a new table every day (a table with the current date in its name) to lower the number of records per table.
Could you please tell me whether this was a good idea? Is there a record limit for SQL Server tables? Or do you know how many records (more or less) can be stored in a table before performance degrades significantly?
12 Answers
#1
31
It's hard to give a generic answer to this. It really depends on a number of factors:
- what size your row is
- what kind of data you store (strings, blobs, numbers)
- what you do with your data (just keep it as an archive, or query it regularly)
- whether you have indexes on your table, and how many
- what your server specs are
etc.
As answered elsewhere here, 100,000 rows a day - and thus one table per day - is overkill; I'd suggest splitting monthly or weekly, perhaps even quarterly. The more tables you have, the bigger a maintenance/query nightmare it will become.
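If you do go the monthly route, a date-stamped table can be created from a scheduled job with dynamic SQL. A minimal sketch - the table and column names here are hypothetical, not from this thread:

-- Minimal sketch: create this month's table if it doesn't exist yet.
-- Table/column names are illustrative only.
DECLARE @tableName sysname, @sql nvarchar(4000);
SET @tableName = 'Readings_' + LEFT(CONVERT(char(8), GETDATE(), 112), 6);  -- e.g. Readings_200911
SET @sql = N'CREATE TABLE dbo.' + QUOTENAME(@tableName) + N' (
    Id int IDENTITY(1,1) PRIMARY KEY,
    RecordedAt datetime NOT NULL,
    Value float NOT NULL
)';
IF OBJECT_ID(N'dbo.' + @tableName, N'U') IS NULL
    EXEC sp_executesql @sql;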
#2
78
These are some of the Maximum Capacity Specifications for SQL Server 2008 R2:
- Database size: 524,272 terabytes
- Databases per instance of SQL Server: 32,767
- Filegroups per database: 32,767
- Files per database: 32,767
- File size (data): 16 terabytes
- File size (log): 2 terabytes
- Rows per table: Limited by available storage
- Tables per database: Limited by number of objects in a database
#3
27
I have a three-column table with just over 6 billion rows in SQL Server 2008 R2.
We query it every day to create minute-by-minute system analysis charts for our customers. I have not noticed any database performance hits (though the fact that it grows ~1 GB every day does make managing backups a bit more involved than I would like).
Update July 2016
We made it to ~24.5 billion rows before backups became large enough for us to decide to truncate records older than two years (~700 GB stored in multiple backups, including on expensive tapes). It's worth noting that performance was not a significant motivator in this decision (i.e., it was still working great).
For anyone who finds themselves trying to delete 20 billion rows from SQL Server, I highly recommend this article. Relevant code in case the link dies (read the article for a full explanation):
ALTER DATABASE DeleteRecord SET RECOVERY SIMPLE;
GO

BEGIN TRY
    BEGIN TRANSACTION;

    -- Bulk-logged operation
    SELECT *
    INTO dbo.bigtable_intermediate
    FROM dbo.bigtable
    WHERE Id % 2 = 0;

    -- Minimally logged because it is a DDL operation
    TRUNCATE TABLE dbo.bigtable;

    -- Bulk-logged because the target table is exclusively locked!
    SET IDENTITY_INSERT dbo.bigtable ON;
    INSERT INTO dbo.bigtable WITH (TABLOCK) (Id, c1, c2, c3)
    SELECT Id, c1, c2, c3 FROM dbo.bigtable_intermediate ORDER BY Id;
    SET IDENTITY_INSERT dbo.bigtable OFF;

    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK;
END CATCH;

ALTER DATABASE DeleteRecord SET RECOVERY FULL;
GO
Update November 2016
If you plan on storing this much data in a single table: don't. I highly recommend you consider table partitioning (either manually, or with the built-in features if you're running Enterprise edition). This makes dropping old data as easy as truncating a table once a week/month/etc. If you don't have Enterprise (which we don't), you can simply write a script which runs once a month, drops tables older than 2 years, creates next month's table, and regenerates a dynamic view that joins all of the partition tables together for easy querying. Obviously "once a month" and "older than 2 years" should be defined by you based on what makes sense for your use case. Deleting directly from a table with tens of billions of rows of data will a) take a HUGE amount of time and b) fill up the transaction log hundreds or thousands of times over.
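The original script isn't shown here, but a rough sketch of the idea might look like this - all object names are hypothetical, and it assumes SQL Server 2005+ for sys.tables:

-- Hypothetical monthly job: drop the oldest month, create the next one,
-- and rebuild a UNION ALL view over all remaining Events_* tables.
DECLARE @old sysname, @new sysname, @sql nvarchar(max);
SET @old = 'Events_' + LEFT(CONVERT(char(8), DATEADD(year, -2, GETDATE()), 112), 6);
SET @new = 'Events_' + LEFT(CONVERT(char(8), DATEADD(month, 1, GETDATE()), 112), 6);

IF OBJECT_ID(N'dbo.' + @old, N'U') IS NOT NULL
    EXEC(N'DROP TABLE dbo.' + QUOTENAME(@old));

IF OBJECT_ID(N'dbo.' + @new, N'U') IS NULL
    EXEC(N'CREATE TABLE dbo.' + QUOTENAME(@new)
       + N' (Id bigint IDENTITY(1,1) PRIMARY KEY, LoggedAt datetime NOT NULL, Payload varchar(100) NOT NULL)');

-- Stitch the partition tables back together for easy querying.
SET @sql = N'';
SELECT @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
            + N'SELECT * FROM dbo.' + QUOTENAME(name)
FROM sys.tables
WHERE name LIKE 'Events[_]%';

IF OBJECT_ID(N'dbo.AllEvents', N'V') IS NOT NULL
    DROP VIEW dbo.AllEvents;
EXEC(N'CREATE VIEW dbo.AllEvents AS ' + @sql);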
#4
18
I do not know of a row limit, but I know of tables with more than 170 million rows. You can speed things up using partitioned tables (2005+) or views that combine multiple tables.
#5
13
I don't know MSSQL specifically, but 36 million rows is not large to an enterprise database - working with mainframe databases, 100,000 rows sounds like a configuration table to me :-).
While I'm not a big fan of some of Microsoft's software, this isn't Access we're talking about here: I assume they can handle pretty substantial database sizes with their enterprise DBMS.
I suspect days may have been too fine a resolution to divide it up, if indeed it needs dividing at all.
#6
5
We have tables in SQL Server 2005 and 2008 with over 1 billion rows (30 million added daily). I can't imagine going down the rat's nest of splitting that out into a new table each day.
Much cheaper to add the appropriate disk space (which you need anyway) and RAM.
#7
3
It depends, but I would say it is better to keep everything in one table for the sake of simplicity.
100,000 rows a day is not really that enormous an amount (depending on your server hardware). I have personally seen MSSQL handle up to 100M rows in a single table without any problems. As long as you keep your indexes in order it should be all good. The key is to have heaps of memory so that indexes don't have to be swapped out to disk.
On the other hand, it depends on how you are using the data: if you need to run lots of queries, and data spanning multiple days is unlikely to be needed (so you won't need to join the tables), it will be faster to separate it out into multiple tables. This is often done in applications such as industrial process control, where you might be reading the value of, say, 50,000 instruments every 10 seconds. In that case speed is extremely important, but simplicity is not.
#8
3
We overflowed an integer primary key once (at ~2.1 billion rows, the maximum value of a 32-bit signed int) on a table. If there's a row limit, you're not likely to ever hit it at a mere 36 million rows per year.
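If you're worried about the same thing, the sys.identity_columns catalog view (SQL Server 2005+) will tell you how close each identity column is to that ceiling - a quick sketch:

-- Remaining headroom for every int identity column that has been used.
SELECT OBJECT_NAME(ic.object_id) AS TableName,
       ic.name AS ColumnName,
       CAST(ic.last_value AS bigint) AS LastValue,
       2147483647 - CAST(ic.last_value AS bigint) AS RemainingValues
FROM sys.identity_columns ic
INNER JOIN sys.types t ON ic.system_type_id = t.system_type_id
WHERE t.name = 'int'
  AND ic.last_value IS NOT NULL
ORDER BY RemainingValues;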
#9
2
You can keep populating the table until you run out of disk space. For better performance you can try migrating to SQL Server 2005, then partitioning the table and putting the parts on different disks (a RAID configuration could really help you here). Partitioning is only available in the Enterprise edition of SQL Server 2005. You can see a partitioning example at this link: http://technet.microsoft.com/en-us/magazine/cc162478.aspx
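For reference, the basic shape of 2005-style partitioning is a partition function plus a partition scheme. A minimal sketch with hypothetical names, boundaries, and filegroups (the linked article covers the details):

-- Route rows to filegroups by date (names and boundaries are illustrative).
CREATE PARTITION FUNCTION pfByMonth (datetime)
AS RANGE RIGHT FOR VALUES ('2009-01-01', '2009-02-01', '2009-03-01');

CREATE PARTITION SCHEME psByMonth
AS PARTITION pfByMonth TO (fg1, fg2, fg3, fg4);  -- one more filegroup than boundary values

CREATE TABLE dbo.Measurements (
    Id bigint IDENTITY(1,1) NOT NULL,
    MeasuredAt datetime NOT NULL,
    Value float NOT NULL
) ON psByMonth (MeasuredAt);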
You can also try creating views for the most frequently used portion of the data; that is another possible solution.
Hope this helped...
#10
0
The largest table I've encountered on SQL Server 8 on Windows 2003 was 799 million rows with 5 columns. But whether or not that's acceptable has to be measured against the SLA and the usage case - e.g. load 50-100,000,000 records and see if it still works.
#11
-1
-- Find the user table with the most rows in the current database
-- (and the base-10 log of that row count).
SELECT TOP 1
    sysobjects.[name],
    MAX(sysindexes.[rows]) AS TableRows,
    CAST(
        CASE MAX(sysindexes.[rows])
            WHEN 0 THEN 0
            ELSE LOG10(MAX(sysindexes.[rows]))
        END
        AS NUMERIC(5, 2)) AS L10_TableRows
FROM sysindexes
INNER JOIN sysobjects ON sysindexes.[id] = sysobjects.[id]
WHERE sysobjects.xtype = 'U'   -- user tables only
GROUP BY sysobjects.[name]
ORDER BY MAX(sysindexes.[rows]) DESC
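Note that sysindexes is a SQL Server 2000-era compatibility view; on 2005 and later the same information comes from the newer catalog views, along these lines:

-- Modern equivalent (SQL Server 2005+): biggest user table by row count.
SELECT TOP 1 t.name AS TableName, SUM(p.rows) AS TableRows
FROM sys.partitions p
INNER JOIN sys.tables t ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)   -- heap or clustered index only, to avoid double counting
GROUP BY t.name
ORDER BY SUM(p.rows) DESC;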
#12
-3
Partition the table monthly. That is the best way to handle tables with a large daily influx, be it Oracle or MSSQL.