What size do you use for foreign key fields?

Time: 2021-12-06 07:31:38

I have a star-schema database, with fact tables that have many foreign keys to dimension tables. The number of records in each dimension table is small: often fewer than 256, and always fewer than 64k. The fact tables typically have hundreds of thousands of records, so I want to maximize join speed.

I'd like to use tinyints and smallints, but a coworker says I'm crazy to worry about this and that I should just use 4-byte ints in every case. Who is right?

5 answers

#1


2  

Your co-worker is wrong. If you use 4-byte integers for the foreign keys, then the primary keys in the fact table have to be 4-byte integers as well. You are then making your fact table wider than it needs to be, reducing the number of records that fit on a single index page. To the degree that this changes the width of the primary key index, it will adversely affect index performance. If your primary key could have been two tinyints and three smallints, and you change to five 4-byte ints, you have changed the width of the index key from 8 bytes to 20 bytes. Your index will hold fewer than half as many entries per I/O page, and it will take more than twice as many logical and/or physical reads to scan.
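
To make the 8-byte versus 20-byte comparison concrete, here is a rough sketch of the arithmetic. It assumes roughly 8096 usable bytes on an 8 KB SQL Server page and ignores per-row and per-entry overhead, so the counts are upper bounds rather than what a real index would hold; the function names are made up for illustration.

```python
# Storage sizes in bytes for SQL Server integer types.
TYPE_BYTES = {"tinyint": 1, "smallint": 2, "int": 4}

# Assumption: usable bytes on an 8 KB page; per-entry overhead is ignored.
PAGE_BYTES = 8096


def key_width(columns):
    """Total width in bytes of a composite key made of the given column types."""
    return sum(TYPE_BYTES[c] for c in columns)


def entries_per_page(width, page_bytes=PAGE_BYTES):
    """Upper bound on index entries per page for a given key width."""
    return page_bytes // width


narrow = key_width(["tinyint"] * 2 + ["smallint"] * 3)
wide = key_width(["int"] * 5)

print("narrow key:", narrow, "bytes,", entries_per_page(narrow), "entries/page")
print("wide key:  ", wide, "bytes,", entries_per_page(wide), "entries/page")
```

Real entries carry extra bytes of overhead, which shrinks the ratio between the two somewhat, so treat this as an illustration of the direction of the effect and not as a measurement.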

NOTE: As Jim McLeod's answer below points out, SQL Server 2008 (Enterprise or Developer edition) includes row-level compression, which means you can declare the column as a 4-byte INT and it will store each row's value in the smallest size that fits.

#2


4  

Go with the 4-byte ints and do your optimisation elsewhere. Any effort you spend here won't gain you enough of a return compared with the ease of coding, ease of use and ease of maintenance that a simple schema offers.

#3


2  

On a 32-bit server, smaller ints aren't going to save you anything in CPU performance, and even less on a 64-bit server. Maybe you'll save some disk space and therefore get some disk I/O improvement, but the total improvement may be negligible.

#4


0  

As always with performance questions, it depends. If your fact rows are tiny, say 20 bytes each, then about 400 of them fit on an 8 KB page, so saving two bytes per row frees roughly 800 bytes per page, enough for about 40 more rows. If your fact rows are larger, say 500 bytes, only 16 fit on a page and you save just 32 bytes per page, which won't matter at all.
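
The per-page figures can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming about 8096 usable bytes on an 8 KB page and no per-row overhead; the helper names are hypothetical.

```python
# Assumption: usable bytes on an 8 KB page; per-row overhead is ignored.
PAGE_BYTES = 8096


def rows_per_page(row_bytes, page_bytes=PAGE_BYTES):
    """How many whole rows of the given size fit on one page."""
    return page_bytes // row_bytes


def extra_rows(row_bytes, saved_bytes, page_bytes=PAGE_BYTES):
    """Additional rows per page after shrinking each row by saved_bytes."""
    return rows_per_page(row_bytes - saved_bytes, page_bytes) - rows_per_page(
        row_bytes, page_bytes
    )


for row_bytes in (20, 500):
    rows = rows_per_page(row_bytes)
    print(
        f"{row_bytes}-byte rows: {rows} per page, "
        f"{rows * 2} bytes saved per page, "
        f"{extra_rows(row_bytes, 2)} extra rows"
    )
```

Under these assumptions the small-row case gains a meaningful number of rows per page while the 500-byte case gains none, which is the point of the comparison.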

The benefit of using an INT over a SMALLINT is that you don't have to worry about what happens if you suddenly get more rows than you expected.
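
To see where the smaller types run out, here is a small sketch that picks the narrowest SQL Server integer type for a given largest key value. It assumes keys are non-negative; `smallest_type` is a hypothetical helper, not part of any library. Note that SMALLINT is signed, so with non-negative keys it tops out at 32,767 rather than 64k.

```python
# Upper bounds of SQL Server integer types (TINYINT is unsigned, the rest signed).
RANGES = [
    ("tinyint", 255),
    ("smallint", 32_767),
    ("int", 2**31 - 1),
    ("bigint", 2**63 - 1),
]


def smallest_type(max_key):
    """Return the narrowest type that can hold keys 0..max_key.

    Assumption: keys are non-negative (e.g. an identity starting at 1).
    """
    if max_key < 0:
        raise ValueError("max_key must be non-negative")
    for name, upper in RANGES:
        if max_key <= upper:
            return name
    raise ValueError("max_key does not fit in bigint")


for n in (200, 256, 32_767, 40_000):
    print(n, "->", smallest_type(n))
```

A dimension that grows from 255 to 256 rows, or from 32,767 to 32,768, would need its key column and every referencing foreign key widened, which is the maintenance risk this answer is pointing at.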

SQL Server 2008 includes row-level compression, which means you can declare the column as a 4-byte INT and it will store each row's value in the smallest size that fits.

#5


0  

4-byte integers are fine as primary keys for most solutions.

If you want some flexibility in where you create your PK values, or expect to do data replication later on, you might want to think about using uniqueidentifiers. A GUID is easily created in the database, within a stored procedure, within a DAL or anywhere else, and is for practical purposes unique.

Sometimes that alone can give your solution some additional performance, because you don't need a database round trip to get a new record ID (i.e. you create it in the DAL and store it right away instead of having to use something like scope_identity() or @@Identity).
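
As an illustration of creating the key outside the database, here is a minimal sketch using Python's standard `uuid` module; `new_record_id` is a hypothetical helper standing in for whatever your DAL would do. Keep in mind that a uniqueidentifier is 16 bytes, four times the width of an INT, which matters for the index-width concerns raised in the other answers.

```python
import uuid


def new_record_id():
    """Generate a random (version 4) UUID on the client, with no database call."""
    return uuid.uuid4()


record_id = new_record_id()

# The ID is known before any INSERT is issued, so child rows can reference it
# immediately without reading back an identity value from the server.
print(record_id, "stored in", len(record_id.bytes), "bytes")
```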

Hope this helps.
