My fellow programmer has a strange requirement from his team leader, who insists on creating varchar columns with a length of 16*2^n.
What is the point of such a restriction?
I suppose that short strings (fewer than 128 characters, for example) are stored directly in the table's record, so from this point of view the restriction would help to align fields within the record, while larger strings are stored in the database "heap" and only a reference to the string is saved in the table record.
Is it so?
Does this requirement have a reasonable background?
BTW, the DBMS is SQL Server 2008.
3 answers
#1
21
Completely pointless restriction as far as I can see. Assuming the standard FixedVar format (as opposed to the formats used with row/page compression or sparse columns) and assuming you are talking about varchar(1-8000) columns:
All varchar data is stored at the end of the row in a variable-length section (or in off-row pages if it can't fit in the row). The amount of space it consumes in that section (and whether or not it ends up off row) depends entirely on the length of the actual data, not on the column declaration.
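To make that concrete, here is a rough Python sketch of the per-row size arithmetic for the FixedVar format, following the shape of the row-size estimate in Microsoft's "Estimate the Size of a Heap" documentation. The function name and sample values are made up for illustration, and the model ignores compression, sparse columns, and off-row storage. Note that the declared length of a varchar column never appears as an input; only the bytes actually stored do.

```python
def fixedvar_row_bytes(fixed_bytes, num_cols, var_values):
    """Approximate on-page size of one row in the uncompressed FixedVar format.

    fixed_bytes: total size of the fixed-length columns
    num_cols:    total number of columns (drives the NULL bitmap size)
    var_values:  actual byte strings stored in the variable-length columns
    """
    row_header = 4                          # fixed per-row overhead
    null_bitmap = 2 + (num_cols + 7) // 8   # column count + one bit per column
    var_section = 0
    if var_values:
        # 2-byte count of variable-length columns, a 2-byte offset per column,
        # then the data itself
        var_section = 2 + 2 * len(var_values) + sum(len(v) for v in var_values)
    return row_header + fixed_bytes + null_bitmap + var_section


# One int column (4 bytes) plus one varchar column holding 'hello'.
# The result is the same whether the column was declared varchar(50) or varchar(64).
row_size = fixedvar_row_bytes(fixed_bytes=4, num_cols=2, var_values=[b"hello"])
print(row_size)
```

Since the declaration is not part of the calculation, rounding it to 16, 32, 64 and so on cannot change how the row is laid out.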
SQL Server will use the length declared in the column declaration when allocating memory (e.g. for sort operations). The assumption it makes in that instance is that varchar columns will be filled to 50% of their declared size on average, so this is probably a more useful thing to consider when choosing a size.
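As a back-of-the-envelope illustration of that 50% assumption, the Python sketch below estimates how many bytes of varchar data a sort would be assumed to handle. It is a deliberate simplification: the function name is hypothetical, and it ignores fixed-length columns, per-row overhead, and everything else that goes into a real memory grant.

```python
def assumed_sort_bytes(declared_lengths, row_count):
    """Bytes of varchar data assumed for a sort, at 50% of each declared size."""
    per_row = sum(length // 2 for length in declared_lengths)
    return per_row * row_count


# Same data, same row count; only the declared length of one column differs.
rows = 1_000_000
for declared in (32, 64, 1024):
    print(declared, assumed_sort_bytes([declared], rows))
```

Under this model an over-generous declaration inflates the assumed size even though the stored data is unchanged, which is a reason to size columns by the data they hold rather than by a fixed series of lengths.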
#2
4
You should always declare a column size that matches the data being stored. That is part of how the database maintains integrity. For instance, suppose you are storing email addresses. If the column size is the maximum allowable length of an email address, then you cannot store bad data that is longer than that. That is a good thing. Some people want to make everything nvarchar(max) or varchar(max). However, this only causes indexing problems.
Personally, I would have gone back to the person who made this requirement and asked for a reason. Then I would have presented my reasons as to why it might not be a good idea. I would never just blindly implement something like this. In pushing back on a requirement like this, I would first do some research into how SQL Server organizes data on the disk, so I could show the impact the requirement is likely to have on performance. I might even be surprised to find out the requirement made sense, but I doubt it at this point.
#3
4
I have heard of this practice before, but after researching this question a bit I don't think there is a practical reason for having varchar values in multiples of 16. I think this requirement probably comes from trying to optimize the space used on each page. In SQL Server, pages are set at 8 KB per page. Rows are stored in pages, so perhaps the thinking is that you could conserve space on the pages if the size of each row divided evenly into 8 KB (a more detailed description of how SQL Server stores data can be found here). However, since the amount of space used by a varchar field is determined by its actual content, I don't see how using lengths in multiples of 16 or any other scheme could help you optimize the amount of space used by each row on the page. The length of the varchar fields should just be set to whatever the business requirements dictate.
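The page arithmetic can be sketched in a few lines of Python. The constants below are the commonly documented figures for SQL Server data pages (an 8192-byte page with a 96-byte header, plus a 2-byte slot-array entry per row); the function name and the sample row sizes are only illustrative.

```python
PAGE_BYTES = 8192         # 8 KB data page
PAGE_HEADER_BYTES = 96    # page header
SLOT_ENTRY_BYTES = 2      # per-row entry in the slot array


def rows_per_page(row_bytes):
    """Whole rows of the given size that fit on one data page."""
    usable = PAGE_BYTES - PAGE_HEADER_BYTES
    return usable // (row_bytes + SLOT_ENTRY_BYTES)


for row_bytes in (16, 20, 64, 100, 128):
    count = rows_per_page(row_bytes)
    unused = (PAGE_BYTES - PAGE_HEADER_BYTES) - count * (row_bytes + SLOT_ENTRY_BYTES)
    print(row_bytes, count, unused)
```

In this model a 64-byte row leaves 44 bytes unused per page, while a 20-byte row happens to fill the page exactly, so power-of-two row sizes have no special advantage once the header and slot array are counted. And since a row's size depends on the actual varchar content, the declared length does not control which case you land in anyway.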
Additionally, this question covers similar ground and the conclusion also seems to be the same: Database column sizes for character based data